I'm not sure I can share a sample, but the specific situation I'm thinking of is when you have data that doesn't exist within a sentence, for example the name, address, etc of a company. Some foreign companies have funky punctuation within their names and addresses.
I'd have to see the results to know if the NLP would mess anything up. If all it did was weight the results, then perhaps it wouldn't, but it's also possible that it would. Basically my concern is that it would mess up the use of lucene for non-sentence based applications that might contain punctuation. On the whole I think adding the NLP to lucene is a good idea because the vast majority of the applications of lucene would benefit from it. Making it optional could be a good way to maintain the current power of lucene and perhaps also retain the speed depending on the performance of the NLP functionality and the needs of the user. -----Original Message----- From: Chong, Herb [mailto:[EMAIL PROTECTED] Sent: Monday, November 17, 2003 10:01 AM To: Lucene Users List Subject: RE: Contributing to Lucene (was RE: inter-term correlation [was R e: Vector Space Model in Lucene?]) show an example document. Herb.... -----Original Message----- From: Dan Quaroni [mailto:[EMAIL PROTECTED] Sent: Monday, November 17, 2003 9:48 AM To: 'Lucene Users List' Subject: RE: Contributing to Lucene (was RE: inter-term correlation [was R e: Vector Space Model in Lucene?]) My only concern with this being integrated into lucene is that it be done in a way that doesn't make its use mandatory. Lucene is powerful enough that it can be used for a lot of cases where NLP doesn't make any sense. For example, I think that sentence boundaries would severely screw up the project I recently did using lucene because there are no sentences, but there is punctuation. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
