Dear Developers,

I am writing to request your assistance in verifying a proposed change to StandardTokenizer for our use case. Specifically, we would like to know whether the change will function as intended without causing any unintended consequences.

When using Java Lucene 9.5, a text field containing "text&search" is tokenized into two terms, with '&' acting as a delimiter:

1. text
2. search
By contrast, when using CLucene 2.3.3.4, the same field is tokenized into a single term:

1. text&search

As our use case requires the field to be split into two terms, we modified StandardTokenizer.cpp: in StandardTokenizer::ReadAlphaNum(const TCHAR prev, Token* t), the case '&' branch was commented out (lines 278-280).

After this change, the string above is tokenized into two terms (text, search).

I would like to know whether this change is appropriate. Please take some time to review it and let us know your thoughts. If you have any concerns, suggestions, or questions, please do not hesitate to reach out to me.

Thank you in advance for your help and expertise. We look forward to hearing from you.

Best regards,
Achyuth Pramod
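P.S. To make the two behaviours described above concrete, here is a minimal, self-contained sketch. It is a toy tokenizer written only for illustration, not the actual CLucene or Lucene code; the function name tokenize and the flag ampersandJoins are hypothetical. With the flag set, '&' between alphanumeric characters stays inside the term (the behaviour we see in CLucene 2.3.3.4); with it cleared, '&' acts as a delimiter (the behaviour we see in Java Lucene 9.5 and after our change).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy tokenizer for illustration only; NOT CLucene's implementation.
// ampersandJoins (hypothetical flag):
//   true  -> '&' between two alphanumerics is kept inside the term
//   false -> '&' is treated like any other delimiter
std::vector<std::string> tokenize(const std::string& text, bool ampersandJoins) {
    std::vector<std::string> tokens;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char ch = static_cast<unsigned char>(text[i]);
        const bool nextIsAlnum =
            i + 1 < text.size() &&
            std::isalnum(static_cast<unsigned char>(text[i + 1]));
        if (std::isalnum(ch)) {
            // Alphanumeric characters always extend the current term.
            current.push_back(static_cast<char>(ch));
        } else if (ch == '&' && ampersandJoins && !current.empty() && nextIsAlnum) {
            // Keep '&' inside the term, e.g. "text&search" stays whole.
            current.push_back('&');
        } else if (!current.empty()) {
            // Any other character ends the current term.
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

In this sketch, clearing the flag also splits an input such as "AT&T" into "AT" and "T". Whether the real change has a comparable side effect on names containing '&' is one of the things we would like confirmed.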
_______________________________________________
CLucene-developers mailing list
CLucene-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/clucene-developers