I understand that it would change the behavior of existing search solutions, but the current behavior is simply wrong. An ACRONYM cannot be ABC.DEF. If you look up "acronym" on Wikipedia, you find only examples like I.B.M. and U.S.A., or NATO, IBM, and USA, but nothing of the form that StandardAnalyzer currently recognizes.
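To make the distinction concrete, here is a minimal sketch in plain Java. It is not the JFlex grammar from StandardTokenizer: the class name TokenTypeSketch, the classify method, and both regular expressions are my own assumptions, chosen only to illustrate what I would expect an acronym and a host to look like.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; these are NOT the rules StandardTokenizer uses.
class TokenTypeSketch {
    // Assumed ACRONYM shape: single letters, each followed by a dot, at least two
    // of them (e.g. "I.B.M.", "U.S.A.").
    private static final Pattern ACRONYM = Pattern.compile("(?:[A-Za-z]\\.){2,}");

    // Assumed HOST shape: two or more alphanumeric labels separated by dots,
    // with an optional trailing dot (e.g. "www.abc.com" and "www.abc.com.").
    private static final Pattern HOST =
            Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+\\.?");

    // Returns a type label in the style of the tokenizer output quoted in this thread.
    static String classify(String token) {
        if (ACRONYM.matcher(token).matches()) {
            return "<ACRONYM>";
        }
        if (HOST.matcher(token).matches()) {
            return "<HOST>";
        }
        return "<ALPHANUM>";
    }

    public static void main(String[] args) {
        String[] samples = {"I.B.M.", "U.S.A.", "www.abc.com", "www.abc.com.", "NATO"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + classify(sample));
        }
    }
}
```

Under these assumed patterns, "www.abc.com" and "www.abc.com." both classify as <HOST>, and only strings such as "I.B.M." classify as <ACRONYM>.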
There are several ways to handle this change:

1. Create a new analyzer that fixes the problem. That way, applications that are fine with the current behavior will not have to use it, while those that want correct behavior will be able to get it. This is not my favorite solution, but I think it is preferable to simply not fixing the problem.

2. Fix it in the new version (2.3) and specifically mention that in the release notes. Aren't there releases where applications need to rebuild the index because of fundamental changes?

Am I the only one who thinks that?

BTW, I changed the definition in the jflex file and recompiled using jflex, and it indeed solved the problem. It now recognizes both www.abc.com. and www.abc.com as hosts. I can attach the 'patch' files if you'd like to compare.

On Nov 27, 2007 9:07 AM, Chris Hostetter <[EMAIL PROTECTED]> wrote:

> : If you pass "www.abc.com", the output is (www.abc.com,0,11,type=<HOST>)
> : (which is correct in my opinion).
> : However, if you pass "www.abc.com." (notice the extra '.' at the end), the
> : output is (wwwabccom,0,12,type=<ACRONYM>).
>
> see also...
>
> http://www.nabble.com/Inconsistent-StandardTokenizer-behaviour-tf596059.html#a1593383
>
> http://www.nabble.com/Standard-Analyzer---Host-and-Acronym-tf3620533.html#a10109926
>
> one hitch with potentially changing this now is that it would break
> some searches in applications that have existing indexes built using
> previous versions.
>
> -Hoss

--
Regards,
Shai Erera