Hello,

Check the whitespace tokenizer.

On Wed, Sep 20, 2023 at 7:46 PM Amitesh Kumar <amiteshk...@gmail.com> wrote:
> Hi,
>
> I am facing a requirement change to retain the % sign in searches. For
> example:
>
> Sample search docs:
> 1. Number of boys 50
> 2. My score was 50%
> 3. 40-50% for pass score
>
> Search query: 50%
> Expected results: Doc-2 and Doc-3, i.e.
> 1. My score was 50%
> 2. 40-50% for pass score
>
> Actual result: all 3 documents (because the tokenizer strips off the %
> both during indexing and during searching, and hence matches all docs
> with 50 in them).
>
> On the implementation front, I am using a set of filters like
> LowerCaseFilter, EnglishPossessiveFilter, etc. in addition to the base
> tokenizer, StandardTokenizer.
>
> Per my analysis, StandardTokenizer strips off the % sign, hence the
> behavior. Has anyone faced a similar requirement? Any help/guidance is
> highly appreciated.

--
Sincerely yours
Mikhail Khludnev
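
P.S. To make the whitespace tokenizer suggestion concrete, here is a minimal sketch in plain Java. It does not use Lucene. The class TokenizerSketch and its methods whitespaceTokens, alnumTokens and matches are hypothetical names made up for illustration. alnumTokens is only a rough stand-in for the stripping behavior reported above for StandardTokenizer, and it is adequate only for these three sample strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; no Lucene classes are used here.
class TokenizerSketch {

    // Splits on runs of whitespace only, so '%' and '-' stay inside tokens.
    static List<String> whitespaceTokens(String text) {
        return split(text, "\\s+");
    }

    // Assumption: a rough model of the reported behavior, splitting on
    // anything that is not a letter or digit, which drops '%' and '-'.
    static List<String> alnumTokens(String text) {
        return split(text, "[^\\p{L}\\p{N}]+");
    }

    // Lowercases the text, splits it on the given regex, and drops empty tokens.
    private static List<String> split(String text, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split(regex)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A document matches when it contains every query token.
    static boolean matches(List<String> docTokens, List<String> queryTokens) {
        return docTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String[] docs = {
            "Number of boys 50",
            "My score was 50%",
            "40-50% for pass score"
        };
        String query = "50%";
        for (String doc : docs) {
            boolean stripped = matches(alnumTokens(doc), alnumTokens(query));
            boolean whitespace = matches(whitespaceTokens(doc), whitespaceTokens(query));
            System.out.println(doc + " | stripping: " + stripped
                    + " | whitespace: " + whitespace);
        }
    }
}
```

In this simplified model, the stripping variant matches all three documents, while the whitespace variant matches only Doc-2. Doc-3 is missed because "40-50%" stays a single token, so splitting on whitespace alone may not be enough to get both expected results. An additional step that splits on the hyphen would probably be needed, and it is worth checking this against the real analyzer chain.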