Re: How to retain % sign next to number during tokenization

Mikhail Khludnev Wed, 20 Sep 2023 13:37:50 -0700

Hello,
Check WhitespaceTokenizer. It splits on whitespace only, so the % sign
stays attached to the number.
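
To illustrate the difference, here is a minimal sketch in plain Python. It does not use Lucene at all: the three tokenizer functions are hypothetical stand-ins that only approximate the behaviour discussed in this thread (dropping punctuation vs. splitting on whitespace only), and the matching logic is a deliberately naive "all query tokens must be present" check, not how Lucene scores documents.

```python
import re

# The three sample documents from the original question.
DOCS = {
    1: "Number of boys 50",
    2: "My score was 50%",
    3: "40-50% for pass score",
}

def strip_punct_tokens(text):
    # Rough stand-in for a tokenizer that drops "%" and "-":
    # keep only runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def whitespace_tokens(text):
    # Split on whitespace only, so "50%" survives as a single token.
    return text.lower().split()

def whitespace_hyphen_tokens(text):
    # Assumption for illustration: also split on "-", so "40-50%"
    # becomes "40" and "50%".
    return [t for t in re.split(r"[\s-]+", text.lower()) if t]

def search(tokenize, query):
    # A document matches when it contains every query token exactly.
    wanted = set(tokenize(query))
    return [doc_id for doc_id, text in DOCS.items()
            if wanted <= set(tokenize(text))]

print(search(strip_punct_tokens, "50%"))        # [1, 2, 3]
print(search(whitespace_tokens, "50%"))         # [2]
print(search(whitespace_hyphen_tokens, "50%"))  # [2, 3]
```

Note that splitting on whitespace alone keeps "40-50%" as one token, so Doc-3 would not match the query 50%. To get both Doc-2 and Doc-3, the hyphen would have to be treated as a separator as well, which the third function mimics.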

On Wed, Sep 20, 2023 at 7:46 PM Amitesh Kumar <[email protected]> wrote:


> Hi,
>
> I have a new requirement to retain the % sign in searches, e.g.:
>
> Sample search docs:
> 1. Number of boys 50
> 2. My score was 50%
> 3. 40-50% for pass score
>
> Search query: 50%
> Expected results: Doc-2 and Doc-3, i.e.
> 1. My score was 50%
> 2. 40-50% for pass score
>
> Actual result: All 3 documents (because the tokenizer strips off the %
> during both indexing and searching, and hence matches all docs with 50 in
> them).
>
> On the implementation front, I am using a set of filters such as
> LowerCaseFilter, EnglishPossessiveFilter, etc., on top of the base
> tokenizer, StandardTokenizer.
>
> Per my analysis, StandardTokenizer strips off the % sign, hence the
> behavior. Has someone faced a similar requirement? Any help/guidance is
> highly appreciated.
>


-- 
Sincerely yours
Mikhail Khludnev
