Re: DelimitedTermFrequencyTokenFilter

Edward Ribeiro Fri, 29 Nov 2019 03:08:05 -0800

Oh, silly of me. :)

Thanks,
Edward


On Fri, 29 Nov 2019 at 07:13, Alan Woodward <[email protected]> wrote:

> I think it’s working fine - Luke is showing you the docFreq of the term,
> which will be 1 as it only appears in a single document.
>
> On 28 Nov 2019, at 21:51, Edward Ribeiro <[email protected]> wrote:
>
> Hi,
>
> Does anyone have an example of DelimitedTermFrequencyTokenFilter usage
> that they could share?
>
> I have been banging my head against the wall trying to make it work (
> https://gist.github.com/eribeiro/ebb24feb3fd84931b7c288b9b716ed49 ) and
> I don't know what I am doing wrong.
>
> I am creating a custom analyzer that uses a WhitespaceTokenizer to parse a
> string like "a|10 b|2 c|9" and passes it to
> DelimitedTermFrequencyTokenFilter. I add a custom field to the document
> so that it is indexed without positions and offsets.
>
> The debugger shows the string is being correctly parsed by DTFTF and its
> char and term attributes are properly set up. But the term frequency of
> each term is 1 when I inspect the index via Luke. Curiously, the output of
> my snippet shows the correct total term frequency as seen below:
>
> field="text",maxDoc=1,docCount=1,sumTotalTermFreq=123,sumDocFreq=3
> a|10 b|23 c|90
> SumTotalTermFreq: 123
> SumDocFreq: 3
>
> Cheers,
> Edward
> PS: I am a Lucene newbie so it may be something quite stupid.
>
>
>
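
For anyone landing on this thread with the same confusion, here is a rough sketch of why the numbers above are consistent with each other. It is plain Java and does not call Lucene at all; the class name TermStatsSketch and its helper methods are made up for illustration, and the parsing only approximates the "term|freq" format. It models docFreq as the number of documents containing a term and totalTermFreq as the sum of that term's frequencies across documents, which is how Alan's reply explains the value of 1 seen in Luke.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the index statistics; this is NOT Lucene code.
public class TermStatsSketch {

    // Parses whitespace-separated "term|freq" tokens into term -> frequency.
    // A token without a delimiter is assumed to have frequency 1.
    static Map<String, Integer> parseDoc(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int bar = token.indexOf('|');
            String term = bar < 0 ? token : token.substring(0, bar);
            int freq = bar < 0 ? 1 : Integer.parseInt(token.substring(bar + 1));
            freqs.merge(term, freq, Integer::sum);
        }
        return freqs;
    }

    // Number of documents that contain the term at least once.
    static int docFreq(List<Map<String, Integer>> docs, String term) {
        int count = 0;
        for (Map<String, Integer> doc : docs) {
            if (doc.containsKey(term)) {
                count++;
            }
        }
        return count;
    }

    // Sum of the term's frequency over all documents.
    static long totalTermFreq(List<Map<String, Integer>> docs, String term) {
        long sum = 0;
        for (Map<String, Integer> doc : docs) {
            sum += doc.getOrDefault(term, 0);
        }
        return sum;
    }

    // Sum of docFreq over all distinct terms: one posting per (term, document) pair.
    static long sumDocFreq(List<Map<String, Integer>> docs) {
        long sum = 0;
        for (Map<String, Integer> doc : docs) {
            sum += doc.size();
        }
        return sum;
    }

    // Sum of every term frequency in every document.
    static long sumTotalTermFreq(List<Map<String, Integer>> docs) {
        long sum = 0;
        for (Map<String, Integer> doc : docs) {
            for (int freq : doc.values()) {
                sum += freq;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = List.of(parseDoc("a|10 b|23 c|90"));
        for (String term : docs.get(0).keySet()) {
            System.out.println(term + ": docFreq=" + docFreq(docs, term)
                    + ", totalTermFreq=" + totalTermFreq(docs, term));
        }
        System.out.println("SumTotalTermFreq: " + sumTotalTermFreq(docs));
        System.out.println("SumDocFreq: " + sumDocFreq(docs));
        // Prints:
        // a: docFreq=1, totalTermFreq=10
        // b: docFreq=1, totalTermFreq=23
        // c: docFreq=1, totalTermFreq=90
        // SumTotalTermFreq: 123
        // SumDocFreq: 3
    }
}
```

With a single document every term has a docFreq of 1 no matter what frequency the filter assigned, so a docFreq column showing 1 is expected. The custom frequencies only show up in the per-term total frequency and in the 123 total.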
