Re: Removing duplicate terms from query

Ere Maijala Thu, 09 Feb 2017 05:18:55 -0800

Thanks Emir.

I was thinking of something very simple, like doing what RemoveDuplicatesTokenFilter does but ignoring positions. It would of course still be possible to have the same term multiple times, but at least the adjacent ones could be deduplicated. The reason I'm not too eager to do it in a query preprocessor is that I'd essentially have to duplicate the functionality of the query analysis chain, which contains ICUTokenizerFactory, WordDelimiterFilterFactory and whatnot.
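A minimal sketch of that idea, written as plain Java over a list of already-analyzed terms rather than as a real Lucene TokenFilter: a term is dropped when it equals the term immediately before it, so the only state carried between tokens is the previous term and memory use stays constant. The class and method names are hypothetical, and the example terms assume an analysis chain that lowercases and drops punctuation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "ignore positions" variant: drop a term that is
 * identical to the term immediately before it. Only one term of look-back
 * state is kept, so memory stays constant regardless of query length.
 * Plain-Java simulation over a list of terms, NOT the Lucene TokenFilter API.
 */
class AdjacentDedup {

    static List<String> dedupAdjacent(List<String> terms) {
        List<String> out = new ArrayList<>();
        String previous = null; // the only state carried between tokens
        for (String term : terms) {
            if (!term.equals(previous)) {
                out.add(term);
            }
            previous = term;
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed analyzer output for "Hey, hey, hey ; Shivers of pleasure"
        List<String> analyzed = List.of("hey", "hey", "hey", "shivers", "of", "pleasure");
        System.out.println(dedupAdjacent(analyzed)); // [hey, shivers, of, pleasure]

        // Non-adjacent repeats survive.
        System.out.println(dedupAdjacent(List.of("hey", "you", "hey"))); // [hey, you, hey]
    }
}
```

In an actual Lucene filter, one plausible route would be to extend FilteringTokenFilter and implement accept() with the same comparison, clearing the remembered term in reset(); FilteringTokenFilter should then carry the position increments of dropped tokens over to the next kept token. That is an untested assumption, not a verified implementation.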


Regards,
Ere

On 9.2.2017 at 14.52, Emir Arnautovic wrote:

Hi Ere,

I don't think there is such a filter. Implementing one would require
looking backward, which violates the streaming approach of token
filters and would lead to unpredictable memory usage.

I would do it as part of a query preprocessor, and not necessarily as
part of Solr.
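A rough sketch of what such a preprocessor could look like on the client side, assuming words can be approximated as runs of Unicode letters and digits and compared case-insensitively; the class and method names are hypothetical. It keeps the first spelling of each word and drops later repeats anywhere in the string. Because it works on the whole (short) query string outside Solr, remembering every seen word is harmless here.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical client-side preprocessor: keeps the first occurrence of every
 * word in the raw query string and drops later repeats before the query is
 * sent to Solr. Words are approximated as runs of letters/digits, which is far
 * cruder than an ICUTokenizer/WordDelimiterFilter analysis chain.
 */
class QueryPreprocessor {

    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    static String removeRepeatedWords(String query) {
        Set<String> seen = new HashSet<>();
        StringJoiner out = new StringJoiner(" ");
        Matcher m = WORD.matcher(query);
        while (m.find()) {
            String word = m.group();
            // Compare case-insensitively but keep the user's original spelling.
            if (seen.add(word.toLowerCase(Locale.ROOT))) {
                out.add(word);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeRepeatedWords("Hey, hey, hey ; Shivers of pleasure"));
        // Hey Shivers of pleasure
    }
}
```

Note that this crude split also discards punctuation and query syntax such as quotes and operators, so its result can differ from what the Solr analysis chain (ICUTokenizerFactory, WordDelimiterFilterFactory, etc.) would produce.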

HTH,
Emir


On 09.02.2017 12:24, Ere Maijala wrote:

Hi,

I just noticed that while we use RemoveDuplicatesTokenFilter at query
time, it takes term positions into account (it only removes a duplicate
term at the same position), so it doesn't really do anything e.g. if the
query is 'term term term'. As far as I can see, term positions make no
difference in a simple non-phrase search. Is there a built-in way to
deal with this? I know I can write a filter to do this, but it feels
like something quite basic to do for the query, and I don't think it's
anything too weird for normal users to do either. Just consider e.g.
searching for music by title:

Hey, hey, hey ; Shivers of pleasure
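To illustrate why nothing gets removed, here is a simplified plain-Java model of position-aware deduplication, modelled on how RemoveDuplicatesTokenFilter behaves: a token is dropped only if the same term already appeared at the same position, i.e. it arrives with a position increment of 0. Each token carries its term and position increment; the names are hypothetical and this is not the Lucene API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of position-aware duplicate removal: a term is removed
 * only if it was already seen at the current position. Plain-Java
 * simulation, NOT the Lucene TokenFilter API.
 */
class PositionAwareDedup {

    /** A token with its term text and position increment (0 = stacked on the previous position). */
    record Token(String term, int posInc) {}

    static List<String> dedupSamePosition(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Token t : tokens) {
            if (t.posInc() > 0) {
                // A new position starts: forget the terms seen at the previous one.
                seenAtPosition.clear();
            }
            if (seenAtPosition.add(t.term())) {
                out.add(t.term());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 'hey hey hey': every token advances the position, so nothing is removed.
        List<Token> query = List.of(new Token("hey", 1), new Token("hey", 1), new Token("hey", 1));
        System.out.println(dedupSamePosition(query)); // [hey, hey, hey]

        // Stacked tokens at one position (e.g. from a synonym expansion): the duplicate goes.
        List<Token> stacked = List.of(new Token("tv", 1), new Token("tv", 0), new Token("television", 0));
        System.out.println(dedupSamePosition(stacked)); // [tv, television]
    }
}
```

Since each "hey" in the title above arrives with a position increment of 1, a position-aware filter keeps all three; this kind of filter mainly helps with stacked tokens such as synonym expansions.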

I also verified that, at least according to debugQuery=true and
anecdotal evidence, the search really slows down if you repeat the same
term enough times.

--Ere


--
Ere Maijala
Kansalliskirjasto / The National Library of Finland
