rmuir commented on pull request #477:
URL: https://github.com/apache/lucene/pull/477#issuecomment-982527039


   > It's about fields that produce no tokens so it's more than empty strings, 
it can also be fields that only contain punctuation and stop words (e.g. "to be 
or not to be" with EnglishAnalyzer). It's probably still a bit of an edge case 
but we changed the semantics of `exists` queries to only match fields that have 
tokens years ago and got a couple bug reports, e.g. 
[elastic/elasticsearch#7348](https://github.com/elastic/elasticsearch/issues/7348).
   > 
   > It's a pity that it doesn't allow us to better optimize this case but I 
can understand why these semantics can make sense if users want to find all 
documents for which they provided one or more values at index time.
   
   @jpountz I strongly disagree with this, and I think it's absolutely 
terrible that it crept its way into Lucene (especially the norms-0 behavior). 
Let's clean this up.
   
   If you want tokens, index your data correctly. This isn't just about empty 
strings, but stopwords and everything else. The problem was users with an 
incorrect analysis chain; instead of fixing that, we changed the semantics of 
norms and gave queries like this crazy semantics? Awful.
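   The "no tokens from a non-empty value" case being debated can be sketched 
as follows. This is not Lucene's actual `EnglishAnalyzer` API, just a minimal 
stand-in: a whitespace tokenizer plus a stop filter using a few entries from 
the default English stop-word set, showing how a field whose value is entirely 
stop words indexes zero tokens.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustration only: mimics an analysis chain (tokenize -> lowercase ->
// stop filter), not Lucene's real EnglishAnalyzer.
public class StopWordSketch {
    // A few words from the default English stop-word list (assumed here).
    static final Set<String> STOP_WORDS = Set.of("to", "be", "or", "not");

    // Returns the tokens that would actually be indexed for this value.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase().split("\\s+"))
                .filter(t -> !t.isEmpty() && !STOP_WORDS.contains(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Non-empty field value, yet zero tokens reach the index:
        System.out.println(analyze("to be or not to be")); // []
        // A value with one non-stop word keeps that token:
        System.out.println(analyze("to index or not"));    // [index]
    }
}
```

   A field like the first one has a value at index time but no postings and 
no norm, which is exactly the ambiguity behind the `exists` semantics dispute.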
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


