rmuir commented on pull request #477: URL: https://github.com/apache/lucene/pull/477#issuecomment-982527039
> It's about fields that produce no tokens, so it's more than empty strings: it can also be fields that contain only punctuation and stop words (e.g. "to be or not to be" with EnglishAnalyzer). It's probably still a bit of an edge case, but we changed the semantics of `exists` queries years ago to only match fields that have tokens, and got a couple of bug reports, e.g. [elastic/elasticsearch#7348](https://github.com/elastic/elasticsearch/issues/7348).
>
> It's a pity that it doesn't allow us to better optimize this case, but I can understand why these semantics can make sense if users want to find all documents for which they provided one or more values at index time.

@jpountz I strongly disagree with this, and I think it's absolutely terrible that it crept its way into Lucene (especially the norms-of-0 stuff). Let's clean this up. If you want tokens, index your data correctly. This isn't just about empty strings, but about stop words and everything else. The real problem is that users are using an incorrect analysis chain; instead of fixing that, we changed the semantics of norms and gave queries like this crazy semantics? Awful.
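To illustrate the edge case under discussion: an analyzer with an English stop-word filter can consume a non-empty string and emit zero tokens. The following is a rough, dependency-free Python sketch of that behavior, using the classic English stop-word list; it only mimics the lowercase-and-filter steps and is not Lucene's actual EnglishAnalyzer pipeline (which also applies tokenization rules, possessive stripping, and stemming).

```python
# Classic English stop-word set (the same 33 words Lucene's
# EnglishAnalyzer uses by default via its stop filter).
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
    "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
    "such", "that", "the", "their", "then", "there", "these", "they",
    "this", "to", "was", "will", "with",
}

def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop stop words.

    A crude stand-in for an analysis chain ending in a stop filter.
    """
    return [t for t in text.lower().split() if t not in ENGLISH_STOP_WORDS]

# A non-empty field value that produces no tokens at all:
print(tokenize("to be or not to be"))   # -> []
print(tokenize("Shakespeare wrote it"))  # -> ['shakespeare', 'wrote']
```

A document indexed with such a field value ends up with no postings (and a norm of 0) for that field, which is why the `exists`-query semantics debated above hinge on whether "the user supplied a value" or "the field produced tokens" is the right definition of existence.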
