On Fri, Aug 5, 2016 at 2:51 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Question 2: Not that I know of
>
> Question 2.1. It's actually pretty difficult to understand why a single _term_
> can be over 32K and still make sense. This is not to say that a
> single _text_ field can't be over 32K, each term within that field
> is (usually) much less than that.
>
> Do you have a real-world use-case where you have a 115K term
> that can _only_ be matched by searching for exactly that
> sequence of 115K characters? Not substrings. Not wildcards. A
> "string" type (as opposed to anything based on solr.Textfield).
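For reference, the "32K" limit on a single term is, as far as I know, Lucene's IndexWriter.MAX_TERM_LENGTH of 32766 bytes, counted on the term's UTF-8 encoding rather than on characters. The following stdlib-only Java sketch (the class and method names are hypothetical, not part of Lucene or Solr, and the constant is hard-coded on the assumption that it matches Lucene's) checks whether a value fits in one term. It also shows why truncating an untokenised, exact-match value is risky: two distinct values sharing a long prefix become the same term.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of Lucene or Solr.
public class TermLengthCheck {
    // Assumed to equal Lucene's IndexWriter.MAX_TERM_LENGTH; hard-coded so
    // this sketch runs without Lucene on the classpath.
    static final int MAX_TERM_BYTES = 32766;

    // The limit applies to the UTF-8 encoding, so measure bytes, not chars.
    static boolean fitsInOneTerm(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES;
    }

    // Number of bytes UTF-8 uses to encode one code point.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Keeps at most maxBytes of UTF-8 without splitting a code point.
    static String truncateUtf8(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            int cpBytes = utf8Length(cp);
            if (bytes + cpBytes > maxBytes) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return value.substring(0, i);
    }

    public static void main(String[] args) {
        String prefix = "x".repeat(MAX_TERM_BYTES); // requires Java 11+
        String a = prefix + "@example.com";
        String b = prefix + "@example.com.au";

        System.out.println(fitsInOneTerm(a)); // false

        // Two different values collapse into the same truncated term.
        System.out.println(
            truncateUtf8(a, MAX_TERM_BYTES).equals(truncateUtf8(b, MAX_TERM_BYTES))); // true
    }
}
```

Truncating at a code-point boundary avoids producing invalid UTF-8, but it cannot avoid the collision itself; that is inherent to cutting an exact-match key.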
This particular field stores unique addresses. For precision, we wanted to search addresses without tokenising them: if you tokenise them, b...@example.com could accidentally match b...@example.com.au, even though they belong to two different people. Keeping them untokenised also makes statistics faster to calculate.

Addresses in SMTP email are fairly short, limited to something like 254 characters, but sometimes the data violates the standard. We also store more than just that one kind of address, and some of the other kinds may be longer.

In this situation it isn't clear whether we can truncate the data, because after truncation two addresses that are different strings would be considered equal. Then again, if the old version of Lucene was already truncating it, people might be fine with it being truncated in the new version. But if they didn't know that, someone would definitely object.

So I'm not saying that the term "makes sense"; I'm just saying we encountered it in real-world data and an error occurred. Someone then complained about the error.

> As far as the error message is concerned, that does seem somewhat opaque.
> Care to raise a JIRA on it (and, if you're really ambitious attach a patch)?

I'll see. :)

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org