Question 2: Not that I know of.

Question 2.1: It's actually pretty difficult to understand why a single
_term_ can be over 32K and still make sense. This is not to say that a
single _text_ field can't be over 32K; each term within that field is
(usually) much less than that.
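If you can't avoid feeding huge values into that field, the usual route
is the one the error message suggests: correct the analysis chain so it
never emits a term that big, by dropping (or truncating) over-long
tokens. Untested sketch, assuming a Lucene 5/6-style API and a
keyword-style field; the class name and character cap below are just
illustrative, not anything you have already:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.miscellaneous.LengthFilter;

// Illustrative analyzer for a keyword-style field that silently drops
// tokens too long to index, instead of letting IndexWriter reject the
// whole document with the "immense term" error.
public class DropImmenseTermsAnalyzer extends Analyzer {

  // The index-time limit is 32766 UTF-8 *bytes* (IndexWriter.MAX_TERM_LENGTH),
  // but LengthFilter counts chars, so leave headroom for multi-byte chars.
  private static final int MAX_TERM_CHARS = 8000;

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new KeywordTokenizer();  // whole field value as one token
    TokenStream sink = new LengthFilter(source, 1, MAX_TERM_CHARS);
    return new TokenStreamComponents(source, sink);
  }
}

Swap LengthFilter for TruncateTokenFilter if you'd rather keep a
truncated prefix of the value than drop the token entirely.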
Do you have a real-world use-case where you have a 115K term that can
_only_ be matched by searching for exactly that sequence of 115K
characters? Not substrings. Not wildcards. A "string" type (as opposed
to anything based on solr.TextField).

As far as the error message is concerned, that does seem somewhat
opaque. Care to raise a JIRA on it (and, if you're really ambitious,
attach a patch)?

Best,
Erick

On Thu, Aug 4, 2016 at 8:20 PM, Trejkaz <trej...@trypticon.org> wrote:
> Trying to add a document, someone saw:
>
> java.lang.IllegalArgumentException: Document contains at least one
> immense term in field="bcc-address" (whose UTF8 encoding is longer
> than the max length 32766), all of which were skipped. Please correct
> the analyzer to not produce such terms. The prefix of the first
> immense term is: '[00, --omitted--]...', original message: bytes can
> be at most 32766 in length; got 115597
>
> Question 1: It says the bytes are being skipped, but to me "skipped"
> means it's just going to continue, yet I get this exception. Is that
> intentional?
>
> Question 2: Can we turn this check off?
>
> Question 2.1: Why limit in the first place? Every time I have ever
> seen someone introduce a limit, it has only been a matter of time
> until someone hits it, no matter how improbable it seemed when it was
> put in.
>
> TX