This is actually a change in Lucene: previously, overly long terms were 
silently dropped, but now they raise an exception. See the Lucene 
ticket https://issues.apache.org/jira/browse/LUCENE-5710

You might want to add a `length` filter to your analyzer 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-length-tokenfilter.html#analysis-length-tokenfilter).
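A minimal sketch of what that could look like (the index name, analyzer name, and filter name here are just illustrative, and the `max` value is a judgment call -- note that Lucene's 32766 limit is in UTF-8 bytes, while the `length` filter counts characters, so pick a maximum comfortably below it):

```shell
# Create an index whose default analyzer drops tokens longer than 8192
# characters, so "immense" terms never reach Lucene. Names are examples.
curl -XPUT 'http://localhost:9200/my_index' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "limit_token_length": {
          "type": "length",
          "min": 0,
          "max": 8192
        }
      },
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "limit_token_length"]
        }
      }
    }
  }
}'
```

Over-long tokens are then silently discarded at analysis time instead of failing the whole document.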

All in all, it hints at some strange data, because such an "immense" term 
probably shouldn't be in the index in the first place.

Karel

On Thursday, May 29, 2014 10:47:37 PM UTC+2, Jeff Dupont wrote:
>
> We’re running into a peculiar issue when updating indexes with content for 
> the document.
>
>
> "document contains at least one immense term in (whose utf8 encoding is 
> longer than the max length 32766), all of which were skipped. please 
> correct the analyzer to not produce such terms”
>
>
> I’m hoping that there’s a simple fix or setting that can resolve this.
>
