[
https://issues.apache.org/jira/browse/METRON-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480976#comment-16480976
]
ASF GitHub Bot commented on METRON-1567:
----------------------------------------
Github user justinleet closed the pull request at:
https://github.com/apache/metron/pull/1020
> Large error message can't be written in Solr
> --------------------------------------------
>
> Key: METRON-1567
> URL: https://issues.apache.org/jira/browse/METRON-1567
> Project: Metron
> Issue Type: Sub-task
> Reporter: Justin Leet
> Assignee: Justin Leet
> Priority: Major
>
> Error message on the feature branch:
> {code:java}
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at
> http://ip-11-0-1-51.us-west-2.compute.internal:8983/solr/error: Exception
> writing document id cd6db5c1-f41b-4dcf-8f68-583c7fc08575 to the index;
> possible analysis error: Document contains at least one immense term in
> field="raw_message_1" (whose UTF8 encoding is longer than the max length
> 32766), all of which were skipped. Please correct the analyzer to not produce
> such terms. The prefix of the first immense term is: '[123, 34, 101, 120, 99,
> 101, 112, 116, 105, 111, 110, 34, 58, 34, 106, 97, 118, 97, 46, 105, 111, 46,
> 70, 105, 108, 101, 78, 111, 116, 70]...', original message: bytes can be at
> most 32766 in length; got 165866. Perhaps the document has an indexed string
> field (solr.StrField) which is too large
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:612)
> ~[stormjar.jar:?]
> ...{code}
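> This is a hard limit on string fields (solr.StrField): a single indexed
> value can be at most 32766 bytes once UTF-8 encoded, per
> https://lucene.apache.org/solr/guide/6_6/field-types-included-with-solr.html
> The guide also notes that string fields aren't tokenized or analyzed, so the
> whole value is indexed as one term and it doesn't seem like we'd be able to
> turn this limit off.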
> Text fields don't list any sort of limit (although they may still have one),
> so we may want to switch raw_message to a text field type, but that would
> require testing.
> Additionally, it appears that raw_message is being handled as a dynamic
> field: the failing field is named raw_message_1, and we don't define that
> name in the schema.
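
The limit in the error above applies to the UTF-8 encoded length of the field value, not to its character count. As a minimal sketch (not part of the Metron code base; the names `ImmenseTermCheck`, `utf8Length`, `exceedsTermLimit`, and `decodePrefix` are hypothetical), a value could be checked against that limit before it is sent to Solr, and the byte prefix printed in the error could be decoded to see what content triggered it:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only; not an existing Metron or Solr class.
class ImmenseTermCheck {
    // Maximum UTF-8 length of a single indexed term, as quoted in the error message.
    static final int MAX_TERM_BYTES = 32766;

    // Length of the value in bytes once UTF-8 encoded (can exceed value.length()).
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the value would be rejected when indexed as a single string term.
    static boolean exceedsTermLimit(String value) {
        return utf8Length(value) > MAX_TERM_BYTES;
    }

    // Turns the decimal byte prefix shown in the error back into readable text.
    static String decodePrefix(int[] prefix) {
        byte[] bytes = new byte[prefix.length];
        for (int i = 0; i < prefix.length; i++) {
            bytes[i] = (byte) prefix[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A value the same size as the one reported in the error (165866 bytes).
        char[] big = new char[165866];
        Arrays.fill(big, 'x');
        System.out.println(exceedsTermLimit(new String(big)));

        // The prefix of the first immense term, copied from the error message.
        int[] prefix = {123, 34, 101, 120, 99, 101, 112, 116, 105, 111, 110, 34, 58,
                        34, 106, 97, 118, 97, 46, 105, 111, 46, 70, 105, 108, 101,
                        78, 111, 116, 70};
        System.out.println(decodePrefix(prefix));
    }
}
```

Decoding the prefix from the error gives `{"exception":"java.io.FileNotF`, which suggests the oversized value is a JSON document whose first field is a serialized exception. Whether to reject, truncate, or re-type such values is a separate design decision that this sketch does not address.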