Pierre Villard created NIFI-12850:
-------------------------------------

             Summary: Failure to index Provenance Events with large attributes
                 Key: NIFI-12850
                 URL: https://issues.apache.org/jira/browse/NIFI-12850
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Pierre Villard
            Assignee: Pierre Villard


{code:java}
ERROR org.apache.nifi.provenance.index.lucene.EventIndexTask: Failed to index Provenance Events
java.lang.IllegalArgumentException: Document contains at least one immense term in field="filename" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56, 51, 53, 46, 48, 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114]...', original message: bytes can be at most 32766 in length; got 74483
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:984)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:527)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:491)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
    at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
    at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1444)
    at org.apache.nifi.provenance.lucene.LuceneEventIndexWriter.index(LuceneEventIndexWriter.java:70)
    at org.apache.nifi.provenance.index.lucene.EventIndexTask.index(EventIndexTask.java:202)
    at org.apache.nifi.provenance.index.lucene.EventIndexTask.run(EventIndexTask.java:113)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 74483
    at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:281)
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:182)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.
{code}
From a look at the code, filename appears to be the only attribute that can be
set to an arbitrary value and is not currently protected against overly large
values.
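
One possible form of such protection, sketched here purely for illustration, is
to cap the attribute value at the 32766-byte term limit reported in the
exception before it is handed to the indexer. The class name
AttributeTruncation, the method truncateToUtf8Bytes and the constant
MAX_TERM_BYTES are hypothetical and are not taken from the NiFi code base; the
actual fix may work differently (for example by skipping the field instead of
truncating it).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of NiFi: caps a value at a UTF-8 byte budget.
final class AttributeTruncation {

    // Limit quoted in the exception message above ("max length 32766").
    static final int MAX_TERM_BYTES = 32766;

    // Returns the longest prefix of value whose UTF-8 encoding fits in
    // maxBytes, cutting only on code point boundaries so that no multi-byte
    // character or surrogate pair is split.
    static String truncateToUtf8Bytes(String value, int maxBytes) {
        if (value == null) {
            return null;
        }
        int usedBytes = 0;
        int index = 0;
        while (index < value.length()) {
            int codePoint = value.codePointAt(index);
            // UTF-8 length of this code point (an unpaired surrogate is
            // counted as 3 bytes, which can only overestimate).
            int codePointBytes = codePoint < 0x80 ? 1
                    : codePoint < 0x800 ? 2
                    : codePoint < 0x10000 ? 3 : 4;
            if (usedBytes + codePointBytes > maxBytes) {
                break;
            }
            usedBytes += codePointBytes;
            index += Character.charCount(codePoint);
        }
        return value.substring(0, index);
    }

    public static void main(String[] args) {
        // Build a filename as large as the one in the report (74483 bytes).
        char[] chars = new char[74483];
        Arrays.fill(chars, 'a');
        String filename = new String(chars);

        String safe = truncateToUtf8Bytes(filename, MAX_TERM_BYTES);
        System.out.println(safe.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Counting encoded bytes rather than characters matters here because the limit
applies to the UTF-8 encoding of the term, so a value with fewer than 32766
characters can still exceed it if it contains multi-byte characters.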



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
