[ https://issues.apache.org/jira/browse/NIFI-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Burgess updated NIFI-12850:
--------------------------------
    Fix Version/s: 2.0.0
                   1.26.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Failure to index Provenance Events with large filename attribute
> ----------------------------------------------------------------
>
>                 Key: NIFI-12850
>                 URL: https://issues.apache.org/jira/browse/NIFI-12850
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.25.0, 2.0.0-M2
>            Reporter: Pierre Villard
>            Assignee: Pierre Villard
>            Priority: Major
>             Fix For: 2.0.0, 1.26.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:java}
> ERROR org.apache.nifi.provenance.index.lucene.EventIndexTask: Failed to index Provenance Events
> java.lang.IllegalArgumentException: Document contains at least one immense term in field="filename" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56, 51, 53, 46, 48, 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114]...', original message: bytes can be at most 32766 in length; got 74483
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:984)
>     at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:527)
>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:491)
>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
>     at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
>     at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
>     at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1444)
>     at org.apache.nifi.provenance.lucene.LuceneEventIndexWriter.index(LuceneEventIndexWriter.java:70)
>     at org.apache.nifi.provenance.index.lucene.EventIndexTask.index(EventIndexTask.java:202)
>     at org.apache.nifi.provenance.index.lucene.EventIndexTask.run(EventIndexTask.java:113)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 74483
>     at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:281)
>     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:182)
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.
> {code}
> Looking at the code, it looks like filename is the only attribute that could
> be set with arbitrary values that is not protected against overly large
> values right now.
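For readers wondering what "protected against overly large values" could look like, here is a minimal sketch. It is not the patch that was merged for this issue. It assumes the guard is a simple truncation of the attribute value so that its UTF-8 encoding stays within the 32766-byte limit quoted in the error. The class name AttributeTruncator and the method truncateToUtf8Bytes are hypothetical and do not come from the NiFi or Lucene code bases.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of NiFi: illustrates one way to cap a value
// before it is handed to the indexer.
final class AttributeTruncator {

    // Per-term byte limit taken from the error message in this issue.
    static final int MAX_TERM_BYTES = 32766;

    // Returns the longest prefix of value whose UTF-8 encoding fits in maxBytes,
    // cutting only on code point boundaries so no surrogate pair is split.
    static String truncateToUtf8Bytes(String value, int maxBytes) {
        if (value == null) {
            return null;
        }
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            // UTF-8 width of this code point (lone surrogates are counted as 3,
            // which over-estimates and therefore stays on the safe side).
            int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + cpBytes > maxBytes) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return i == value.length() ? value : value.substring(0, i);
    }

    public static void main(String[] args) {
        // Same size as the term reported in the stack trace (74483 bytes of ASCII).
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < 74483; n++) {
            sb.append('a');
        }
        String safe = truncateToUtf8Bytes(sb.toString(), MAX_TERM_BYTES);
        System.out.println(safe.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Counting bytes rather than characters matters here because the limit applies to the encoded term: a filename made of multi-byte characters can exceed the limit well before it reaches 32766 characters. Whether truncating, hashing, or skipping the field is the right behaviour for provenance search is a separate design question that this sketch does not settle.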