[
https://issues.apache.org/jira/browse/NIFI-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822281#comment-17822281
]
ASF subversion and git services commented on NIFI-12850:
--------------------------------------------------------
Commit 6863b4ea7161684956bf3b8287a473c4f9c1f185 in nifi's branch
refs/heads/support/nifi-1.x from Pierre Villard
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=6863b4ea71 ]
NIFI-12850 - Prevent indexing of overly large filename attribute
Signed-off-by: Matt Burgess <[email protected]>
> Failure to index Provenance Events with large filename attribute
> ----------------------------------------------------------------
>
> Key: NIFI-12850
> URL: https://issues.apache.org/jira/browse/NIFI-12850
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.25.0, 2.0.0-M2
> Reporter: Pierre Villard
> Assignee: Pierre Villard
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> {code:java}
> ERROR org.apache.nifi.provenance.index.lucene.EventIndexTask: Failed to index Provenance Events
> java.lang.IllegalArgumentException: Document contains at least one immense term in field="filename" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56, 51, 53, 46, 48, 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114]...', original message: bytes can be at most 32766 in length; got 74483
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:984)
>     at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:527)
>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:491)
>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
>     at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
>     at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
>     at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1444)
>     at org.apache.nifi.provenance.lucene.LuceneEventIndexWriter.index(LuceneEventIndexWriter.java:70)
>     at org.apache.nifi.provenance.index.lucene.EventIndexTask.index(EventIndexTask.java:202)
>     at org.apache.nifi.provenance.index.lucene.EventIndexTask.run(EventIndexTask.java:113)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 74483
>     at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:281)
>     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:182)
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.
> {code}
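The error message prints the start of the offending term as a list of decimal byte values rather than as text. As a small aid for reading such messages, the sketch below turns a list of that shape back into a string, assuming the bytes are UTF-8 as the message states. The class and method names (TermPrefixDecoder, decode) are hypothetical and are not part of NiFi or Lucene.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading "immense term" messages; not NiFi or Lucene code.
final class TermPrefixDecoder {

    // Parses a listing such as "[49, 50, 55]" (decimal byte values, possibly
    // negative for bytes above 127) and decodes the bytes as UTF-8.
    static String decode(String listing) {
        String inner = listing.trim();
        // Strip the surrounding square brackets.
        inner = inner.substring(1, inner.length() - 1).trim();
        if (inner.isEmpty()) {
            return "";
        }
        String[] parts = inner.split(",");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = (byte) Integer.parseInt(parts[i].trim());
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prefix copied from the error message in this issue.
        String prefix = "[49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56, "
                + "51, 53, 46, 48, 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114]";
        System.out.println(decode(prefix));
    }
}
```

Only the first 30 bytes are included in the message, so the decoded text is just the beginning of the 74483-byte filename.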
> Looking at the code, filename appears to be the only attribute that can be
> set to arbitrary values and is not currently protected against overly large
> values.
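One way to protect a value such as filename before it is handed to the index is to cap its UTF-8 length at the limit quoted in the error (32766 bytes). The following is only a minimal sketch of that idea and is not taken from the commit referenced above: the class FilenameGuard, the method truncateUtf8 and the constant MAX_TERM_BYTES are hypothetical names, and the actual change may truncate, skip, or otherwise handle the value differently.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the code from the NIFI-12850 commit.
final class FilenameGuard {

    // Limit reported in the error message: a term may be at most 32766 UTF-8 bytes.
    static final int MAX_TERM_BYTES = 32766;

    // Returns the value unchanged if its UTF-8 encoding fits in maxBytes,
    // otherwise the longest prefix that fits, cutting only between code points
    // so that a multi-byte character is never split.
    static String truncateUtf8(String value, int maxBytes) {
        if (value == null) {
            return null;
        }
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            // Number of bytes this code point occupies in UTF-8.
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                return value.substring(0, i);
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return value;
    }

    public static void main(String[] args) {
        // Build a value of the size reported in the issue (74483 bytes of ASCII).
        String huge = new String(new char[74483]).replace('\0', 'a');
        String safe = truncateUtf8(huge, MAX_TERM_BYTES);
        System.out.println(safe.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Counting bytes rather than characters matters here because the limit applies to the encoded term, so a filename with non-ASCII characters can exceed it with far fewer than 32766 characters.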