[ https://issues.apache.org/jira/browse/NIFI-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Burgess updated NIFI-12850:
--------------------------------
    Fix Version/s: 2.0.0
                   1.26.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Failure to index Provenance Events with large filename attribute
> ----------------------------------------------------------------
>
>                 Key: NIFI-12850
>                 URL: https://issues.apache.org/jira/browse/NIFI-12850
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.25.0, 2.0.0-M2
>            Reporter: Pierre Villard
>            Assignee: Pierre Villard
>            Priority: Major
>             Fix For: 2.0.0, 1.26.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:java}
> ERROR org.apache.nifi.provenance.index.lucene.EventIndexTask: Failed to index Provenance Events
> java.lang.IllegalArgumentException: Document contains at least one immense term in field="filename" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56, 51, 53, 46, 48, 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114]...', original message: bytes can be at most 32766 in length; got 74483
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:984)
>     at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:527)
>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:491)
>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
>     at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
>     at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
>     at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1444)
>     at org.apache.nifi.provenance.lucene.LuceneEventIndexWriter.index(LuceneEventIndexWriter.java:70)
>     at org.apache.nifi.provenance.index.lucene.EventIndexTask.index(EventIndexTask.java:202)
>     at org.apache.nifi.provenance.index.lucene.EventIndexTask.run(EventIndexTask.java:113)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 74483
>     at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:281)
>     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:182)
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.
> {code}
> From a review of the code, filename appears to be the only attribute that can
> be set to an arbitrary value and is not currently protected against overly
> large values.
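The limit in the trace applies to the UTF-8 byte length of a single indexed term (32766 bytes, which Lucene also exposes as `IndexWriter.MAX_TERM_LENGTH`), not to the character count. The following is a minimal Java sketch of one way to guard such a value before indexing it as a single term: it truncates by UTF-8 bytes while walking code points, so a multi-byte character is never split. The class and method names (`TermLengthGuard`, `truncateUtf8`) are hypothetical, and this is not presented as the change actually merged for NIFI-12850, which this notification does not show.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical guard (not the actual NIFI-12850 patch): truncates an attribute
 * value so its UTF-8 encoding fits within Lucene's single-term limit.
 */
final class TermLengthGuard {

    // Maximum UTF-8 length of one indexed term, as reported in the error above
    // (Lucene exposes the same value as IndexWriter.MAX_TERM_LENGTH).
    static final int MAX_TERM_BYTES = 32766;

    private TermLengthGuard() {
    }

    /**
     * Returns the longest prefix of {@code value} whose UTF-8 encoding is at most
     * {@code maxBytes} bytes, never splitting a character across the boundary.
     */
    static String truncateUtf8(String value, int maxBytes) {
        int bytes = 0;
        int index = 0;
        while (index < value.length()) {
            int codePoint = value.codePointAt(index);
            int encoded = utf8Length(codePoint);
            if (bytes + encoded > maxBytes) {
                break; // the next character would push the term over the limit
            }
            bytes += encoded;
            index += Character.charCount(codePoint);
        }
        return value.substring(0, index);
    }

    // Bytes needed to encode one code point in UTF-8. A lone surrogate is counted
    // as 3 bytes, which over-estimates Java's encoder (it writes '?'), so the
    // truncated result still stays within the limit.
    private static int utf8Length(int codePoint) {
        if (codePoint < 0x80) {
            return 1;
        } else if (codePoint < 0x800) {
            return 2;
        } else if (codePoint < 0x10000) {
            return 3;
        }
        return 4;
    }

    public static void main(String[] args) {
        // Build an oversized, ASCII-only filename (70,000 bytes) for illustration.
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            builder.append("0.avro.");
        }
        String filename = builder.toString();
        String safe = truncateUtf8(filename, MAX_TERM_BYTES);

        System.out.println(filename.getBytes(StandardCharsets.UTF_8).length); // 70000
        System.out.println(safe.getBytes(StandardCharsets.UTF_8).length);     // 32766
    }
}
```

Truncating keeps the leading part of the value searchable; skipping the field entirely for oversized values would be another option. Which approach, if either, NiFi adopted is not stated here.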



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
