[jira] [Commented] (LUCENENET-607) InvalidCastException PendingTerm cannot be cast to PendingBlock

Aaron Meyers (JIRA) Mon, 04 Mar 2019 06:48:13 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENENET-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783433#comment-16783433
 ]


Aaron Meyers commented on LUCENENET-607:
----------------------------------------

Big +1 to this. We've seen this in production with Lucene.Net in Microsoft 
Power BI but hadn't figured out what was happening yet (it doesn't happen very 
often and we hadn't reproduced in a local environment).

This fix should definitely go into the 4.8.0 official release.

> InvalidCastException PendingTerm cannot be cast to PendingBlock
> ---------------------------------------------------------------
>
>                 Key: LUCENENET-607
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-607
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Khindikaynen Aleksey
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Here is exception call stack:
> {code:java}
> at Lucene.Net.Codecs.BlockTreeTermsWriter.TermsWriter.Finish(Int64 
> sumTotalTermFreq, Int64 sumDocFreq, Int32 docCount, TermsHashPerField 
> termsHashPerField)
> at Lucene.Net.Index.FreqProxTermsWriterPerField.Flush(String fieldName, 
> FieldsConsumer consumer, SegmentWriteState state)
> at Lucene.Net.Index.FreqProxTermsWriter.Flush(IDictionary`2 fieldsToFlush, 
> SegmentWriteState state)
> at Lucene.Net.Index.TermsHash.Flush(IDictionary`2 fieldsToFlush, 
> SegmentWriteState state)
> at Lucene.Net.Index.DocInverter.Flush(IDictionary`2 fieldsToFlush, 
> SegmentWriteState state)
> at Lucene.Net.Index.DocFieldProcessor.Flush(SegmentWriteState state)
> at Lucene.Net.Index.DocumentsWriterPerThread.Flush()
> at Lucene.Net.Index.DocumentsWriter.DoFlush(DocumentsWriterPerThread 
> flushingDWPT)
> at Lucene.Net.Index.DocumentsWriter.FlushAllThreads(IndexWriter indexWriter)
> at Lucene.Net.Index.IndexWriter.GetReader(Boolean applyAllDeletes)
> at Lucene.Net.Index.StandardDirectoryReader.DoOpenFromWriter(IndexCommit 
> commit)
> at Lucene.Net.Search.SearcherManager.RefreshIfNeeded(IndexSearcher 
> referenceToRefresh)
> at Lucene.Net.Search.ReferenceManager`1.DoMaybeRefresh()
> at Lucene.Net.Search.ReferenceManager`1.MaybeRefreshBlocking()
> at Lucene.Net.Search.ControlledRealTimeReopenThread`1.Run()
> {code}
> The issue is hard to reproduce and appears only when documents with the 
> same terms are added concurrently. I have not managed to write a clear test 
> that reproduces the issue.
> I did some research and found that the cause of the issue is duplicate 
> terms in the BytesRefHash structure. BytesRefHash uses the 
> Murmurhash3_x86_32 hashing algorithm with a random seed (see the 
> StringHelper.GOOD_FAST_HASH_SEED property). The 
> StringHelper.GOOD_FAST_HASH_SEED property is not thread-safe and can return 
> different values if called from several threads at the same moment, which 
> can result in duplicate values in BytesRefHash (the same values produce 
> different hashes because the hashes were calculated with different seeds).
> There is another issue with GOOD_FAST_HASH_SEED. DateTime.Now.Millisecond is 
> used to randomize the seed, but DateTime.Now.Millisecond can return 0. That 
> value is treated as "uninitialized", so the second GOOD_FAST_HASH_SEED call 
> will return a different value.
> The issue could be easily fixed by moving the GOOD_FAST_HASH_SEED 
> initialization to the static constructor of StringHelper. That makes it 
> thread-safe and also fixes the 0-value issue.
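
The two seed problems described in the issue, and the proposed fix, can be illustrated with a small sketch. This is not the Lucene.Net source: it is written in Java as an analogue, with a Java static initializer standing in for the .NET static constructor. The names SeedDemo, LazySeed, EagerSeed and toyHash are hypothetical, and toyHash is a trivial seeded hash used only for illustration, not MurmurHash3. The clock value is passed in as a parameter so the "millisecond is 0" case can be shown deterministically.

```java
// Hypothetical sketch; none of these names come from Lucene.Net.
final class SeedDemo {

    /** Lazy pattern as described in the issue: 0 doubles as the "uninitialized" marker. */
    static final class LazySeed {
        private int seed; // 0 means "not set yet"; unsynchronized on purpose

        int get(int clockMillis) {
            if (seed == 0) {
                // If clockMillis is 0 the seed stays "uninitialized", so the
                // next call picks a different value. Two threads racing
                // through this check can also observe different seeds.
                seed = clockMillis;
            }
            return seed;
        }
    }

    /** Proposed shape: the seed is computed once, during class initialization. */
    static final class EagerSeed {
        // The JVM runs a static initializer exactly once and publishes the
        // result safely to all threads.
        static final int SEED = (int) System.nanoTime();

        static int get() {
            return SEED;
        }
    }

    /** Toy seeded hash (NOT MurmurHash3): the result depends on the seed. */
    static int toyHash(String term, int seed) {
        int h = seed;
        for (int i = 0; i < term.length(); i++) {
            h = 31 * h + term.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        LazySeed lazy = new LazySeed();
        int first = lazy.get(0);    // clock happened to report 0
        int second = lazy.get(123); // a later call sees a different seed
        System.out.println(toyHash("lucene", first) == toyHash("lucene", second));
        System.out.println(EagerSeed.get() == EagerSeed.get());
    }
}
```

With the lazy variant, the same term hashed before and after the seed changes lands in different slots, which is how a hash table can end up holding the same term twice. With the eager variant every caller sees one seed for the lifetime of the process.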



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
