[
https://issues.apache.org/jira/browse/LUCENENET-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783433#comment-16783433
]
Aaron Meyers commented on LUCENENET-607:
----------------------------------------
Big +1 to this. We've seen this in production with Lucene.Net in Microsoft
Power BI but hadn't figured out what was happening yet (it doesn't happen very
often and we hadn't reproduced in a local environment).
This fix should definitely go into the 4.8.0 official release.
> InvalidCastException PendingTerm cannot be cast to PendingBlock
> ---------------------------------------------------------------
>
> Key: LUCENENET-607
> URL: https://issues.apache.org/jira/browse/LUCENENET-607
> Project: Lucene.Net
> Issue Type: Bug
> Components: Lucene.Net Core
> Affects Versions: Lucene.Net 4.8.0
> Reporter: Khindikaynen Aleksey
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Here is the exception call stack:
> {code:java}
> at Lucene.Net.Codecs.BlockTreeTermsWriter.TermsWriter.Finish(Int64 sumTotalTermFreq, Int64 sumDocFreq, Int32 docCount, TermsHashPerField termsHashPerField)
> at Lucene.Net.Index.FreqProxTermsWriterPerField.Flush(String fieldName, FieldsConsumer consumer, SegmentWriteState state)
> at Lucene.Net.Index.FreqProxTermsWriter.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
> at Lucene.Net.Index.TermsHash.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
> at Lucene.Net.Index.DocInverter.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
> at Lucene.Net.Index.DocFieldProcessor.Flush(SegmentWriteState state)
> at Lucene.Net.Index.DocumentsWriterPerThread.Flush()
> at Lucene.Net.Index.DocumentsWriter.DoFlush(DocumentsWriterPerThread flushingDWPT)
> at Lucene.Net.Index.DocumentsWriter.FlushAllThreads(IndexWriter indexWriter)
> at Lucene.Net.Index.IndexWriter.GetReader(Boolean applyAllDeletes)
> at Lucene.Net.Index.StandardDirectoryReader.DoOpenFromWriter(IndexCommit commit)
> at Lucene.Net.Search.SearcherManager.RefreshIfNeeded(IndexSearcher referenceToRefresh)
> at Lucene.Net.Search.ReferenceManager`1.DoMaybeRefresh()
> at Lucene.Net.Search.ReferenceManager`1.MaybeRefreshBlocking()
> at Lucene.Net.Search.ControlledRealTimeReopenThread`1.Run()
> {code}
> The issue is hard to reproduce and appears only when documents with the
> same terms are added concurrently. I have not managed to write a clear test
> that reproduces the issue.
> I did some research and found that the cause of the issue is duplicate
> terms in the BytesRefHash structure. BytesRefHash uses the
> Murmurhash3_x86_32 hashing algorithm with a random seed (see the
> StringHelper.GOOD_FAST_HASH_SEED property). The
> StringHelper.GOOD_FAST_HASH_SEED property is not thread-safe and can return
> different values if called from several threads at the same moment, which
> can result in duplicate values in BytesRefHash (the same value gets
> different hashes because the hashes were calculated with different seeds).
> There is another issue with GOOD_FAST_HASH_SEED. DateTime.Now.Millisecond is
> used to randomize the seed, but DateTime.Now.Millisecond can return 0, and
> this value is treated as "uninitialized", so the second GOOD_FAST_HASH_SEED
> call will return another value.
> The issue could be easily fixed by moving the GOOD_FAST_HASH_SEED
> initialization to the static ctor of StringHelper. That will make it
> thread-safe and will fix the 0-value issue.
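For illustration, here is a minimal C# sketch of the pattern and the fix
described in the report. It is not the actual Lucene.Net source: the class
names RacySeedHolder and FixedSeedHolder, the ToyHash function (a simple
FNV-1a-style hash standing in for Murmurhash3_x86_32), and the way the seed is
derived are all assumptions made for the example.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;

// Hypothetical reconstruction of the racy pattern from the report:
// lazy initialization in which 0 doubles as "not initialized yet".
public static class RacySeedHolder
{
    private static int seed;

    public static int Seed
    {
        get
        {
            // Two threads can both see 0 here and store different values,
            // and a millisecond value of 0 leaves the seed "uninitialized".
            if (seed == 0)
                seed = DateTime.Now.Millisecond;
            return seed;
        }
    }
}

// Sketch of the proposed fix: assign the seed once in the static constructor.
// The CLR runs a type's static constructor at most once, before first use,
// so every thread observes the same value and 0 is no longer special.
public static class FixedSeedHolder
{
    public static readonly int Seed;

    static FixedSeedHolder()
    {
        // Assumption: any per-process value will do as a seed here.
        Seed = unchecked((int)DateTime.UtcNow.Ticks);
    }
}

public static class Program
{
    // Toy seeded hash (FNV-1a style); a stand-in, not MurmurHash3.
    public static uint ToyHash(byte[] data, uint seed)
    {
        unchecked
        {
            uint h = seed;
            foreach (byte b in data)
            {
                h ^= b;
                h *= 16777619u;
            }
            return h;
        }
    }

    public static void Main()
    {
        byte[] term = Encoding.UTF8.GetBytes("lucene");

        // Same bytes, same seed: the hash table finds the existing entry.
        Console.WriteLine(ToyHash(term, 1) == ToyHash(term, 1)); // True

        // Same bytes, different seeds: the term looks new and is stored twice.
        Console.WriteLine(ToyHash(term, 1) == ToyHash(term, 2)); // False

        // With the static constructor, concurrent readers agree on the seed.
        int[] seen = new int[64];
        Parallel.For(0, seen.Length, i => seen[i] = FixedSeedHolder.Seed);
        Console.WriteLine(Array.TrueForAll(seen, s => s == seen[0])); // True
    }
}
```

RacySeedHolder is included only for contrast and is deliberately not exercised
in Main, since its failure depends on thread timing and on the clock.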