[
https://issues.apache.org/jira/browse/LUCENENET-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shad Storhaug resolved LUCENENET-607.
-------------------------------------
Resolution: Fixed
Thanks for the PR.
{quote}There is another issue with GOOD_FAST_HASH_SEED.
DateTime.Now.Millisecond is used to randomize the seed, but
DateTime.Now.Millisecond could return 0 and this value is treated as
"uninitialized" and the second GOOD_FAST_HASH_SEED call will return another
value.{quote}
This was due to a second bug that was introduced when the code was translated
from Java.
[{{System.currentTimeMillis()}}|https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#currentTimeMillis--]
returns the number of milliseconds since January 1, 1970 (UTC), not the
millisecond component (0 to 999) of the current time. I have replaced
{{DateTime.Now.Millisecond}}
with {{Time.CurrentTimeMilliseconds()}}, which relies on
{{System.Diagnostics.Stopwatch.GetTimestamp()}} to generate the value, making it a number much
higher than 999 that rarely repeats.
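
To illustrate the difference between the two values, here is a minimal C# sketch. The class and method names ({{SeedSourceDemo}}, {{MillisecondComponent}}, {{EpochMilliseconds}}) are hypothetical and are not taken from Lucene.NET. {{DateTimeOffset.ToUnixTimeMilliseconds()}} is used only as a stand-in for a Java-like {{System.currentTimeMillis()}}; it is an assumption for illustration and is not what {{Time.CurrentTimeMilliseconds()}} does internally.

```csharp
using System;

// Hypothetical demo type; not part of Lucene.NET.
public static class SeedSourceDemo
{
    // Millisecond component of the current wall-clock time.
    // Always in the range 0..999, so it is 0 about once per thousand calls.
    public static int MillisecondComponent()
    {
        return DateTime.Now.Millisecond;
    }

    // Milliseconds elapsed since 1970-01-01T00:00:00Z, comparable to what
    // Java's System.currentTimeMillis() returns.
    public static long EpochMilliseconds()
    {
        return DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
    }
}
```

Calling {{SeedSourceDemo.MillisecondComponent()}} repeatedly will eventually yield 0, while {{SeedSourceDemo.EpochMilliseconds()}} keeps increasing and is far larger than 999.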
> InvalidCastException PendingTerm cannot be cast to PendingBlock
> ---------------------------------------------------------------
>
> Key: LUCENENET-607
> URL: https://issues.apache.org/jira/browse/LUCENENET-607
> Project: Lucene.Net
> Issue Type: Bug
> Components: Lucene.Net Core
> Affects Versions: Lucene.Net 4.8.0
> Reporter: Khindikaynen Aleksey
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Here is the exception call stack:
> {code:java}
> at Lucene.Net.Codecs.BlockTreeTermsWriter.TermsWriter.Finish(Int64 sumTotalTermFreq, Int64 sumDocFreq, Int32 docCount, TermsHashPerField termsHashPerField)
> at Lucene.Net.Index.FreqProxTermsWriterPerField.Flush(String fieldName, FieldsConsumer consumer, SegmentWriteState state)
> at Lucene.Net.Index.FreqProxTermsWriter.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
> at Lucene.Net.Index.TermsHash.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
> at Lucene.Net.Index.DocInverter.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
> at Lucene.Net.Index.DocFieldProcessor.Flush(SegmentWriteState state)
> at Lucene.Net.Index.DocumentsWriterPerThread.Flush()
> at Lucene.Net.Index.DocumentsWriter.DoFlush(DocumentsWriterPerThread flushingDWPT)
> at Lucene.Net.Index.DocumentsWriter.FlushAllThreads(IndexWriter indexWriter)
> at Lucene.Net.Index.IndexWriter.GetReader(Boolean applyAllDeletes)
> at Lucene.Net.Index.StandardDirectoryReader.DoOpenFromWriter(IndexCommit commit)
> at Lucene.Net.Search.SearcherManager.RefreshIfNeeded(IndexSearcher referenceToRefresh)
> at Lucene.Net.Search.ReferenceManager`1.DoMaybeRefresh()
> at Lucene.Net.Search.ReferenceManager`1.MaybeRefreshBlocking()
> at Lucene.Net.Search.ControlledRealTimeReopenThread`1.Run()
> {code}
> The issue is hard to reproduce and appears only when adding documents
> with the same terms concurrently. I have not managed to write a clear test
> that reproduces the issue.
> I did some research and found that the cause of the issue is
> duplicate terms in the BytesRefHash structure. BytesRefHash uses the
> Murmurhash3_x86_32 hashing algorithm with a random seed (see the
> StringHelper.GOOD_FAST_HASH_SEED property). The
> StringHelper.GOOD_FAST_HASH_SEED property is not thread-safe and could
> return different values if called from several threads at the same moment,
> so it could result in duplicate values in BytesRefHash (the same values
> produce different hashes because the hashes were calculated with different
> seeds).
> There is another issue with GOOD_FAST_HASH_SEED. DateTime.Now.Millisecond is
> used to randomize the seed, but DateTime.Now.Millisecond could return 0 and
> this value is treated as "uninitialized" and the second GOOD_FAST_HASH_SEED
> call will return another value.
> The issue could be easily fixed by moving the GOOD_FAST_HASH_SEED
> initialization to the static ctor of StringHelper. That would make it
> thread-safe and would fix the 0-value issue.
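
The race and the static-constructor fix described in the report can be sketched roughly as follows. This is an illustration only: {{RacySeed}} and {{StableSeed}} are hypothetical names, the seed computation is an assumed placeholder, and none of this is the actual {{StringHelper}} source.

```csharp
using System;

// Hypothetical illustration of the problem pattern; not Lucene.NET code.
public static class RacySeed
{
    private static int seed; // 0 doubles as "not initialized yet"

    public static int Value
    {
        get
        {
            // Two threads can both observe 0 here and store different seeds.
            // A legitimately computed seed of 0 also causes re-initialization
            // on the next call.
            if (seed == 0)
            {
                seed = DateTime.Now.Millisecond;
            }
            return seed;
        }
    }
}

// Hypothetical illustration of the proposed fix; not Lucene.NET code.
public static class StableSeed
{
    public static readonly int Value;

    // The CLR runs a static constructor at most once per type, before first
    // access, so every thread observes the same value. A seed of 0 is also
    // harmless because 0 is no longer used as a sentinel.
    static StableSeed()
    {
        // Placeholder seed source (assumption): truncated epoch milliseconds.
        Value = unchecked((int)DateTimeOffset.UtcNow.ToUnixTimeMilliseconds());
    }
}
```

With the second form, concurrent readers of {{StableSeed.Value}} always get the same seed, so equal terms hash to equal values.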