Khindikaynen Aleksey created LUCENENET-607:
----------------------------------------------

             Summary: InvalidCastException PendingTerm cannot be cast to PendingBlock
                 Key: LUCENENET-607
                 URL: https://issues.apache.org/jira/browse/LUCENENET-607
             Project: Lucene.Net
          Issue Type: Bug
          Components: Lucene.Net Core
    Affects Versions: Lucene.Net 4.8.0
            Reporter: Khindikaynen Aleksey


Here is the exception call stack:
{code:java}
at Lucene.Net.Codecs.BlockTreeTermsWriter.TermsWriter.Finish(Int64 sumTotalTermFreq, Int64 sumDocFreq, Int32 docCount, TermsHashPerField termsHashPerField)
at Lucene.Net.Index.FreqProxTermsWriterPerField.Flush(String fieldName, FieldsConsumer consumer, SegmentWriteState state)
at Lucene.Net.Index.FreqProxTermsWriter.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
at Lucene.Net.Index.TermsHash.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
at Lucene.Net.Index.DocInverter.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
at Lucene.Net.Index.DocFieldProcessor.Flush(SegmentWriteState state)
at Lucene.Net.Index.DocumentsWriterPerThread.Flush()
at Lucene.Net.Index.DocumentsWriter.DoFlush(DocumentsWriterPerThread flushingDWPT)
at Lucene.Net.Index.DocumentsWriter.FlushAllThreads(IndexWriter indexWriter)
at Lucene.Net.Index.IndexWriter.GetReader(Boolean applyAllDeletes)
at Lucene.Net.Index.StandardDirectoryReader.DoOpenFromWriter(IndexCommit commit)
at Lucene.Net.Search.SearcherManager.RefreshIfNeeded(IndexSearcher referenceToRefresh)
at Lucene.Net.Search.ReferenceManager`1.DoMaybeRefresh()
at Lucene.Net.Search.ReferenceManager`1.MaybeRefreshBlocking()
at Lucene.Net.Search.ControlledRealTimeReopenThread`1.Run()
{code}
The issue is hard to reproduce and appears only when documents containing the 
same terms are added concurrently. I have not managed to write a clean test 
that reproduces it.

I did some research and found that the cause of the issue is duplicate terms 
in the BytesRefHash structure. BytesRefHash uses the Murmurhash3_x86_32 
hashing algorithm with a random seed (see the StringHelper.GOOD_FAST_HASH_SEED 
property). The StringHelper.GOOD_FAST_HASH_SEED property is not thread-safe and 
can return different values if it is called from several threads at the same 
moment. This can produce duplicate entries in BytesRefHash, because the same 
value hashes differently when the hashes are calculated with different seeds.
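
To illustrate why a seed mismatch leads to duplicates, here is a minimal 
sketch. It is not the Lucene.Net code: SeededHash is a hypothetical stand-in 
(FNV-1a style, not Murmurhash3_x86_32), and the class name and seed values are 
made up for the example.

```csharp
using System;
using System.Text;

public static class SeedMismatchSketch
{
    // Hypothetical stand-in for a seeded hash (FNV-1a style, NOT MurmurHash3).
    // It only illustrates that the result depends on the seed.
    public static uint SeededHash(byte[] data, uint seed)
    {
        unchecked
        {
            uint h = 2166136261u ^ seed;   // mix the seed into the initial state
            foreach (byte b in data)
            {
                h ^= b;                    // fold in one byte
                h *= 16777619u;            // multiply by the FNV prime (wraps mod 2^32)
            }
            return h;
        }
    }

    // True when two threads that observed different seeds would compute
    // different hashes for the same term bytes.
    public static bool SameTermHashesDiffer(string term, uint seedA, uint seedB)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(term);
        return SeededHash(bytes, seedA) != SeededHash(bytes, seedB);
    }
}
```

With two different seeds (say 17 and 423) the same term gets two different 
hashes, so a hash table that locates entries by hash would not find the 
existing entry and would store the term a second time.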

There is another issue with GOOD_FAST_HASH_SEED. DateTime.Now.Millisecond is 
used to randomize the seed, but DateTime.Now.Millisecond can return 0. That 
value is treated as "uninitialized", so the second GOOD_FAST_HASH_SEED call 
will return a different value.
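
The pattern described above can be sketched roughly as follows. This is an 
assumption about the shape of the code, not a copy of StringHelper: the class 
LazySeedSketch and the replaceable MillisecondSource delegate are hypothetical, 
added only so that the 0 case can be shown deterministically.

```csharp
using System;

public static class LazySeedSketch
{
    // 0 doubles as the "not initialized yet" marker; this is the flaw.
    private static int seed;

    // Hypothetical hook: defaults to the clock, replaceable for demonstration.
    public static Func<int> MillisecondSource = () => DateTime.Now.Millisecond;

    public static int Seed
    {
        get
        {
            // No lock: two threads can both see 0 here and each assign its own
            // value. And if the source itself yields 0, the next call
            // initializes the seed again.
            if (seed == 0)
            {
                seed = MillisecondSource();
            }
            return seed;
        }
    }
}
```

If MillisecondSource yields 0 on the first call and 512 on the second, the 
first read of Seed returns 0 and the second returns 512, so two hashes of the 
same term computed around those reads would use different seeds.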

The issue could be easily fixed by moving the GOOD_FAST_HASH_SEED 
initialization to the static ctor of StringHelper. That would make it 
thread-safe and would fix the 0-value issue.
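
A minimal sketch of the proposed fix, under the assumption that the seed only 
needs to be fixed once per process. SeedHolder is a hypothetical name, and 
this is not the actual patch.

```csharp
using System;

public static class SeedHolder
{
    // Assigned exactly once; there is no "uninitialized" sentinel any more,
    // so a value of 0 is just another valid seed.
    public static readonly int Seed;

    static SeedHolder()
    {
        // The runtime executes a static constructor at most once per type
        // and makes other threads wait until it has finished.
        Seed = DateTime.Now.Millisecond;
    }
}
```

Every thread that reads SeedHolder.Seed then sees the same value for the 
lifetime of the process, which is what BytesRefHash needs in order to hash 
equal terms identically.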



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
