Whew! That's a relief.

On Wed, Jun 17, 2015 at 5:05 AM, Dawid Weiss (JIRA) <[email protected]> wrote:
>
>     [ 
> https://issues.apache.org/jira/browse/LUCENE-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589681#comment-14589681
>  ]
>
> Dawid Weiss commented on LUCENE-6576:
> -------------------------------------
>
> Cosmic rays and solar radiation are known to cause bit flips. We need to add 
> some true hardware to the test framework:
> http://goo.gl/8alnw0
>
>> possible index corruption with java 8u45
>> ----------------------------------------
>>
>>                 Key: LUCENE-6576
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-6576
>>             Project: Lucene - Core
>>          Issue Type: Bug
>>            Reporter: Robert Muir
>>
>> Recently, I've experienced sporadic corruptions when trying to index 
>> Wikipedia in the benchmark. I know [~mikemccand] hit similar problems in 
>> the nightly benchmark, and he also has an older cpu (see below for more on 
>> this).
>> I am using this Python script (courtesy of Mike) to index Wikipedia in a 
>> loop, tweaked for lots of threads and heavy merging so it fails faster: 
>> http://pastebin.com/jwpdELDe
>> I get corruptions constantly, though sometimes it takes a few iterations.
>> The errors look like this, where the bytes we write "seem to be fine" but 
>> the CRC32 itself may be computed incorrectly at *write time*:
>> {quote}
>> Exception in thread "Thread-0" java.lang.RuntimeException: 
>> org.apache.lucene.index.CorruptIndexException: checksum failed (hardware 
>> problem?) : expected=e2b2d8f5 actual=a04da0c 
>> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/corrumption_playground/index/_1p_Lucene50_0.tim")))
>>       at perf.IndexThreads$IndexThread.run(IndexThreads.java:402)
>> Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed 
>> (hardware problem?) : expected=e2b2d8f5 actual=a04da0c 
>> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/corrumption_playground/index/_1p_Lucene50_0.tim")))
>> {quote}
>> This happens with different file extensions (.tip, .tim, .pos, .doc, .dvd, 
>> ...). Whenever one of these corrupted files was included in a commit point, 
>> I've run "the rest of CheckIndex" minus the CRC32 check and it always 
>> passes, but that is no guarantee that's what is happening.
>> I think the bugs may, for some reason, be easier to reproduce on my CPU, 
>> perhaps because it's older and only has AVX1, or for some other reason:
>> {quote}
>> model         : 42
>> model name    : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
>> flags         : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
>> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
>> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est 
>> tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes 
>> xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi 
>> flexpriority ept vpid
>> {quote}
>> Other notes:
>> * does not need multiple threads. I did this to make the "test" fail faster. 
>> It will fail sometimes with maxBufferedDocs + SerialMergeScheduler + 1 
>> thread, which is deterministic.
>> * have not tested JDK9 in any way, might be some already-fixed bug.
>> * I've run numerous hardware tests: memory, disk, etc.
>> * I've run the tests with two different SSD drives: both fail.
>> First step: clean up this script and make it so it can be reproduced on 
>> other hardware. I can try on my laptop as well.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
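The key observation in the quoted report is that the file bytes look intact, yet the stored checksum disagrees with the one recomputed at read time. Here is a minimal Python sketch of that failure mode. It assumes a simplified, hypothetical format: the payload followed by a 4-byte big-endian CRC32. Lucene's actual footer layout differs. The helpers `write_with_footer` and `verify_footer` are made-up names. The `checksum_fn` parameter exists only so that a faulty write-time checksum can be injected.

```python
import struct
import zlib


def write_with_footer(payload: bytes, checksum_fn=zlib.crc32) -> bytes:
    """Append a 4-byte big-endian checksum of the payload (hypothetical format)."""
    crc = checksum_fn(payload) & 0xFFFFFFFF
    return payload + struct.pack(">I", crc)


def verify_footer(blob: bytes) -> None:
    """Recompute the CRC32 at read time and compare it with the stored footer."""
    payload, footer = blob[:-4], blob[-4:]
    (expected,) = struct.unpack(">I", footer)
    actual = zlib.crc32(payload) & 0xFFFFFFFF
    if expected != actual:
        # Same shape as the error in the report: hex values, no leading zeros.
        raise ValueError(f"checksum failed: expected={expected:x} actual={actual:x}")


if __name__ == "__main__":
    data = b"some postings bytes"
    verify_footer(write_with_footer(data))  # passes silently
    # Simulate a checksum that is wrong at *write time* while the bytes are fine.
    bad = write_with_footer(data, checksum_fn=lambda b: zlib.crc32(b) ^ 0x1)
    try:
        verify_footer(bad)
    except ValueError as e:
        print(e)
```

In the faulty case the payload bytes are identical to the good case, yet verification still fails. A read-time mismatch alone therefore can't tell you whether the data or the stored checksum is the broken part.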
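The quoted report suspects that the CRC32 itself is computed incorrectly, rather than the bytes being corrupted. One way to probe that is to cross-check the production implementation against a slow, independent reference. In the JVM that would mean comparing `java.util.zip.CRC32` against a hand-written version. The Python sketch below shows the same technique, with `zlib.crc32` standing in for the production implementation. `crc32_reference` and `find_mismatch` are hypothetical names, and the default trial counts are arbitrary.

```python
import os
import random
import zlib


def crc32_reference(data: bytes) -> int:
    """Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320); slow but easy to audit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, fold in the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def find_mismatch(trials: int = 200, max_len: int = 1024):
    """Return the first random buffer on which zlib and the reference disagree, else None."""
    for _ in range(trials):
        buf = os.urandom(random.randrange(max_len + 1))
        if (zlib.crc32(buf) & 0xFFFFFFFF) != crc32_reference(buf):
            return buf
    return None


if __name__ == "__main__":
    # 0xCBF43926 is the standard CRC-32 check value for b"123456789".
    assert crc32_reference(b"123456789") == 0xCBF43926
    print("mismatch found" if find_mismatch() is not None else "no mismatch")
```

Because the report points at a CPU-dependent problem, a loop like this would only say something about the bug if it ran on the affected machine against the JVM's CRC32, not zlib.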
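The quoted report guesses the older CPU matters because it only has AVX1. A quick way to confirm that from a `flags` line like the one quoted is to check for `avx` versus `avx2`. `pclmulqdq` (carry-less multiply, which fast CRC implementations can use) and `sse4_2` are included only as other extensions worth noting when comparing machines. `simd_features` is a hypothetical helper, and the selected flags are an arbitrary choice.

```python
def simd_features(flags_line: str) -> dict:
    """Map selected x86 extensions to whether they appear in a cpuinfo 'flags' line."""
    # Everything after the first ':' is a whitespace-separated list of flag names.
    flags = set(flags_line.split(":", 1)[1].split())
    selected = ["sse4_2", "pclmulqdq", "avx", "avx2"]
    return {name: name in flags for name in selected}


if __name__ == "__main__":
    # Excerpt of the flags quoted in the report (i7-2600K).
    line = "flags : fpu sse sse2 ssse3 pclmulqdq sse4_1 sse4_2 popcnt aes xsave avx"
    print(simd_features(line))
    # {'sse4_2': True, 'pclmulqdq': True, 'avx': True, 'avx2': False}
```

On Linux the same helper can be fed the `flags` line from `/proc/cpuinfo`. That makes it easy to record the feature set alongside each reproduction attempt on different hardware.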
