Whew! That's a relief.

On Wed, Jun 17, 2015 at 5:05 AM, Dawid Weiss (JIRA) <[email protected]> wrote:

>
> [
> https://issues.apache.org/jira/browse/LUCENE-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589681#comment-14589681
> ]
>
> Dawid Weiss commented on LUCENE-6576:
> -------------------------------------
>
> Cosmic rays and solar radiation are known to cause bit flips. We need to add
> some true hardware to the test framework:
> http://goo.gl/8alnw0
>
>> possible index corruption with java 8u45
>> ----------------------------------------
>>
>> Key: LUCENE-6576
>> URL: https://issues.apache.org/jira/browse/LUCENE-6576
>> Project: Lucene - Core
>> Issue Type: Bug
>> Reporter: Robert Muir
>>
>> Recently, I've experienced sporadic corruptions when trying to index
>> Wikipedia in the benchmark. I know [~mikemccand] hit similar problems in
>> the nightly benchmark, and he also has an older CPU (see below for more on
>> this).
>> I am using this Python script (compliments of Mike) to index Wikipedia in a
>> loop, tweaked for lots of threads and heavy merging so it fails faster:
>> http://pastebin.com/jwpdELDe I get corruptions constantly, though sometimes
>> it takes a few iterations.
>> The errors look like this, where the bytes we write "seem to be fine" but
>> the CRC32 itself is maybe computed incorrectly at *write time*:
>> {quote}
>> Exception in thread "Thread-0" java.lang.RuntimeException:
>> org.apache.lucene.index.CorruptIndexException: checksum failed (hardware
>> problem?) : expected=e2b2d8f5 actual=a04da0c
>> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/corrumption_playground/index/_1p_Lucene50_0.tim")))
>> at perf.IndexThreads$IndexThread.run(IndexThreads.java:402)
>> Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed
>> (hardware problem?) : expected=e2b2d8f5 actual=a04da0c
>> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/corrumption_playground/index/_1p_Lucene50_0.tim")))
>> {quote}
>> This happens with different file extensions (.tip, .tim, .pos, .doc, .dvd,
>> ...). Whenever one of these corrupted files was included in a commit point,
>> I've run "the rest of CheckIndex" minus the CRC32 check and it always
>> passes, but that is no guarantee that's what is happening.
>> I think maybe the bugs are, for some reason, easier to reproduce on my CPU,
>> maybe because it's older and only has AVX1, or some other reason:
>> {quote}
>> model : 42
>> model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
>> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est
>> tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes
>> xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi
>> flexpriority ept vpid
>> {quote}
>> Other notes:
>> * does not need multiple threads. I did this to make the "test" fail faster.
>> It will fail sometimes with maxBufferedDocs + SerialMergeScheduler + 1
>> thread, which is deterministic.
>> * have not tested JDK9 in any way, might be some already-fixed bug.
>> * I've run numerous hardware tests: memory, disk, etc.
>> * I've run the tests with two different SSD drives: both fail.
>> First step: clean up this script and make it so it can be reproduced on
>> other hardware. I can try on my laptop as well.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
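
The quoted report suspects that the CRC32 value, rather than the written bytes, may be wrong. A minimal standalone sketch of how one might sanity-check CRC32 computation on a given machine and JVM follows. It is not taken from the issue, the pastebin script, or Lucene; the class name `Crc32SelfCheck` and its methods are hypothetical. It only uses `java.util.zip.CRC32` from the JDK and compares a one-shot checksum against a chunked checksum of the same bytes, repeated enough times that the JIT has a chance to compile the hot path. A clean run does not prove the hardware or JVM is healthy, and this is not a reproduction of LUCENE-6576.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Lucene or the benchmark script.
public class Crc32SelfCheck {

    /** CRC32 of the whole buffer in a single update() call. */
    public static long crcOneShot(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** CRC32 of the same buffer, fed in chunks of at most chunkSize bytes. */
    public static long crcChunked(byte[] data, int chunkSize) {
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            crc.update(data, off, len);
        }
        return crc.getValue();
    }

    /** Counts iterations in which the two computations disagree. */
    public static int countMismatches(int iterations, long seed) {
        Random random = new Random(seed);
        byte[] data = new byte[64 * 1024];
        int mismatches = 0;
        for (int i = 0; i < iterations; i++) {
            random.nextBytes(data);
            int chunk = 1 + random.nextInt(4096); // chunk size in [1, 4096]
            if (crcOneShot(data) != crcChunked(data, chunk)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // "123456789" is the conventional CRC-32 check input (0xCBF43926).
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println("check value: " + Long.toHexString(crcOneShot(check)));
        // The iteration count is an arbitrary assumption; raise it for a longer soak.
        System.out.println("mismatches: " + countMismatches(10_000, 42L));
    }
}
```

Any non-zero mismatch count would point at the checksum computation (or the machine under it) rather than at the index files themselves.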
