mikemccand commented on issue #15552: URL: https://github.com/apache/lucene/issues/15552#issuecomment-3773216676
> Also related to this, at some point we should double-check the math. IIRC these algorithms are only appropriate for file sizes up to a certain limit (e.g. some number of GB). Otherwise they may not detect problems reliably.

I tried to [ask Claude Opus 4.5 about this](https://claude.ai/share/91bf5b0b-3d76-4fbc-aad4-4893cc0168e8) but its response is confusing :)

I think the problem is "false negative" risk: bit(s) flipped, but the bit-flipped file has the same checksum as the original file (a "checksum collision"), so the error goes undetected. The risk of this is ~1 in 4.3 billion (2^32), assuming your bit flips have no accidental (or adversarial; this is not a secure/crypto hash?) correlation with CRC32's collisions.

But this is a risk per checksum+validation, right, not "per GB of file you are checksumming" or so? So, as long as Lucene isn't using billions of files in an index, the risk remains lowish for any single Lucene user?

So, of the billions of Lucene users ;) (well, individual index files written/read, times all Lucene usage integrated over time since we added checksums), some have probably hit this false negative! Oh wait, the universe is smaller: it's only those segment files written with a bit-flipper in the path? Still, for such users, it's likely their bit-flipper (wherever it is: RAM, bus, storage, CPU cache lines) will still be detected even if they unluckily hit the jackpot once (a checksum collision). Weird/scary/hard to think about...
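
To put rough numbers on the "per checksum+validation" reasoning, here is a minimal Python sketch. It assumes each validation of a corrupted file independently slips through with probability 2^-32 (the random-corruption model described above, not a statement about CRC32's behavior for specific error patterns). The helper names `p_undetected` and `flip_bit` are hypothetical, and it uses Python's `zlib.crc32` for illustration rather than Lucene's own Java checksum code.

```python
import math
import zlib

# Assumed chance that one corrupted file still matches its stored CRC32,
# under the "corruption is uncorrelated with CRC32 collisions" model.
COLLISION_P = 2.0 ** -32


def p_undetected(corrupted_validations: int, p: float = COLLISION_P) -> float:
    """Chance that at least one of n independent corrupted validations passes.

    Computes 1 - (1 - p)^n using log1p/expm1 to stay accurate for tiny p.
    """
    return -math.expm1(corrupted_validations * math.log1p(-p))


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit flipped."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


if __name__ == "__main__":
    # The risk scales with the number of corrupted files validated,
    # not with the number of bytes checksummed.
    for n in (1, 1_000, 1_000_000, 2 ** 32):
        print(f"{n:>13,} corrupted validations -> {p_undetected(n):.3e}")

    # A single flipped bit changes the CRC32, so it is caught.
    original = b"lucene segment bytes" * 1000
    corrupted = flip_bit(original, 12345)
    print(zlib.crc32(original) != zlib.crc32(corrupted))
```

Under this model the miss probability only approaches even odds once the number of corrupted validations is itself in the billions, which is consistent with the "lowish for any single user" intuition.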
