[
https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-4196:
--------------------------------
Attachment: LUCENE-4196.patch
Hi Uwe... tricky issue.
I think in general we were doing pretty good actually: often times various
things like assert value >= 0 after a readVLong (well, thats ok to be an assert
since readVLong does a real check anyways).
But I think we were missing a few key checks that cost nothing and would detect
issues:
* when reading things like #fields or #segments, check thats not negative.
otherwise the loop does nothing and it acts like zero.
* things like docCount in the .si really need to be checked they are positive:
this file isn't checksummed (the information used to be in sis which is) and
had basically no checks.
* check that we consume entire .si file just like we do with .fnm
* the asserts for checking duplicate field names/numbers in FieldInfos and
terms dicts were a good idea, but a real check for duplicate values is
basically free anyway, since Map.put returns the previous value and its just a
null check.
* statistics are a wonderful cheap check in term dicts impls at startup...
Anyway I think this is a good start. I'll wait for review but I'd like to
commit this patch tomorrow.
> Turn asserts in I/O related code into hard checks
> -------------------------------------------------
>
> Key: LUCENE-4196
> URL: https://issues.apache.org/jira/browse/LUCENE-4196
> Project: Lucene - Core
> Issue Type: Task
> Components: core/index
> Affects Versions: 4.0-ALPHA
> Reporter: Uwe Schindler
> Fix For: 4.0
>
> Attachments: LUCENE-4196.patch
>
>
> In lots of codecs we only assert, that e.g. some things inside files are
> correctly in bounds, which leads to security problems (ok, not as bad as
> C-Style buffer overflows), but e.g. allocating a large array after reading a
> VInt from a file header and then OOM, is a security issue. So we have to
> check all those contracts for files as hard checks, especially as a simply
> check in most cases dont cost anything (and it costs not more than the assert
> itsself, as the assert also takes CPU power, because it needs a check one
> time on a static final class field).
> Of course we cannot check values we read when reading postings, but the
> simple checks that any postings file has correct header and something like a
> positive number of elements, or number of elements < file size,..., a
> bit-fireld only contains valid bits in StoredFieldsReader, or non-duplicate
> filenames (CFS) are very important. We had those checks in 3.x, but in 4.0,
> Mike changed all of those to asserts during the flex development (in my
> opinion with no real reason).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]