DocumentsWriter.init() doesn't grow fieldDataHash array at same rate as
allFieldData array, leading to OOM errors
-----------------------------------------------------------------------------------------------------------------
Key: LUCENE-1408
URL: https://issues.apache.org/jira/browse/LUCENE-1408
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.3.2
Environment: NA
Reporter: David C. Navas
Priority: Minor
See DocumentsWriter.init(), around line 787.
When a new field is encountered and the arrays need to be resized, the
allFieldDataArray is resized to be 50% larger, while the hashArray is resized
to be twice as large, every time. The hashArray therefore grows much faster
than the fieldData array.
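To see how quickly the two arrays diverge, here is a small simulation of the
resize policy described above. It is a sketch, not DocumentsWriter code: only
the growth factors (x1.5 and x2) come from this report, and the initial sizes
(10 for the data array, 16 for the hash array) are assumptions.

```java
/**
 * Simulates the resize policy described in the report: each time the
 * field-data array fills up, it grows by 1.5x while the hash array doubles.
 * INITIAL_DATA_SIZE and INITIAL_HASH_SIZE are assumed values.
 */
public class GrowthSim {
    static final int INITIAL_DATA_SIZE = 10; // assumption
    static final int INITIAL_HASH_SIZE = 16; // assumption

    /** Returns {resizes, dataSize, hashSize} after growing until dataSize >= fields. */
    public static long[] grow(int fields) {
        int dataSize = INITIAL_DATA_SIZE;
        long hashSize = INITIAL_HASH_SIZE;
        int resizes = 0;
        while (dataSize < fields) {
            dataSize = (int) (dataSize * 1.5); // data array: +50%
            hashSize *= 2;                      // hash array: x2 on every resize
            resizes++;
        }
        return new long[] {resizes, dataSize, hashSize};
    }

    public static void main(String[] args) {
        for (int fields : new int[] {100, 1_000, 30_000}) {
            long[] r = grow(fields);
            System.out.printf("%,6d fields: %2d resizes, data array %,7d, hash array %,11d (%.0fx)%n",
                    fields, r[0], r[1], r[2], (double) r[2] / r[1]);
        }
    }
}
```

Each resize multiplies the hash-to-data size ratio by roughly 2/1.5 = 4/3, so
the gap compounds as more fields are added.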
In addition, the fieldDataHashMask is set to one less than the
*fieldDataArray* size, rather than one less than the hashArray size.
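A quick way to see what the wrong mask does is to count which hash slots are
reachable as `hash & mask`. This sketch assumes the state right after the
first resize under the initial sizes used above (newSize = 15,
newHashSize = 32); those numbers are illustrative, not taken from Lucene.
A mask reaches only 2^k distinct slots, where k is its number of set bits, and
never any slot above the mask value itself.

```java
import java.util.BitSet;

/** Counts hash-table slots reachable through a given mask (illustrative only). */
public class MaskDemo {
    /** Number of distinct values of (h & mask) over all h in [0, 2^16). */
    public static int reachableSlots(int mask) {
        BitSet used = new BitSet();
        for (int h = 0; h < (1 << 16); h++) {
            used.set(h & mask);
        }
        return used.cardinality();
    }

    public static void main(String[] args) {
        int newSize = 15;     // data array after one 1.5x resize (assumed start: 10)
        int newHashSize = 32; // hash array after one doubling (assumed start: 16)
        int buggyMask = newSize - 1;     // 14 = 0b01110: odd slots and slots > 14 unreachable
        int fixedMask = newHashSize - 1; // 31 = 0b11111: every slot reachable
        // prints "buggy mask reaches 8 of 32 slots"
        System.out.println("buggy mask reaches " + reachableSlots(buggyMask) + " of " + newHashSize + " slots");
        // prints "fixed mask reaches 32 of 32 slots"
        System.out.println("fixed mask reaches " + reachableSlots(fixedMask) + " of " + newHashSize + " slots");
    }
}
```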
The latter problem leads to poor and erratic utilization of the hash array,
while the former can, when you use an excessive number of field columns, lead
to premature OOMs (30k field columns means roughly 30 million entry
placeholders in the hash array, or about 120 MB per ThreadState).
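The estimate can be written out as follows. Here F is the number of fields,
D_n and H_n are the data and hash array sizes after n resizes, D_0 and H_0 are
the (unspecified) initial sizes, and 4 bytes per hash slot is the figure
implied by the 30 million / 120 MB numbers above. Because n grows only
logarithmically in F while H_n doubles with each step, the hash array grows
superlinearly in the number of fields.

```latex
\begin{aligned}
D_n &\approx D_0 \cdot 1.5^{n} \ge F
  \quad\Longrightarrow\quad n \approx \log_{1.5}\!\left(\frac{F}{D_0}\right),\\
H_n &= H_0 \cdot 2^{n} \approx H_0 \left(\frac{F}{D_0}\right)^{\log_{1.5} 2}
  \approx H_0 \left(\frac{F}{D_0}\right)^{1.71},\\
\text{memory} &\approx 4\,\text{bytes} \times H_n
  \approx 4 \times 3\times10^{7}\ \text{bytes} \approx 120\ \text{MB}.
\end{aligned}
```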
A trivial fix for both would be to change *1.5 to *2 and to reset the mask
based on newHashSize, not newSize. Since you are using a mask, it looks like
you want a power of two, so you can't use *1.5 everywhere. Alternatively, you
could resize the hash only when needed, rather than every time you resize the
data array, though that would be somewhat more difficult.
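The second option might look roughly like the following. This is a
hypothetical sketch, not Lucene's DocumentsWriter: the class FieldTable and
its methods are invented for illustration, String field names stand in for
FieldData objects, and linear probing with a maximum load factor of 1/2 is an
assumed policy. The data array keeps its 1.5x growth, the hash array stays a
power of two and doubles only when the load limit would be exceeded, and the
mask is always derived from the hash array's length.

```java
import java.util.Arrays;

/** Hypothetical field table: independent growth for data and hash arrays. */
public class FieldTable {
    private String[] allFieldData = new String[10];  // assumed initial size
    private int numAllFieldData = 0;
    private String[] fieldDataHash = new String[16]; // power of two (assumed initial size)
    private int fieldDataHashMask = fieldDataHash.length - 1;

    /** Adds a field name; returns false if it was already present. */
    public boolean addField(String name) {
        if (contains(name)) return false;
        if (numAllFieldData == allFieldData.length) {
            // Data array: grow by 50%, independently of the hash array.
            allFieldData = Arrays.copyOf(allFieldData, (int) (allFieldData.length * 1.5));
        }
        if (2 * (numAllFieldData + 1) > fieldDataHash.length) {
            // Hash array: double only when the load factor would exceed 1/2.
            rehash(fieldDataHash.length * 2);
        }
        allFieldData[numAllFieldData++] = name;
        insert(fieldDataHash, fieldDataHashMask, name);
        return true;
    }

    /** Looks up a name by linear probing from (hashCode & mask). */
    public boolean contains(String name) {
        int slot = name.hashCode() & fieldDataHashMask;
        while (fieldDataHash[slot] != null) {
            if (fieldDataHash[slot].equals(name)) return true;
            slot = (slot + 1) & fieldDataHashMask;
        }
        return false;
    }

    private static void insert(String[] table, int mask, String name) {
        int slot = name.hashCode() & mask;
        while (table[slot] != null) slot = (slot + 1) & mask;
        table[slot] = name;
    }

    private void rehash(int newHashSize) {
        String[] newHash = new String[newHashSize];
        int newMask = newHashSize - 1; // derived from the hash size, not the data size
        for (int i = 0; i < numAllFieldData; i++) {
            insert(newHash, newMask, allFieldData[i]);
        }
        fieldDataHash = newHash;
        fieldDataHashMask = newMask;
    }

    public int dataCapacity() { return allFieldData.length; }
    public int hashCapacity() { return fieldDataHash.length; }

    public static void main(String[] args) {
        FieldTable t = new FieldTable();
        for (int i = 0; i < 30_000; i++) t.addField("field" + i);
        System.out.println("data array " + t.dataCapacity() + ", hash array " + t.hashCapacity());
    }
}
```

With this policy the hash array stays within a small constant factor of the
number of fields, instead of compounding with every data-array resize.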
I marked this Minor because it only affects extreme numbers of fields.
--
This message is automatically generated by JIRA.