On Thu, Aug 26, 2010 at 12:06 PM, Eric Pugh
<[email protected]> wrote:
> Under "Factors affecting memory usage" there is this text:
>
> When processing an "add" command for a document, the standard XML update 
> handler has two limitations:
>
>        • All of the document's fields must simultaneously fit into memory. 
> (Technically, it's actually the sum of min(<the actual field value's length>, 
> maxFieldLength). As such, adjusting maxFieldLength may be of some help.)
>                • (I'm assuming that fields are truncated to maxFieldLength 
> before being added to the relevant document object. If that's not true, then 
> maxFieldLength won't help here. --ChrisHarris)
>        • Each individual <field>...</field> tag in the input XML must fit 
> into memory, regardless of maxFieldLength.
>
>
> Bullet 1 contradicts bullet 2, at least, the way I read it.
>
> Looking at the tokenizer that applies the maxFieldLength cutoff, it is 
> working with a stream...  That implies that the first bullet is correct, and 
> that the entire XML document doesn't need to fit into memory.  Unless what we 
> are trying to say is that to parse the incoming XML document, the entire 
> document must fit into memory?  After that, the tokenizer kicks in and only 
> the min(<the actual field value's length>, maxFieldLength) applies to each 
> field...?


I think your understanding is correct: maxFieldLength has little to do
with memory use per se. It is the maximum number of tokens indexed for
any given field in a document.  Of course, lowering maxFieldLength also
reduces what Lucene stores internally before flushing a segment, but I
imagine that's going to be irrelevant to 99.9% of our users.
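
To make the distinction concrete, here is a rough sketch in Python. It
is not Lucene's or Solr's actual code, and the names iter_tokens and
index_field are made up for illustration. It assumes a simple
whitespace tokenizer that reads from a stream and a limit that counts
tokens, not characters or bytes:

```python
import io
import itertools
from typing import Iterator, List, TextIO


def iter_tokens(stream: TextIO, chunk_size: int = 4096) -> Iterator[str]:
    """Yield whitespace-separated tokens, reading the stream chunk by chunk.

    Hypothetical stand-in for an analyzer's tokenizer: only the current
    chunk plus one partial token is held in memory at a time.
    """
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        parts = pending.split()
        # If the buffer ends mid-token, hold the last piece back until
        # the next chunk tells us where the token really ends.
        if parts and not pending[-1].isspace():
            pending = parts.pop()
        else:
            pending = ""
        yield from parts
    if pending:
        yield pending


def index_field(stream: TextIO, max_field_length: int,
                chunk_size: int = 4096) -> List[str]:
    """Keep at most max_field_length tokens; later tokens are never read."""
    tokens = iter_tokens(stream, chunk_size)
    return list(itertools.islice(tokens, max_field_length))


if __name__ == "__main__":
    field_value = io.StringIO("one two three four")
    print(index_field(field_value, 2))  # prints ['one', 'two']
```

In this sketch the cutoff only bounds how many tokens get kept. It says
nothing about how much memory was needed to parse the incoming XML and
hand the field value to the tokenizer in the first place.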

Maybe this whole thing should be cut down to "All of the document's
fields must currently simultaneously fit into memory.", if it's even
worth mentioning at all.  Can you clean this up, Eric?

-Yonik
http://lucenerevolution.org   Lucene/Solr Conference, Boston Oct 7-8
