Mike, thank you!

So in the end, all of this data (as postings) must still sit entirely in RAM 
before it is flushed to disk?
Is there any way to work around that? :)

The documents themselves (the ones I will deliver to the user) are of a regular 
size, but the features I generate from them grow combinatorially and are what 
blows the index up.
I definitely want to think about breaking them into pieces, thank you for the 
advice!
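
To make "breaking them into pieces" concrete, here is a minimal sketch in plain 
Java of how a stream of generated features could be cut into bounded-size 
chunks, so that each chunk is indexed as its own small document. It uses no 
Lucene API at all; the names FeatureChunker, chunks and maxPerChunk are 
hypothetical and only there for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper (not part of Lucene): lazily groups a feature stream
// into chunks of at most maxPerChunk items, so that only one chunk has to be
// materialized in memory at a time.
final class FeatureChunker {
    private FeatureChunker() {}

    static Iterator<List<String>> chunks(final Iterator<String> features,
                                         final int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        return new Iterator<List<String>>() {
            @Override
            public boolean hasNext() {
                // A further chunk exists as long as the source has features left.
                return features.hasNext();
            }

            @Override
            public List<String> next() {
                if (!features.hasNext()) {
                    throw new NoSuchElementException();
                }
                // Pull features from the source until the chunk is full
                // or the source is exhausted.
                List<String> chunk = new ArrayList<>();
                while (features.hasNext() && chunk.size() < maxPerChunk) {
                    chunk.add(features.next());
                }
                return chunk;
            }
        };
    }
}
```

Each chunk would then become one small document that also carries the ID of 
its parent, so that hits can be mapped back to the original document at search 
time. That mapping is an assumption of this sketch, not something I have tried 
yet.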
 

--
Best Regards,
Igor Shalyminov


21.02.2014, 00:50, "Michael McCandless" <luc...@mikemccandless.com>:
> Yes, in 4.x IndexWriter now takes an Iterable that enumerates the
> fields one at a time.
>
> You can also pass a Reader to a Field.
>
> That said, there will still be massive RAM required by IW to hold the
> inverted postings for that one document, likely much more RAM than the
> original document's String contents.
>
> And, such huge documents are rarely useful in practice.  E.g., how
> will you "deliver" that hit to the end user at search time?  Will
> scores actually make sense for such enormous documents?  It's better
> to break them up into more manageable sizes.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Feb 20, 2014 at 3:22 PM, Igor Shalyminov
> <ishalymi...@yandex-team.ru> wrote:
>
>>  Hello!
>>
>>  I've faced a problem of indexing huge documents. The indexing itself goes 
>> all right, but when the document processing becomes concurrent, 
>> OutOfMemoryErrors start appearing (even with a heap of about 32GB).
>>  The issue, as I see it, is that I have to create a Document instance to 
>> send it to IndexWriter, and Document is just a collection of all the fields, 
>> all in RAM.
>>  With my huge fields, it would be so much better to have the ability of 
>> sending document fields for writing one by one, keeping no more than a 
>> single field in RAM.
>>  Is it possible in the latest Lucene?
>>
>>  --
>>  Best Regards,
>>  Igor Shalyminov
>>
>>  ---------------------------------------------------------------------
>>  To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>  For additional commands, e-mail: java-user-h...@lucene.apache.org

