Yes, all postings for the entire doc are held in RAM data structures ... you could make your own indexing chain to somehow change this behavior, but I don't think that's an easy task.
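For reference, a rough sketch of the 4.x approach discussed below (an illustration only, not tested code; the "body" field name and the helper class are made up): addDocument() accepts any Iterable<? extends IndexableField>, and a TextField can wrap a Reader so the field value is never materialized as one giant String. The inverted postings for the whole document are still buffered in RAM until flush, though.

import java.io.Reader;
import java.util.Collections;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexableField;

class HugeFieldIndexer {
    // Sketch only: stream one huge tokenized field into IndexWriter without
    // first building a Document that holds the whole value in memory.
    static void indexHugeField(IndexWriter writer, Reader bodyReader) throws Exception {
        Iterable<? extends IndexableField> fields =
            Collections.singletonList(new TextField("body", bodyReader)); // tokenized, not stored
        writer.addDocument(fields);
    }
}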
Mike McCandless
http://blog.mikemccandless.com

On Thu, Feb 20, 2014 at 4:02 PM, Igor Shalyminov <ishalymi...@yandex-team.ru> wrote:
> Mike, thank you!
>
> So eventually this amount of data must stay entirely in RAM (as postings)
> before flushing to disk?
> Can it be hacked?)
>
> The documents themselves (that I will deliver to the user) are of a regular
> size, but the features that I generate grow combinatorially in size and blow
> up the index, in some sense.
> I definitely want to think about breaking them into pieces, thank you for
> the advice!
>
>
> --
> Best Regards,
> Igor Shalyminov
>
>
> 21.02.2014, 00:50, "Michael McCandless" <luc...@mikemccandless.com>:
>> Yes, in 4.x IndexWriter now takes an Iterable that enumerates the
>> fields one at a time.
>>
>> You can also pass a Reader to a Field.
>>
>> That said, there will still be massive RAM required by IW to hold the
>> inverted postings for that one document, likely much more RAM than the
>> original document's String contents.
>>
>> And, such huge documents are rarely useful in practice. E.g., how
>> will you "deliver" that hit to the end user at search time? Will
>> scores actually make sense for such enormous documents? It's better
>> to break them up into more manageable sizes.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, Feb 20, 2014 at 3:22 PM, Igor Shalyminov
>> <ishalymi...@yandex-team.ru> wrote:
>>> Hello!
>>>
>>> I've faced a problem indexing huge documents. The indexing itself goes
>>> all right, but when document processing becomes concurrent,
>>> OutOfMemoryErrors start appearing (even with a heap of about 32GB).
>>> The issue, as I see it, is that I have to create a Document instance to
>>> send to IndexWriter, and a Document is just a collection of all the
>>> fields, all in RAM.
>>> With my huge fields, it would be much better to be able to send document
>>> fields for writing one by one, keeping no more than a single field in RAM.
>>> Is this possible in the latest Lucene?
>>>
>>> --
>>> Best Regards,
>>> Igor Shalyminov

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org