Mike, thank you! So in the end, all of this data still has to sit entirely in RAM (as postings) before it is flushed to disk? Is there any way to work around that? :)
The documents themselves (the ones I will deliver to the user) are of a regular size, but the features I generate for them grow combinatorially, and that is what blows the index up. I definitely want to think about breaking them into pieces, thank you for the advice!

--
Best Regards,
Igor Shalyminov

21.02.2014, 00:50, "Michael McCandless" <luc...@mikemccandless.com>:
> Yes, in 4.x IndexWriter now takes an Iterable that enumerates the
> fields one at a time.
>
> You can also pass a Reader to a Field.
>
> That said, there will still be massive RAM required by IW to hold the
> inverted postings for that one document, likely much more RAM than the
> original document's String contents.
>
> And, such huge documents are rarely useful in practice. E.g., how
> will you "deliver" that hit to the end user at search time? Will
> scores actually make sense for such enormous documents? It's better
> to break them up into more manageable sizes.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Feb 20, 2014 at 3:22 PM, Igor Shalyminov
> <ishalymi...@yandex-team.ru> wrote:
>
>> Hello!
>>
>> I've run into a problem indexing huge documents. The indexing itself goes
>> all right, but when the document processing becomes concurrent,
>> OutOfMemoryErrors start appearing (even with a heap of about 32GB).
>> The issue, as I see it, is that I have to create a Document instance to
>> send it to IndexWriter, and Document is just a collection of all the fields,
>> all in RAM.
>> With my huge fields, it would be so much better to be able to send
>> document fields for writing one by one, keeping no more than a
>> single field in RAM.
>> Is it possible in the latest Lucene?
>>
>> --
>> Best Regards,
>> Igor Shalyminov
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
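
For the "break them up into more manageable sizes" suggestion in this thread, here is a minimal sketch of one way it could be done, assuming the generated features can be produced as a stream of strings. FeatureChunker and maxPerChunk are hypothetical names for illustration and are not part of Lucene; the sketch only shows the splitting step, not the indexing itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical helper (not a Lucene class): lazily splits a stream of
 * generated features into chunks of bounded size, so that only one
 * chunk has to be held in memory at a time.
 */
final class FeatureChunker implements Iterator<List<String>> {
    private final Iterator<String> features;
    private final int maxPerChunk;

    FeatureChunker(Iterator<String> features, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        this.features = features;
        this.maxPerChunk = maxPerChunk;
    }

    @Override
    public boolean hasNext() {
        // There is another chunk as long as the source has features left.
        return features.hasNext();
    }

    @Override
    public List<String> next() {
        if (!features.hasNext()) {
            throw new NoSuchElementException();
        }
        // Pull at most maxPerChunk features from the source; the last
        // chunk may be shorter than the others.
        List<String> chunk = new ArrayList<>();
        while (chunk.size() < maxPerChunk && features.hasNext()) {
            chunk.add(features.next());
        }
        return chunk;
    }
}
```

Each chunk could then be turned into its own small document that carries the id of the original document in an extra field (an assumption of this sketch), so that hits on the pieces can be mapped back to the document that is actually delivered to the user. The postings for a single piece still have to fit in RAM, but the size of a piece is now bounded by maxPerChunk rather than by the full combinatorial feature set.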