Mike, thank you! So ultimately this amount of data must sit entirely in RAM (as postings) before being flushed to disk? Is there any way to work around that?
The documents themselves (the ones I will deliver to the user) are of a regular size, but the features I generate grow combinatorially and blow up the index. I will definitely think about breaking them into pieces -- thank you for the advice!

--
Best Regards,
Igor Shalyminov

21.02.2014, 00:50, "Michael McCandless" <[email protected]>:
> Yes, in 4.x IndexWriter now takes an Iterable that enumerates the
> fields one at a time.
>
> You can also pass a Reader to a Field.
>
> That said, there will still be massive RAM required by IW to hold the
> inverted postings for that one document, likely much more RAM than the
> original document's String contents.
>
> And, such huge documents are rarely useful in practice. E.g., how
> will you "deliver" that hit to the end user at search time? Will
> scores actually make sense for such enormous documents? It's better
> to break them up into more manageable sizes.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Feb 20, 2014 at 3:22 PM, Igor Shalyminov
> <[email protected]> wrote:
>
>> Hello!
>>
>> I've run into a problem indexing huge documents. The indexing itself goes
>> all right, but when document processing becomes concurrent, OutOfMemoryErrors
>> start appearing (even with a heap of about 32GB).
>> The issue, as I see it, is that I have to create a Document instance to
>> send to IndexWriter, and a Document is just a collection of all the fields,
>> held entirely in RAM.
>> With my huge fields, it would be much better to be able to send document
>> fields for writing one by one, keeping no more than a single field in RAM.
>> Is this possible in the latest Lucene?
>>
>> --
>> Best Regards,
>> Igor Shalyminov

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
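[Editor's note] A minimal sketch of the one-field-at-a-time idea Mike describes. In Lucene 4.x, IndexWriter.addDocument accepts any Iterable of IndexableField, so a document need not be pre-built as an in-memory Document object. The class below (LazyFields is an illustrative name, not a Lucene class) shows the lazy-Iterable pattern using only the JDK; the Lucene-specific calls are confined to comments so the sketch stands alone.

```java
import java.util.Iterator;

// Sketch only: LazyFields is a hypothetical name, not part of Lucene.
// In Lucene 4.x you would implement Iterable<IndexableField> in the same
// way and pass the instance directly to IndexWriter.addDocument(...), so
// each field is built on demand and the previous one becomes eligible
// for garbage collection.
class LazyFields implements Iterable<String> {
    private final int fieldCount;
    int built = 0; // counts how many fields have been materialized so far

    LazyFields(int fieldCount) {
        this.fieldCount = fieldCount;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < fieldCount;
            }

            @Override
            public String next() {
                built++;
                // A real implementation would build one Field here, e.g.
                // from a Reader over part of the source data, so only a
                // single field's content is in RAM at a time -- never the
                // whole document.
                return "field-" + next++;
            }
        };
    }
}
```

Note the caveat from the thread still applies: this only bounds the RAM used for the field values themselves; IndexWriter must still buffer the inverted postings for the entire document before flushing.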
