Re: Discussions elsewhere about fsync, SortWriter memory costs

Nathan Kurz Sat, 19 Dec 2009 12:37:11 -0800

On Thu, Dec 17, 2009 at 6:38 PM, Marvin Humphrey <[email protected]> wrote:
> In the JIRA issue LUCENE-2026, Mike McCandless and I have been talking about
> how and when to protect Lucy against power-failure-induced index corruption.
>
>  https://issues.apache.org/jira/browse/LUCENE-2026


I think your approach here makes great sense: you can't prevent data
corruption entirely, you just want to reduce the chance of it happening
to an acceptable level.  Adding an external log file seems like a
better failsafe than trying to do full 'commits' within the writer,
since there is no guarantee those commits will actually hit disk.
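
For what it's worth, here's roughly the shape of log I had in mind, as a
Python sketch.  The function names, the record format, and the
"snapshot_N.json" file names are all hypothetical, not anything in
Lucy.  Each commit appends one checksummed line and fsyncs it; on
recovery you take the last record that is complete and checksums
correctly, so a torn write at the tail just gets ignored:

```python
import os
import tempfile
import zlib


def append_commit_record(log_path, snapshot_name):
    """Append one checksummed line naming a snapshot, then fsync the log.

    A fully durable version would also fsync the containing directory the
    first time the log file is created.
    """
    payload = snapshot_name.encode("utf-8")
    line = b"%08x %s\n" % (zlib.crc32(payload), payload)
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line)
        os.fsync(fd)  # don't report success until the record is on disk
    finally:
        os.close(fd)


def last_good_snapshot(log_path):
    """Return the newest snapshot name whose log record is complete and valid."""
    best = None
    try:
        with open(log_path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # torn final write from a crash: ignore it
                crc_hex, _, payload = raw[:-1].partition(b" ")
                try:
                    valid = int(crc_hex, 16) == zlib.crc32(payload)
                except ValueError:
                    valid = False
                if valid:
                    best = payload.decode("utf-8")
    except FileNotFoundError:
        pass  # no log yet means no committed snapshot
    return best


if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "commit.log")
    append_commit_record(log, "snapshot_1.json")
    append_commit_record(log, "snapshot_2.json")
    with open(log, "ab") as f:
        f.write(b"1234abcd snapsh")  # simulate a half-written third record
    print(last_good_snapshot(log))  # snapshot_2.json
```

The point is that the index files themselves never need to be trusted
until the log says so; the log is the only thing that has to be fsynced
carefully.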

I also think that Mike is making too much of the distinction between
"relying on the file system" and "using shared memory".  One can safely
view them as two interfaces to the same underlying mechanism.
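
A tiny Python illustration of what I mean (the helper name and file are
made up for the example): bytes stored through a shared memory mapping
of a file are exactly the bytes a plain read() of that file returns.
On Linux, at least, both go through the same page cache, and in both
cases it's an explicit msync()/fsync() that makes them durable, not the
choice of interface:

```python
import mmap
import os
import tempfile


def write_via_mmap_read_via_file(path, data):
    """Write `data` at the start of `path` through a shared memory mapping,
    then read the same range back with ordinary file I/O."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
            mm[: len(data)] = data
            mm.flush()  # msync(): needed for durability, just like fsync()
    with open(path, "rb") as f:
        return f.read(len(data))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "shared.bin")
    with open(path, "wb") as f:
        f.write(b"\0" * 16)  # mmap can't map an empty file
    print(write_via_mmap_read_via_file(path, b"hello"))  # b'hello'
```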

> Over on the Lucene java-user list, we're discussing how to rein in
> SortWriter's memory costs:
>
>  http://markmail.org/message/zcgb6ak24di42rev

I'm not sure that I understand your approach here.  Could you offer
some context?  Is the plan to store one giant file of all the field
values, and then simple lists of integer Doc values in sorted order?
I'd also wonder how you'd merge results between shards, or sort the
results of a text query.  What's the main use case you are targeting?
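
To make sure I'm reading the plan right, here's a rough Python sketch of
what I picture.  The names (build_sort_cache, sort_hits, merge_shards)
are my own hypothetical ones, not anything in Lucy or Lucene, and I'm
assuming the integer lists are doc ids ordered by field value, plus a
per-doc ordinal into the sorted values.  Sorting one shard's hits then
only needs integer comparisons, but ordinals are local to each shard,
so a cross-shard merge seems to have to fall back on comparing the
values themselves:

```python
import heapq


def build_sort_cache(field_values):
    """field_values[doc_id] is one document's value for the sort field.

    Returns (values, ords, docs_in_order):
      values: distinct field values, sorted (the "one giant file")
      ords: ords[doc_id] is the index of that doc's value in `values`
      docs_in_order: all doc ids listed in field-value order
    """
    values = sorted(set(field_values))
    ord_of = {v: i for i, v in enumerate(values)}
    ords = [ord_of[v] for v in field_values]
    docs_in_order = sorted(range(len(field_values)), key=lambda d: ords[d])
    return values, ords, docs_in_order


def sort_hits(hit_doc_ids, ords):
    """Order one shard's text-query hits by the field using integer compares only."""
    return sorted(hit_doc_ids, key=lambda d: ords[d])


def merge_shards(shards):
    """shards: list of (shard_id, sorted_hits, values, ords).

    Ordinals only mean something inside their own shard, so the merge
    compares the actual field values instead.
    """
    streams = [
        [(values[ords[d]], shard_id, d) for d in hits]
        for shard_id, hits, values, ords in shards
    ]
    merged = heapq.merge(*streams, key=lambda item: item[0])
    return [(shard_id, d) for _, shard_id, d in merged]


if __name__ == "__main__":
    a_vals, a_ords, a_docs = build_sort_cache(["pear", "apple", "fig"])
    b_vals, b_ords, b_docs = build_sort_cache(["banana", "zucchini"])
    print(a_docs)  # [1, 2, 0]
    hits_a = sort_hits([0, 1, 2], a_ords)
    hits_b = sort_hits([1, 0], b_ords)
    print(merge_shards([("a", hits_a, a_vals, a_ords), ("b", hits_b, b_vals, b_ords)]))
    # [('a', 1), ('b', 0), ('a', 2), ('a', 0), ('b', 1)]
```

Note that "apple" in shard a and "banana" in shard b both have ordinal
0, which is why I don't see how the integer lists alone could be merged.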

--nate
