[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846960#action_12846960 ]

Michael McCandless commented on LUCENE-2328:
--------------------------------------------

{quote}
bq. Keeping track of not-yet-sync'd files instead of sync'd files is better, 
but it still requires upkeep (i.e., when a file is deleted you have to remove it) 
because files can be opened, written to, closed, and deleted without ever being 
sync'd.
You can just skip this and handle the FileNotFoundException when syncing. You 
have to handle it anyway; there's no guarantee some file won't be snatched from 
under your nose.
{quote}

IW & IR do in fact guarantee they will never ask for a deleted file to
be sync'd.  If they ever do that we have more serious problems ;)
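To make the upkeep concrete, here is a minimal sketch assuming a hypothetical `PendingSyncDirectory` wrapper (not Lucene's actual `Directory` API) that records files written since their last fsync and drops a name as soon as the file is deleted. Because deletion prunes the set, it can never hold more names than there are live files, and a sync request for a name that is no longer pending is simply skipped.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not Lucene's API: remembers files written since their
 * last fsync and forgets them on delete, so tracking stays bounded.
 */
class PendingSyncDirectory {
    private final Path root;
    // Names written but not yet fsync'd; never larger than the set of live files.
    private final Set<String> pendingSync = new HashSet<>();

    PendingSyncDirectory(Path root) {
        this.root = root;
    }

    synchronized void writeFile(String name, byte[] data) throws IOException {
        Files.write(root.resolve(name), data);
        pendingSync.add(name); // every write makes the file dirty again
    }

    synchronized void deleteFile(String name) throws IOException {
        Files.deleteIfExists(root.resolve(name));
        pendingSync.remove(name); // the upkeep step: deleted files leave the set
    }

    /** Fsyncs only those requested files that are still pending. */
    synchronized void sync(Collection<String> names) throws IOException {
        for (String name : names) {
            if (!pendingSync.contains(name)) {
                continue; // already durable, or never written through this wrapper
            }
            try (FileChannel ch = FileChannel.open(root.resolve(name), StandardOpenOption.WRITE)) {
                ch.force(true); // flush file data and metadata to the device
            }
            pendingSync.remove(name);
        }
    }

    synchronized int pendingCount() {
        return pendingSync.size();
    }
}
```

Only `FileChannel.force(true)` and the `Files` helpers are standard `java.nio`; the class and method names are illustrative.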

{quote}
bq. This will over-sync in some situations.
I don't feel this is a serious problem. If you over-sync (i.e., sync some files 
a little earlier than strictly required), in a few seconds you will 
under-sync, so the total time is still the same.
{quote}

I think this is important -- commit is already slow enough -- why make
it slower?

Further, the extra files you sync'd may never have needed to be sync'd
(they will be merged away).  My examples above include such cases.
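The cost argument can be checked with a toy counting model, assuming every fsync costs the same and ignoring timing entirely. The hypothetical `SyncCostModel` compares a "global" commit that flushes every pending file against a per-file commit that flushes only what the commit point references, in a scenario where two freshly flushed segments are merged away before the next commit.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model: counts fsyncs issued by two commit strategies. */
class SyncCostModel {
    private final Set<String> pending = new LinkedHashSet<>(); // written, not yet synced
    int fsyncs = 0;

    void write(String name) { pending.add(name); }

    void delete(String name) { pending.remove(name); } // e.g. merged away

    /** Global strategy: a commit flushes everything that is pending. */
    void commitGlobal() {
        fsyncs += pending.size();
        pending.clear();
    }

    /** Per-file strategy: a commit flushes only the files it references. */
    void commitPerFile(List<String> referenced) {
        for (String name : referenced) {
            if (pending.remove(name)) {
                fsyncs++;
            }
        }
    }

    /** Commit 1 references _0 and _1; _2 and _3 are later merged into _4. */
    static int runScenario(boolean global) {
        SyncCostModel m = new SyncCostModel();
        for (String f : Arrays.asList("_0.cfs", "_1.cfs", "_2.cfs", "_3.cfs")) {
            m.write(f);
        }
        if (global) m.commitGlobal(); else m.commitPerFile(Arrays.asList("_0.cfs", "_1.cfs"));
        m.delete("_2.cfs"); // merged away into _4
        m.delete("_3.cfs");
        m.write("_4.cfs");
        if (global) m.commitGlobal(); else m.commitPerFile(Arrays.asList("_0.cfs", "_1.cfs", "_4.cfs"));
        return m.fsyncs;
    }
}
```

Here `runScenario(true)` returns 5 and `runScenario(false)` returns 3: the two extra syncs went to files that were deleted before any commit referenced them, so they were never needed at all rather than merely early.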

Turning this around... what's so bad about keeping the sync per file?

bq. System-wide sync is not the original aim, it's just a possible byproduct of 
what is the original aim

I know this is not the aim of this issue; it's just a nice
by-product if we switch to a "global sync" method.

bq. to move sync tracking code from IW to Directory.

Right, this is a great step forward, as long as we don't slow
commit by dumbing down the API :)

bq. And I don't see at all how adding batch-syncs achieves this.

You're right: this doesn't achieve / is not required for "moving
sync'd file tracking" down to Dir.  It's orthogonal, but it is another
way that we could allow Dir impls to do global sync.

I'm proposing this as a different change, to make the API better match
the needs of its consumers.  In fact, really the OS ought to allow for
this as well (but I know of none that do), since it'd give the IO
scheduler more freedom in deciding which bytes to move to disk.
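A sketch of what such a batch API could look like, in Java 8+: the hypothetical `BatchSync` interface (not an existing Lucene interface) takes every file a commit needs in one call. Its default method falls back to one fsync per file, while an implementation that can do something smarter, such as a single global sync, overrides the batch method. The two recording classes only log what they would do.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical batch-sync contract: callers pass all files a commit needs at once. */
interface BatchSync {
    void sync(String name) throws IOException;

    /** Default: fall back to one fsync per file. */
    default void sync(Collection<String> names) throws IOException {
        for (String name : names) {
            sync(name);
        }
    }
}

/** Records per-file calls instead of touching the disk. */
class RecordingPerFileSync implements BatchSync {
    final List<String> calls = new ArrayList<>();

    @Override
    public void sync(String name) {
        calls.add("fsync " + name);
    }
}

/** An implementation free to replace N per-file syncs with one global sync. */
class RecordingGlobalSync extends RecordingPerFileSync {
    @Override
    public void sync(Collection<String> names) {
        calls.add("global sync of " + names.size() + " files");
    }
}
```

One appeal of this shape is that an implementation with no better option loses nothing: it inherits the per-file loop unchanged.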

We can open this one as a separate issue...


> IndexWriter.synced  field accumulates data leading to a Memory Leak
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2328
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2328
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
>         Environment: all
>            Reporter: Gregor Kaczor
>            Priority: Minor
>             Fix For: 3.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I am running into a strange OutOfMemoryError. My small test application
> indexes and deletes a few files. This is repeated 60k times. Optimization
> is run every 2k times a file is indexed. Index size is 50KB. I analyzed
> the heap dump file and realized that the IndexWriter.synced field occupied
> more than half of the heap. That field is a private HashSet without a
> getter. Its task is to hold files which have already been synced.
> There are two calls to addAll and one call to add on synced, but no remove
> or clear throughout the lifecycle of the IndexWriter instance.
> According to the Eclipse Memory Analyzer, synced contains 32618 entries
> which look like file names such as "_e065_1.del" or "_e067.cfs".
> The index directory contains only 10 files.
> I guess synced is holding obsolete data.
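The growth pattern in the report can be reproduced without Lucene. The hypothetical `SyncedSetModel` mimics a `synced` set that is only ever added to, next to a variant that also removes a name when its file is deleted; the counter-based file names are an illustrative assumption shaped like the ones in the heap dump.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a "synced" set, with and without pruning on delete. */
class SyncedSetModel {
    final Set<String> synced = new HashSet<>();
    final Set<String> live = new HashSet<>();
    private final boolean pruneOnDelete;

    SyncedSetModel(boolean pruneOnDelete) {
        this.pruneOnDelete = pruneOnDelete;
    }

    void writeAndSync(String name) {
        live.add(name);
        synced.add(name); // mirrors the add/addAll calls described in the report
    }

    void delete(String name) {
        live.remove(name);
        if (pruneOnDelete) {
            synced.remove(name); // the remove the report found missing
        }
    }

    /** Each round writes and syncs a new file, then deletes the previous one. */
    static SyncedSetModel simulate(boolean pruneOnDelete, int rounds) {
        SyncedSetModel m = new SyncedSetModel(pruneOnDelete);
        String previous = null;
        for (int i = 0; i < rounds; i++) {
            // Hypothetical naming: underscore + base-36 counter, like "_e067.cfs".
            String name = "_" + Integer.toString(i, Character.MAX_RADIX) + ".cfs";
            m.writeAndSync(name);
            if (previous != null) {
                m.delete(previous);
            }
            previous = name;
        }
        return m;
    }
}
```

After `simulate(false, 1000)`, `synced` holds 1,000 names while only one file is live; with pruning enabled it holds one.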

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
