[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972767#action_12972767
 ] 

Michael McCandless commented on LUCENE-2814:
--------------------------------------------

Patch looks great!  Nice work Earwin.  I think it's ready to commit.

Except, can you resync to trunk?  I hit failures applying one hunk to
DW.java.

Also, on the nocommit on the exception in DW.addDocument: yes, I think
that call (IFD.deleteNewFiles, not checkpoint) is still needed, because
DW can orphan the store files on abort.

Or we could fix DW.abort to call Dir.deleteFile directly (instead of
relying on IFD.deleteNewFiles).  I.e., with no shared doc stores, these
files should never have been registered with IFD, so they can be
privately managed by DW.
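
A minimal sketch of the "privately managed" option, to make the idea
concrete.  It uses java.nio.file instead of Lucene's Directory, and the
class and method names (PrivateDocStoreWriter, createStoreFile) are
hypothetical; this is not the actual DocumentsWriter code, only an
illustration of a writer that tracks its own files and deletes them on
abort without involving a shared file deleter.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a writer that privately owns its doc store files. */
final class PrivateDocStoreWriter {
    private final Path dir;
    // Files created by this writer; they are never registered with a shared deleter.
    private final List<Path> openFiles = new ArrayList<>();

    PrivateDocStoreWriter(Path dir) {
        this.dir = dir;
    }

    /** Creates an empty store file and remembers it so abort() can remove it. */
    Path createStoreFile(String name) throws IOException {
        Path file = Files.createFile(dir.resolve(name));
        openFiles.add(file);
        return file;
    }

    /** Deletes every file this writer created, then forgets them. */
    void abort() {
        for (Path file : openFiles) {
            try {
                Files.deleteIfExists(file);
            } catch (IOException e) {
                // Best effort: keep going so one failure does not orphan the rest.
            }
        }
        openFiles.clear();
    }
}
```

Because the files are known only to the writer, nothing outside it has
to be told about them when an abort happens.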

But, if we end up leaving the delete up above, we should put the
docWriter null check back so silly apps that close IW while still
indexing don't get NPEs.
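
For illustration, the null check could look roughly like the sketch
below.  GuardedWriter and DocWriterStub are hypothetical stand-ins for
IW and DW, and the sketch shows only the guard itself, not the real
locking or file deletion.

```java
import java.util.ArrayList;
import java.util.List;

final class GuardedWriter {
    /** Hypothetical stand-in for DocumentsWriter: only the files it has created. */
    static final class DocWriterStub {
        final List<String> newFiles = new ArrayList<>();
    }

    // Set to null by close(); volatile so an indexing thread observes the close.
    private volatile DocWriterStub docWriter = new DocWriterStub();

    void close() {
        docWriter = null;
    }

    /** Records a newly created file; returns false if the writer is already closed. */
    boolean trackNewFile(String name) {
        DocWriterStub dw = docWriter; // read the field once, then use the local
        if (dw == null) {
            return false;
        }
        dw.newFiles.add(name);
        return true;
    }

    /** Error-path cleanup: returns how many files were dropped, or 0 if closed. */
    int deleteNewFilesAfterFailure() {
        DocWriterStub dw = docWriter; // same pattern: avoids an NPE after close()
        if (dw == null) {
            return 0;
        }
        int dropped = dw.newFiles.size();
        dw.newFiles.clear();
        return dropped;
    }
}
```

Reading the field into a local before the check matters: testing the
field and then dereferencing it again would still race with close().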

I'm not looking forward to the 3.x back port!!


> stop writing shared doc stores across segments
> ----------------------------------------------
>
>                 Key: LUCENE-2814
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2814
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 3.1, 4.0
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch, 
> LUCENE-2814.patch
>
>
> Shared doc stores enable the files for stored fields and term vectors to be 
> shared across multiple segments.  We've had this optimization since 2.1 I 
> think.
> It works best against a new index, where you open an IW, add lots of docs, 
> and then close it.  In that case all of the written segments will reference 
> slices of a single shared doc store segment.
> This was a good optimization because it means we never need to merge these 
> files.  But, when you open another IW on that index, it writes a new set of 
> doc stores, and then whenever merges take place across doc stores, they must 
> now be merged.
> However, since we switched to shared doc stores, there have been two 
> optimizations for merging the stores.  First, we now bulk-copy the bytes in 
> these files if the field name/number assignment is "congruent".  Second, we 
> now force congruent field name/number mapping in IndexWriter.  This means 
> the shared doc store optimization is much less potent than it used to be.
> Furthermore, the optimization adds *a lot* of hair to 
> IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over 
> time, and causes odd behavior like a merge possibly forcing a flush when it 
> starts.  Finally, with DWPT (LUCENE-2324), which gets us truly concurrent 
> flushing, we can no longer share doc stores.
> So, I think we should turn off the write-side of shared doc stores to pave 
> the path for DWPT to land on trunk and simplify IW/DW.  We still must support 
> reading them (until 5.0), but the read side is far less hairy.
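
The "congruent" field name/number condition in the description above
can be sketched as follows.  This is an assumption-laden illustration,
not Lucene's merge code: FieldNumbering and its methods are hypothetical,
and each list simply maps field number (the list index) to field name.

```java
import java.util.List;

final class FieldNumbering {
    private FieldNumbering() {}

    /**
     * A source segment is treated as congruent with the merged numbering when
     * every field number it uses refers to the same field name in both.
     */
    static boolean isCongruent(List<String> source, List<String> merged) {
        if (source.size() > merged.size()) {
            return false; // the source uses a number the merged mapping lacks
        }
        for (int number = 0; number < source.size(); number++) {
            if (!source.get(number).equals(merged.get(number))) {
                return false; // same number, different field name
            }
        }
        return true;
    }

    /** Picks the cheap path only when the stored bytes can be reused as-is. */
    static String mergeStrategy(List<String> source, List<String> merged) {
        return isCongruent(source, merged) ? "bulk-copy" : "per-document";
    }
}
```

When the numbering matches, the stored bytes already encode the right
field numbers and can be copied wholesale; otherwise each document has
to be decoded and rewritten with the new numbers.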

