[ 
https://issues.apache.org/jira/browse/LUCENE-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603737#comment-13603737
 ] 

Jessica Cheng commented on LUCENE-4707:
---------------------------------------

Hi Michael,

I did what you suggested, but I ran into a related problem involving a race 
between the merge and the reader being returned so that I can protect the 
reference. In the log trace below, the thread that's executing 
IndexWriter.getReader gets stalled when maybeMerge is called, at which point 
the Lucene Merge Thread came in and deleted the files referred to the 
segmentInfos that the getReader call has already cloned, but since getReader 
has not returned yet, those files were not protected (incRef'ed) yet and the 
Lucene Merge Thread was able to delete the files. (I'm guessing in this case 
the file was created and merged within a softCommit cycle so the previous NRT 
reader/searcher never had a reference to it.)

Questions:
1. What's my best way to get around that?
2. How does the OS-level file protection help in this case since the 
segmentInfos are just clone()ed in getReader and the call seems to just copy 
around references and never registered in any way with the directory?

Thanks so much for your help again.

Log:
BD 129 [Thu Mar 14 17:41:23 PDT 2013; commitScheduler-333]: applyDeletes: 
infos=[_41(4.1):C2006, _9p(4.1):C17686, _3s(4.1):C1163, _4d(4.1):c313, 
_3y(4.1):c365, _4b(4.1):c423, _4a(4.1):C881, _4c(4.1):c54, _4f(4.1):c186, 
_4e(4.1):c30, _4g(4.1):c3, _ao(4.1):C3734, _ch(4.1):C3464, _d1(4.1):c708, 
_dk(4.1):c269, _dh(4.1):c36, _di(4.1):c4, _dj(4.1):c47, _dm(4.1):c3, 
_dl(4.1):c1, _dn(4.1):c1, _do(4.1):c1, _dp(4.1):c1, _dq(4.1):c49, _dr(4.1):c15, 
_ds(4.1):c1, _du(4.1):c101, _dv(4.1):c27, _dw(4.1):c3, _dx(4.1):c1, 
_dy(4.1):c1, _e0(4.1):c1, _dz(4.1):c1] packetCount=1\
...
BD 129 [Thu Mar 14 17:41:23 PDT 2013; commitScheduler-333]: seg=_d1(4.1):c708 
segGen=763 coalesced deletes=[CoalescedDeletes(termSets=1,queries=0)] 
newDelCount=0\
...
IW 129 [Thu Mar 14 17:41:23 PDT 2013; commitScheduler-333]: return reader 
version=1013 reader=StandardDirectoryReader(segments_3:1013:nrt _41(4.1):C2006 
_9p(4.1):C17686 _3s(4.1):C1163 _4d(4.1):c313 _3y(4.1):c365 _4b(4.1):c423 
_4a(4.1):C881 _4c(4.1):c54 _4f(4.1):c186 _4e(4.1):c30 _4g(4.1):c3 
_ao(4.1):C3734 _ch(4.1):C3464 _d1(4.1):c708 _dk(4.1):c269 _dh(4.1):c36 
_di(4.1):c4 _dj(4.1):c47 _dm(4.1):c3 _dl(4.1):c1 _dn(4.1):c1 _do(4.1):c1 
_dp(4.1):c1 _dq(4.1):c49 _dr(4.1):c15 _ds(4.1):c1 _du(4.1):c101 _dv(4.1):c27 
_dw(4.1):c3 _dx(4.1):c1 _dy(4.1):c1 _e0(4.1):c1 _dz(4.1):c1)\
DW 129 [Thu Mar 14 17:41:23 PDT 2013; commitScheduler-333]: commitScheduler-333 
finishFullFlush success=true\
TMP 129 [Thu Mar 14 17:41:23 PDT 2013; commitScheduler-333]: findMerges: 33 
segments\
...
TMP 129 [Thu Mar 14 17:41:23 PDT 2013; commitScheduler-333]:   
seg=_d1(4.1):c708 size=1.144 MB [merging] [floored]\
…
IW 129 [Thu Mar 14 17:41:23 PDT 2013; commitScheduler-333]: registerMerge 
merging= [_4c, _do, _dl, _4f, _41, _dn, _4g, _4d, _di, _4a, _4b, _dh, _3y, _dp
, _4e, _dk, _dj, _d1, _dm, _3s, ]\
...
CMS 129 [Thu Mar 14 17:41:23 PDT 2013; commitScheduler-333]:   index: 
_41(4.1):C2006 _9p(4.1):C17686 _3s(4.1):C1163 _4d(4.1):c313 _3y(4.1):c365 
_4b(4.1):c423 _4a(4.1):C881 _4c(4.1):c54 _4f(4.1):c186 _4e(4.1):c30 _4g(4.1):c3 
_ao(4.1):C3734 _ch(4.1):C3464 _d1(4.1):c708 _dk(4.1):c269 _dh(4.1):c36 
_di(4.1):c4 _dj(4.1):c47 _dm(4.1):c3 _dl(4.1):c1 _dn(4.1):c1 _do(4.1):c1 
_dp(4.1):c1 _dq(4.1):c49 _dr(4.1):c15 _ds(4.1):c1 _du(4.1):c101 _dv(4.1):c27 
_dw(4.1):c3 _dx(4.1):c1 _dy(4.1):c1 _e0(4.1):c1 _dz(4.1):c1\
CMS 129 [Thu Mar 14 17:41:23 PDT 2013; commitScheduler-333]:     too many 
merges; stalling...\
IFD 129 [Thu Mar 14 17:41:24 PDT 2013; Lucene Merge Thread #20]:   DecRef 
"_d1.cfs": pre-decr count is 1\
IFD 129 [Thu Mar 14 17:41:24 PDT 2013; Lucene Merge Thread #20]: delete 
"_d1.cfs"\
...
...at this point commitScheduler-333 tries to incRef _d1.cfs but it's too late.
                
> Track file reference kept by readers that are opened through the writer
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-4707
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4707
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0
>         Environment: Mac OS X 10.8.2 and Linux 2.6.32
>            Reporter: Jessica Cheng
>
> We ran into a bug where files (mostly CFS) that are still referred to by our 
> NRT reader/searcher are deleted by IndexFileDeleter. As far as I can see from 
> the verbose logging and reading the code, it seems that the problem is the 
> creation and merging of these CFS files between hard commits. The files 
> referred to by hard commits are incRef’ed at commit checkpoints, so these 
> files won’t be deleted until they are decRef’ed when the commit is deleted 
> according to the DeletionPolicy (good). However, intermediate files that are 
> created and merged between the hard commits only have refs through the 
> regular checkpoints, so as soon as a new checkpoint no longer includes those 
> files, they are immediately deleted by the deleter. See the abridged verbose 
> log lines that illustrate this behavior:
> IW 11 [Mon Jan 21 17:30:35 PST 2013; commitScheduler]: create compound file 
> _8.cfs
> IFD 7 [Mon Jan 21 17:23:41 PST 2013; commitScheduler]: now checkpoint 
> "_0(4.0.0.2):C3_1(4.0.0.2):C7 _2(4.0.0.2):C16 _3(4.0.0.2):C21 _4(4.0.0.2):C5 
> _5(4.0.0.2):C5_6(4.0.0.2):C5 _7(4.0.0.2):C7 _8(4.0.0.2):c6" [9 segments ; 
> isCommit = false]
> IFD 7 [Mon Jan 21 17:23:41 PST 2013; commitScheduler]:   IncRef "_8.cfs": 
> pre-incr count is 0
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]: now checkpoint 
> "_0(4.0.0.2):C3_1(4.0.0.2):C7 _2(4.0.0.2):C16 _3(4.0.0.2):C21 _4(4.0.0.2):C5 
> _5(4.0.0.2):C5 _6(4.0.0.2):C5 _7(4.0.0.2):C7 _8(4.0.0.2):c6 _9(4.0.0.2):c6" 
> [10 segments ; isCommit = false]
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]:   IncRef "_8.cfs": 
> pre-incr count is 1
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; commitScheduler]:   DecRef "_8.cfs": 
> pre-decr count is 2
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]: now checkpoint 
> "_b(4.0.0.2):C81" [1 segments ; isCommit = false]
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]:   DecRef 
> "_8.cfs": pre-decr count is 1
> IFD 7 [Mon Jan 21 17:23:42 PST 2013; Lucene Merge Thread #0]: delete "_8.cfs"
> With this behavior, it seems no matter how frequently we refresh the reader 
> (unless we do it at every read), we’d run into the race where the reader 
> still holds a reference to the file that’s just been deleted by the deleter. 
> My proposal is to count the file reference handed out to the NRT 
> reader/searcher when writer.getReader(boolean) is called and decRef the files 
> only when the said reader is closed.
> Please take a look and evaluate if my observations are correct and if the 
> proposal makes sense. Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to