[jira] [Commented] (BLUR-439) HDFSDirectory fencing issue

Aaron McCurry (JIRA) Tue, 16 Jun 2015 05:39:20 -0700

    [ 
https://issues.apache.org/jira/browse/BLUR-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587961#comment-14587961
 ]


Aaron McCurry commented on BLUR-439:
------------------------------------

Created a test that shows the problem.

https://gist.github.com/amccurry/14d5c2e1918891674777

> HDFSDirectory fencing issue
> ---------------------------
>
>                 Key: BLUR-439
>                 URL: https://issues.apache.org/jira/browse/BLUR-439
>             Project: Apache Blur
>          Issue Type: Bug
>          Components: Blur
>    Affects Versions: 0.2.4
>            Reporter: Aaron McCurry
>            Priority: Blocker
>             Fix For: 0.2.4
>
>
> We recently had an issue that created a corrupt index.
> What happened?
> Shard Server 1 (SS1) owned a shard of a table (SH1) and was performing an
> index import when a layout change in the table occurred mid-import.  The
> segment version of this shard was at "_0".  SS1 performed the work of the
> import by adding files from an external directory.  At this point SH1 had
> not been committed, so the current committed version was still "_0",
> although the files for the next segment version "_1" had been written but
> not yet committed.
> Then the SH1 shard moved to another shard server (SS2), and once it was
> opened there the open version was also "_0", which is correct.  During the
> move the directory lock passed to SS2, which is also correct.  The SS2
> process then started the import again for the external directory that had
> not been committed.  It also wrote new files for the "_1" segment.
> Now back on the SS1 server, the commit was underway; the directory lock was
> checked and an exception was thrown because this process no longer owned
> the lock.  During the rollback the SS1 server deleted what it thought were
> the "_1" segment files that it had written, but the files were actually
> from the SS2 import process.  Once the files were deleted, the abort and
> rollback were complete and the index had returned to its "_0" state.
> However, on the SS2 server the commit was moving forward for the "_1"
> segment (whose files had by now been deleted by SS1), and the index was
> corrupted.
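
The sequence in the description can be replayed in a small, self-contained
sketch. This is not Blur or Lucene code: SharedDir, rollbackByPrefix,
rollbackOwned and the file names are hypothetical stand-ins, and the "fenced"
variant is only one possible way to avoid the problem, not necessarily the
fix applied for this issue. It assumes each file can be attributed to the
server that wrote it.

```java
import java.util.HashMap;
import java.util.Map;

class FencingSketch {

    /** Hypothetical in-memory stand-in for the shared directory: file name -> writer id. */
    static final class SharedDir {
        final Map<String, String> files = new HashMap<>();
        String lockOwner;

        void write(String name, String writer) {
            files.put(name, writer);
        }

        /** Unfenced rollback: deletes every file of the segment, whoever wrote it. */
        void rollbackByPrefix(String segment) {
            files.keySet().removeIf(name -> name.startsWith(segment + "."));
        }

        /** Fenced rollback: deletes only the segment files written by this writer. */
        void rollbackOwned(String segment, String writer) {
            files.entrySet().removeIf(e ->
                e.getKey().startsWith(segment + ".") && e.getValue().equals(writer));
        }
    }

    /** Replays the described sequence; returns how many "_1" files SS2 still has to commit. */
    static int replay(boolean fenced) {
        SharedDir dir = new SharedDir();
        dir.lockOwner = "SS1";
        dir.write("_0.si", "SS1");   // committed segment "_0"
        dir.write("_1.fdt", "SS1");  // SS1 import, written but not committed

        dir.lockOwner = "SS2";       // shard moves; SS2 now owns the lock
        dir.write("_1.fdt", "SS2");  // SS2 repeats the import under the same names
        dir.write("_1.fdx", "SS2");

        // SS1 reaches its commit, finds it no longer owns the lock, and rolls back.
        if (!"SS1".equals(dir.lockOwner)) {
            if (fenced) {
                dir.rollbackOwned("_1", "SS1");
            } else {
                dir.rollbackByPrefix("_1");
            }
        }

        int remaining = 0;
        for (String name : dir.files.keySet()) {
            if (name.startsWith("_1.")) {
                remaining++;
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        System.out.println("unfenced: " + replay(false)); // prints "unfenced: 0"
        System.out.println("fenced: " + replay(true));    // prints "fenced: 2"
    }
}
```

With the unfenced rollback, SS2 is left committing a "_1" segment whose files
are gone, which is the corruption described above. With the ownership check,
the rollback on SS1 leaves the files written by SS2 in place.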



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
