[
https://issues.apache.org/jira/browse/BLUR-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587961#comment-14587961
]
Aaron McCurry commented on BLUR-439:
------------------------------------
Created a test that shows the problem.
https://gist.github.com/amccurry/14d5c2e1918891674777
> HDFSDirectory fencing issue
> ---------------------------
>
> Key: BLUR-439
> URL: https://issues.apache.org/jira/browse/BLUR-439
> Project: Apache Blur
> Issue Type: Bug
> Components: Blur
> Affects Versions: 0.2.4
> Reporter: Aaron McCurry
> Priority: Blocker
> Fix For: 0.2.4
>
>
> We recently had an issue that created a corrupt index.
> What happened?
> Shard Server 1 (SS1) owned a shard of a table (SH1) and was performing an
> index import when a layout change in the table occurred mid-import. The
> segment version of this shard was at "_0". The SS1 server performed the work
> of the import by adding in files from an external directory. At this point
> SH1 has not been committed, so the current committed version is still
> "_0", although the files for the next segment version "_1" have been
> written but not yet committed.
> Then the SH1 shard moved to another shard server (SS2), and once it was opened
> there the open version was also "_0", which is correct. During the move the
> directory lock passed to SS2, which is also correct. The SS2 process started the
> import process again for the external directory that was not committed. It
> also writes new files for the "_1" segment.
> Now back on the SS1 server, the commit is underway and the directory lock is
> checked and an exception is thrown because this process no longer owns the
> lock. During the rollback the SS1 server deletes what it thinks are the "_1"
> segment files that it wrote, but the files are actually from the SS2 import
> process. Once the files are deleted the abort and rollback are complete and
> the index has returned to its "_0" state.
> However, on the SS2 server the commit is moving forward for the "_1" segment
> (whose files have now been deleted by SS1) and the index is corrupted.
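
The timeline in the description can be replayed in a small in-memory model. This is only a sketch under stated assumptions, not Blur's HDFSDirectory code and not necessarily the fix that was applied: the class FencingSketch, its methods, and the file names "_1.fdt" and "_1.fdx" are all hypothetical, and the shared directory is modeled as a map from file name to the server that last wrote it. It contrasts a rollback that deletes by segment name (what the description reports) with one that deletes only files the rolling-back server still owns.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of the race described in BLUR-439.
// All names are hypothetical; none of this is a Blur or Lucene API.
class FencingSketch {

    // Shared "HDFS directory": file name -> server that wrote it last.
    static final Map<String, String> dir = new TreeMap<>();

    // Current holder of the directory lock.
    static String lockOwner;

    // A server writes the uncommitted files for segment "_1".
    // A second writer overwrites the same names and becomes the owner.
    static void writeSegment(String server) {
        dir.put("_1.fdt", server);
        dir.put("_1.fdx", server);
    }

    // Unfenced rollback: delete by segment name only, as in the incident.
    static void rollbackByName() {
        dir.keySet().removeIf(name -> name.startsWith("_1"));
    }

    // Fenced rollback (assumed mitigation): delete only files that
    // this server was the last to write.
    static void rollbackOwned(String server) {
        dir.values().removeIf(owner -> owner.equals(server));
    }

    // Replays the timeline and returns the files left for SS2's commit.
    static Set<String> replay(boolean fenced) {
        dir.clear();
        lockOwner = "SS1";
        writeSegment("SS1");   // SS1 imports; "_1" is not committed yet
        lockOwner = "SS2";     // shard SH1 moves; SS2 takes the lock
        writeSegment("SS2");   // SS2 re-runs the import for "_1"

        // SS1 reaches its commit, sees it lost the lock, and rolls back.
        if (!"SS1".equals(lockOwner)) {
            if (fenced) {
                rollbackOwned("SS1");
            } else {
                rollbackByName();
            }
        }
        return new TreeSet<>(dir.keySet());
    }

    public static void main(String[] args) {
        System.out.println("unfenced: " + replay(false)); // prints "unfenced: []"
        System.out.println("fenced:   " + replay(true));  // prints "fenced:   [_1.fdt, _1.fdx]"
    }
}
```

In the unfenced replay SS2 is left to commit a "_1" segment whose files no longer exist, which matches the corruption reported. In a real directory implementation, ownership would have to be tracked in some other way than an in-memory map; that part is left open here.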