[ 
https://issues.apache.org/jira/browse/BLUR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796632#comment-13796632
 ] 

Aaron McCurry commented on BLUR-234:
------------------------------------

Great!  Thanks!

There is a partially built one in the blur-store already.  Needs more work 
though.

Aaron

> Create a softlink like capability in the HDFSDirectory
> ------------------------------------------------------
>
>                 Key: BLUR-234
>                 URL: https://issues.apache.org/jira/browse/BLUR-234
>             Project: Apache Blur
>          Issue Type: Sub-task
>          Components: Blur
>    Affects Versions: 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
>
>
> The problem we are trying to solve here is minimizing file copying.  During a 
> merge of an external index produced by MR into a shard index, the index files 
> are normally copied.  In a lot of cases the new external index(es) are 
> very large.  This can cause serious performance problems because all the 
> new data is copied into the shard index.  Normally this happens across 
> the cluster at the same time, so it will likely turn into an IO storm.
> The current implementation in the IndexImporter deals with this problem 
> by overriding a method in the HDFSDirectory so that it moves the files in 
> HDFS instead of copying them.  This makes those merges very fast, but it's 
> risky because if the shard index writer doesn't commit the changes, the files 
> are not moved back to their original location.  Instead they are deleted, 
> which means data loss.
> So I'm proposing that we create a softlink system that allows for links to 
> be created instead of the files being moved.  That way, if the commit fails, 
> the links are removed and the original data files are still in their original 
> location.
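
To make the proposed idea concrete, here is a rough sketch of how a link could 
work.  This is only an illustration and not the blur-store code: it uses the 
local filesystem through java.nio.file rather than HDFS, and the class name, 
method names, and the ".lnk" suffix are all made up for the example.  A link is 
just a tiny file in the shard directory whose content is the path of the real 
data file, so a rollback only has to delete the link.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch: a "link" is a small file that stores the path of the real data file. */
final class SoftLinkSketch {

  // Assumed naming convention for link files; not taken from Blur.
  static final String LINK_SUFFIX = ".lnk";

  /** Creates a link in shardDir that points at externalFile.  The external file is not touched. */
  static Path createLink(Path shardDir, Path externalFile) throws IOException {
    Path link = shardDir.resolve(externalFile.getFileName() + LINK_SUFFIX);
    byte[] target = externalFile.toAbsolutePath().toString().getBytes(StandardCharsets.UTF_8);
    Files.write(link, target);
    return link;
  }

  /** Reads a link file and returns the path of the real data file it points at. */
  static Path resolve(Path link) throws IOException {
    return Paths.get(new String(Files.readAllBytes(link), StandardCharsets.UTF_8));
  }

  /** Rollback after a failed commit: only the link is deleted, the data stays where it was. */
  static void rollback(Path link) throws IOException {
    Files.deleteIfExists(link);
  }
}
```

In this sketch a failed commit costs nothing but the removal of a few small 
link files, and the external index files never leave their original location.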



--
This message was sent by Atlassian JIRA
(v6.1#6144)
