[ https://issues.apache.org/jira/browse/BLUR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805856#comment-13805856 ]
Vikrant Navalgund commented on BLUR-234:
----------------------------------------
Thank you Aaron.
> Create a softlink like capability in the HDFSDirectory
> ------------------------------------------------------
>
> Key: BLUR-234
> URL: https://issues.apache.org/jira/browse/BLUR-234
> Project: Apache Blur
> Issue Type: Sub-task
> Components: Blur
> Affects Versions: 0.3.0
> Reporter: Aaron McCurry
> Fix For: 0.3.0
>
>
> The problem we are trying to solve here is minimizing file copying. During a
> merge of an external index produced by MR into a shard index, the index
> files are normally copied. In many cases the new external index(es) are
> very large. This can cause serious performance problems because all the
> new data is copied into the shard index. This normally happens across
> the cluster at the same time, so it is likely to turn into an IO storm.
> The current implementation in the IndexImporter deals with this problem
> by overriding a method in the HDFSDirectory so that it moves the files in
> HDFS instead of copying them. This makes those merges very fast, but it's
> risky because if the shard index writer doesn't commit the changes, the
> files are not moved back to their original location. Instead they are
> deleted, which means data loss.
> So I'm proposing that we create a softlink system that allows links to
> be created instead of the files being moved. That way, if the commit fails,
> the links are removed and the original data files are still in their
> original location.