Aaron McCurry created BLUR-234:
----------------------------------
Summary: Create a softlink like capability in the HDFSDirectory
Key: BLUR-234
URL: https://issues.apache.org/jira/browse/BLUR-234
Project: Apache Blur
Issue Type: Sub-task
Components: Blur
Affects Versions: 0.3.0
Reporter: Aaron McCurry
Fix For: 0.3.0
The problem we are trying to solve here is minimizing file copying. During a
merge of an external index produced by MR into a shard index, the index files
are normally copied. In a lot of cases the new external index(es) are very
large. This can cause serious performance problems because all the new data
is copied into the shard index. These merges usually happen across the cluster
at the same time, so they are likely to turn into an IO storm.
The current implementation in the IndexImporter deals with this problem by
overriding a method in the HDFSDirectory so that it moves the files in HDFS
instead of copying them. This makes those merges very fast, but it's risky:
if the shard index writer doesn't commit the changes, the files are not moved
back to their original location. Instead they are deleted, which means data
loss.
So I'm proposing that we create a softlink system that allows links to the
files to be created instead of the files being moved. That way, if the commit
fails, the links are removed and the original data files are still in their
original location.
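
To make the idea concrete, here is a minimal sketch of the link-then-commit
flow. It is not the HDFSDirectory or IndexImporter API. The class name, the
method names, and the ".lnk" suffix are all hypothetical, and local files
stand in for HDFS files so that the example has no Hadoop dependency. It only
illustrates the rollback property: a failed commit deletes links and never
touches the original data files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names come from Blur or HDFSDirectory.
class SoftLinkSketch {
    // Assumed naming convention for link files.
    static final String LINK_SUFFIX = ".lnk";

    // Write a small link file in shardDir that records where the real data lives.
    static Path createLink(Path shardDir, Path externalFile) throws IOException {
        Path link = shardDir.resolve(externalFile.getFileName().toString() + LINK_SUFFIX);
        String target = externalFile.toAbsolutePath().toString();
        Files.write(link, target.getBytes(StandardCharsets.UTF_8));
        return link;
    }

    // Follow a link file back to the data file it points at.
    static Path resolve(Path link) throws IOException {
        return Paths.get(new String(Files.readAllBytes(link), StandardCharsets.UTF_8));
    }

    // Link every external file into the shard, then run the commit step.
    // If the commit reports failure or throws, only the links are deleted.
    static boolean importWithLinks(Path shardDir, List<Path> externalFiles,
                                   BooleanSupplier commit) throws IOException {
        List<Path> links = new ArrayList<>();
        boolean committed = false;
        try {
            for (Path file : externalFiles) {
                links.add(createLink(shardDir, file));
            }
            committed = commit.getAsBoolean();
        } finally {
            if (!committed) {
                for (Path link : links) {
                    Files.deleteIfExists(link);
                }
            }
        }
        return committed;
    }
}
```

In this sketch the commit step is passed in as a BooleanSupplier, so a caller
can simulate a failed commit with `() -> false` and check that the external
file still exists afterwards. A real implementation would also need the
directory to resolve links transparently when opening a file for reading,
which is left out here.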