[ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641174#action_12641174 ]
Sanjay Radia commented on HADOOP-4044:
--------------------------------------

> I would like to avoid a design that incurs an overhead of an additional RPC
> every time a link is traversed.

> +1. This will affect not only NNBench but all benchmarks including DFSIO and
> especially NNThroughputBenchmark. GridMix and Sort will probably be less
> affected, but will suffer too.

+1. I would also like to avoid an extra RPC, since avoiding one is straightforward.

Doug:
> What did you think about my suggestion above that we might use a cache to
> avoid this? First, we implement the naive approach, benchmark it, and, if
> it's too slow, optimize it with a pre-fetch cache of block locations.

Clearly your cache solution deals with the extra RPC issue. Generally I see a cache as a way of improving the performance of an already good design or algorithm. I don't like using a cache as part of a design to make an algorithm work when alternate good designs are available that don't need one.

Would we have come up with this design if we hadn't had such an emotionally charged discussion on exceptions? We have a good design where, if resolution fails due to a symlink, we return this information to the caller. It does not require a cache. We are divided over how to return this information - use the return status or use an exception. The cache solution is a way to avoid making that painfully emotionally charged decision for the Hadoop community. I don't want to explain the reason we use the cache to Hadoop developers again and again down the road. We should not avoid the decision, but make it. A couple of weeks ago I was confident that a compromise vote would pass. I am hoping that the same is true now.
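For concreteness, the two signaling styles under debate can be sketched as follows. This is a minimal illustration only; the names (UnresolvedLinkException, Resolution, resolveOrThrow, resolveWithStatus) and the /link-to-/target mapping are hypothetical assumptions, not the API of any attached patch. The point is that either style hands the link target back to the caller in a single round trip, so the client can resolve and retry without an extra RPC per link.

```java
// Hypothetical sketch of the two alternatives: exception vs. return status.
// None of these names come from the symLink patches themselves.

class UnresolvedLinkException extends Exception {
    final String linkTarget; // where the symlink points, so the caller can retry

    UnresolvedLinkException(String linkTarget) {
        this.linkTarget = linkTarget;
    }
}

class Resolver {
    // Stand-in for a server-side lookup: pretend paths under /link/ are
    // symlinks into /target/.
    private static boolean isLink(String path) {
        return path.startsWith("/link/");
    }

    private static String target(String path) {
        return "/target/" + path.substring("/link/".length());
    }

    // Alternative 1: a checked exception carries the unresolved link target
    // back to the caller, which substitutes it and retries.
    static String resolveOrThrow(String path) throws UnresolvedLinkException {
        if (isLink(path)) {
            throw new UnresolvedLinkException(target(path));
        }
        return path; // fully resolved, no further round trip needed
    }

    // Alternative 2: the same information travels in the return value.
    static final class Resolution {
        final boolean resolved;
        final String pathOrTarget; // resolved path, or link target to retry with

        Resolution(boolean resolved, String pathOrTarget) {
            this.resolved = resolved;
            this.pathOrTarget = pathOrTarget;
        }
    }

    static Resolution resolveWithStatus(String path) {
        if (isLink(path)) {
            return new Resolution(false, target(path));
        }
        return new Resolution(true, path);
    }
}
```

Both variants carry identical information; the disagreement is purely about which channel (exception or status) is the better fit for Hadoop's client-protocol conventions.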
> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file
> that contains a reference to another file or directory in the form of an
> absolute or relative path and that affects pathname resolution. Programs
> which read or write to files named by a symbolic link will behave as if
> operating directly on the target file. However, archiving utilities can
> handle symbolic links specially and manipulate them directly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.