[ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640224#action_12640224 ]

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> I am almost certain that it won't affect any benchmark other than NNBench.

If that's really the case, what are we worried about here?

> What happens if a link or directory is changed between these two operations? 
> open() fails though it should not.

The same thing that happens if that change is made just after a file is opened. 
 If you open a file, then someone else deletes it, subsequent accesses to that 
file will fail.  The namenode doesn't keep any state for files open for read, 
so a short-lived cache of block locations doesn't change things fundamentally.
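
To be concrete, that race exists today without any cache; a minimal sketch 
(the path and buffer size are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OpenDeleteRace {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Block locations are fetched at open() time; the namenode keeps
        // no state for files open for read.
        FSDataInputStream in = fs.open(new Path("/user/doug/data"));
        // Meanwhile, another client runs:
        //   fs.delete(new Path("/user/doug/data"), false);
        in.read(new byte[4096]);  // fails once the blocks are actually gone
      }
    }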

That said, the cache idea only works for open, and doesn't work for rename, 
delete, etc.  In these cases we don't want to pre-fetch a list of block 
locations.  So never mind the cache idea anyway.

The current options on the table seem to be:
 - Dhruba's patch, modified to use Nicholas's idea of a LinkResult<T>-style 
return, to avoid defining new return-type classes for the SPI methods (see 
the sketch after this list).
 - A less-invasive approach that requires two RPCs (also sketched below).  We 
may later optimize this by converting FileSystem's API to use the above 
style, but we may not need to.  We do need to be careful not to incompatibly 
change FileSystem's public API, but the SPI is not so constrained, since all 
FileSystem implementations are currently in trunk and can easily be 
maintained in a coordinated manner.  In the meantime, we can start using 
symbolic links in archives, etc., while we work out if and how to better 
optimize them.
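
To make the two options concrete, here's a rough sketch; all class and 
method names below are illustrative, not what any of the patches actually 
defines:

    // Option 1: one generic wrapper, per Nicholas's suggestion, so each
    // SPI method can report "hit a symlink" without a dedicated return
    // class per method.
    public class LinkResult<T> {
      private final T result;     // set when resolution succeeded
      private final String link;  // set when an unresolved link was hit

      private LinkResult(T result, String link) {
        this.result = result;
        this.link = link;
      }
      public static <T> LinkResult<T> of(T result) {
        return new LinkResult<T>(result, null);
      }
      public static <T> LinkResult<T> link(String link) {
        return new LinkResult<T>(null, link);
      }
      public boolean isLink() { return link != null; }
      public T get()          { return result; }
      public String getLink() { return link; }
    }

    // The client then loops, splicing the link target into the path:
    //   LinkResult<LocatedBlocks> r = namenode.getBlockLocations(src, ...);
    //   while (r.isLink()) {
    //     src = resolve(r.getLink(), src);
    //     r = namenode.getBlockLocations(src, ...);
    //   }

    // Option 2: the less-invasive, two-RPC form -- resolve first, then act:
    //   String resolved = namenode.resolvePath(src);                 // RPC 1
    //   LocatedBlocks b = namenode.getBlockLocations(resolved, ...); // RPC 2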

Does that sound right?

I don't have a strong preference.  If I were implementing it myself I'd 
probably go for the simple approach first, early in a release cycle, then 
benchmark things and optimize it subsequently if needed.  The risk is not that 
great, since we already have good ideas of how to optimize it.  But the 
optimization will clearly help scalability, so it wouldn't hurt to have it from 
the outset either.

FYI, I tried implementing my patch above as a LinkedFileSystem subclass, to 
better contain the changes.  This turned out to be messy, since a 
LinkedFileSystem can link to an unlinked FileSystem.  With the subclass 
approach this must be handled explicitly with casts and 'instanceof', whereas 
when FileSystem itself supports links it can be handled by default method 
implementations (see the sketch below).  So I am not convinced that a 
LinkedFileSystem subclass is a net win.
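
Roughly, the awkwardness looks like this (LinkedFileSystem, isLink and 
readLink are hypothetical names for the subclass approach):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // With a separate subclass, every caller that may traverse a link
    // must special-case it:
    static FSDataInputStream openFollowingLinks(Path path, Configuration conf)
        throws IOException {
      FileSystem fs = path.getFileSystem(conf);
      if (fs instanceof LinkedFileSystem            // cast + instanceof
          && ((LinkedFileSystem) fs).isLink(path)) {
        Path target = ((LinkedFileSystem) fs).readLink(path);
        fs = target.getFileSystem(conf);  // may be an unlinked FileSystem
        path = target;
      }
      return fs.open(path);
    }

    // Whereas if FileSystem itself knows about links, a default
    // implementation in the base class keeps link-free filesystems
    // working unchanged:
    //   public Path resolveLink(Path p) throws IOException {
    //     return p;  // default: not a link; link-aware filesystems override
    //   }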

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, 
> symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, 
> symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file 
> that contains a reference to another file or directory in the form of an 
> absolute or relative path and that affects pathname resolution. Programs 
> which read or write to files named by a symbolic link will behave as if 
> operating directly on the target file. However, archiving utilities can 
> handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
