[ 
https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700905#action_12700905
 ] 

Edward Capriolo commented on HADOOP-4044:
-----------------------------------------

I use and help with the Hive subproject of Hadoop, and I wanted to share a use
case for symlinks.

For example, suppose a directory inside Hadoop contains
/user/edward/weblogs/{web1.log,web2.log,web3.log}. I can point a Hive EXTERNAL
table at the parent directory and then query that table with Hive. This is very
powerful. It works unless the directory also contains a file in a different
format, such as web_logsummary.csv (this is my case).

Being able to drop in a 'symlink' where a file would go would make it possible
to build such structures from already existing data. Imagine a user with a
large Hadoop deployment who wants to migrate to, or start using, Hive. An
external table is constrained to one directory, so they would need to recode
application paths and/or move files. With a 'symlink' concept, anyone could
start using Hive without reorganizing or copying data.
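
As a rough illustration of the idea, here is a local-filesystem analogy in
Python. It is not HDFS or Hive code, and it assumes nothing about how HDFS
symlinks would actually be implemented. The helper name build_link_view and the
".log" suffix filter are made up for the example. It builds a separate "view"
directory holding symlinks to only the files that share a format, so a
single-directory consumer never sees web_logsummary.csv.

```python
import os
import tempfile


def build_link_view(source_dir, view_dir, suffix=".log"):
    """Populate view_dir with symlinks to the files in source_dir whose
    names end in suffix. Returns the sorted list of linked file names.

    Hypothetical helper: a local-filesystem stand-in for the HDFS
    'symlink' concept discussed above.
    """
    os.makedirs(view_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(source_dir)):
        target = os.path.join(source_dir, name)
        link = os.path.join(view_dir, name)
        if not (name.endswith(suffix) and os.path.isfile(target)):
            continue  # skip files in other formats, e.g. the CSV summary
        if not os.path.lexists(link):
            # The data is not copied or moved; only a link is created.
            os.symlink(os.path.abspath(target), link)
        linked.append(name)
    return linked


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    weblogs = os.path.join(root, "weblogs")
    os.makedirs(weblogs)
    for name in ("web1.log", "web2.log", "web3.log", "web_logsummary.csv"):
        open(os.path.join(weblogs, name), "w").close()

    view = os.path.join(root, "weblogs_view")
    print(build_link_view(weblogs, view))
    # prints ['web1.log', 'web2.log', 'web3.log']
```

An external table pointed at the view directory, rather than at the original
one, would then see only the log files, while the applications that write to
the original directory keep their paths unchanged.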

Right now, Hive has many facilities for dealing with input formats, such as
specifying delimiters, but forcing the data into either a warehouse or an
external table is limiting. 'Symlinks' combined with Hive's current input
format capabilities would make Hive more versatile.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, 
> symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, 
> symLink13.patch, symLink14.patch, symLink4.patch, symLink5.patch, 
> symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file 
> that contains a reference to another file or directory in the form of an 
> absolute or relative path and that affects pathname resolution. Programs 
> which read or write to files named by a symbolic link will behave as if 
> operating directly on the target file. However, archiving utilities can 
> handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
