[
https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700905#action_12700905
]
Edward Capriolo commented on HADOOP-4044:
-----------------------------------------
I use and contribute to the hadoop-hive subproject, and I wanted to share a
use case for symlinks.
For example, suppose there is a directory inside Hadoop:
/user/edward/weblogs/{web1.log,web2.log,web3.log}. I can point a Hive EXTERNAL
table at the parent directory and then query that table with Hive. This is
very powerful. However, it only works as long as every file in the directory
has the same format. It breaks if a file with a different format, such as
web_logsummary.csv, is in the same directory (this is my case).
Being able to drop in a 'symlink' where a file would go would make it possible
to build new directory structures from already existing data. Imagine a user
who has a large Hadoop deployment and wants to migrate to, or start using,
Hive. An external table is constrained to one directory, so they would need to
recode application paths and/or move files. With a 'symlink' concept, anyone
could start using Hive without reorganizing or copying data.
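To make the idea concrete, here is a minimal sketch of the layout I have in
mind, using the local filesystem as a stand-in for HDFS (which has no symlinks
today). The helper name build_link_directory, the weblogs_only directory, and
the ".log" suffix filter are all hypothetical and only for illustration. The
sketch builds a second directory containing links to the log files only, so a
reader of that directory never sees web_logsummary.csv and no data is copied.

```python
import os
import tempfile


def build_link_directory(source_dir, link_dir, suffix=".log"):
    """Fill link_dir with symlinks to the files in source_dir ending in suffix.

    Hypothetical helper: a local-filesystem analogy, not an HDFS or Hive API.
    """
    os.makedirs(link_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(source_dir)):
        target = os.path.join(source_dir, name)
        if name.endswith(suffix) and os.path.isfile(target):
            # The link stores only the path to the existing file.
            os.symlink(target, os.path.join(link_dir, name))
            linked.append(name)
    return linked


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Mixed-format directory, as in the weblogs example above.
        weblogs = os.path.join(root, "weblogs")
        os.makedirs(weblogs)
        for name in ("web1.log", "web2.log", "web3.log", "web_logsummary.csv"):
            with open(os.path.join(weblogs, name), "w") as fh:
                fh.write("contents of " + name + "\n")

        # Single-format view that a table could point at instead.
        view = os.path.join(root, "weblogs_only")
        linked = build_link_directory(weblogs, view)

        # Reading through a link returns the target file's contents.
        with open(os.path.join(view, "web1.log")) as fh:
            first = fh.read()
        return linked, first


if __name__ == "__main__":
    print(demo())
```

In the Hive case, the external table would point at the directory of links
rather than at the original directory. Whether Hive would follow such links
depends on how HDFS ends up resolving them, so treat this as an analogy only.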
Right now, Hive has a lot of facilities for dealing with input formats, such as
specifying delimiters, but forcing the data either into a warehouse or into an
external table is limiting. 'Symlinks' combined with Hive's current input
format capabilities would make Hive more versatile.
> Create symbolic links in HDFS
> -----------------------------
>
> Key: HADOOP-4044
> URL: https://issues.apache.org/jira/browse/HADOOP-4044
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch,
> symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch,
> symLink13.patch, symLink14.patch, symLink4.patch, symLink5.patch,
> symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file
> that contains a reference to another file or directory in the form of an
> absolute or relative path and that affects pathname resolution. Programs
> which read or write to files named by a symbolic link will behave as if
> operating directly on the target file. However, archiving utilities can
> handle symbolic links specially and manipulate them directly.