[
https://issues.apache.org/jira/browse/NUTCH-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958684#comment-15958684
]
ASF GitHub Bot commented on NUTCH-2281:
---------------------------------------
sebastian-nagel closed pull request #119: NUTCH-2281 Support non-default FileSystem
URL: https://github.com/apache/nutch/pull/119
----------------------------------------------------------------
> Support non-default FileSystem
> ------------------------------
>
> Key: NUTCH-2281
> URL: https://issues.apache.org/jira/browse/NUTCH-2281
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.12
> Reporter: Sebastian Nagel
> Fix For: 1.14
>
>
> If a path (input or output) does not belong to the configured default
> FileSystem, various Nutch tools may raise an exception like
> {noformat}
> Exception in ... java.lang.IllegalArgumentException: Wrong FS: s3a://...,
> expected: hdfs://...
> {noformat}
> This is fixed by getting a reference to the FileSystem from the Path object
> {noformat}
> FileSystem fs = path.getFileSystem(getConf());
> {noformat}
> instead of
> {noformat}
> FileSystem fs = FileSystem.get(getConf());
> {noformat}
> A given path (e.g., {{s3a://...}}) may not belong to the default file system
> ({{hdfs://}} or {{file://}} in local mode), and simple checks such as
> {{fs.exists(path)}} will then fail. Cf.
> [FileSystem.checkPath(path)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#checkPath(org.apache.hadoop.fs.Path)],
> and
> [FileSystem.get(conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#get(org.apache.hadoop.conf.Configuration)]
> vs.
> [FileSystem.get(URI,conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#get(java.net.URI,%20org.apache.hadoop.conf.Configuration)]
> which is called by
> [Path.getFileSystem(conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/Path.html#getFileSystem%28org.apache.hadoop.conf.Configuration%29].
>
> Note that the FileSystem for input and output may be different, e.g., read
> from HDFS and write to S3.
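
To illustrate why the choice of FileSystem matters, here is a minimal, stdlib-only sketch of the scheme/authority check described above. It is a toy model, not Hadoop's implementation: the class ToyFileSystem, the helpers defaultFs and fsForPath, and the URIs hdfs://namenode:8020/ and s3a://example-bucket/... are all hypothetical names chosen for the example, and the real FileSystem.checkPath handles further cases (default ports, for instance) that are left out here.

```java
import java.net.URI;
import java.util.Objects;

public class WrongFsSketch {

    /** Toy stand-in (hypothetical) for a file system bound to one scheme and authority. */
    static final class ToyFileSystem {
        final String scheme;
        final String authority; // may be null, e.g. for file:///

        ToyFileSystem(String scheme, String authority) {
            this.scheme = scheme;
            this.authority = authority;
        }

        /** Simplified check: reject a fully qualified path that names another file system. */
        void checkPath(URI path) {
            if (path.getScheme() == null) {
                return; // unqualified path: treated as belonging to this file system
            }
            if (!scheme.equalsIgnoreCase(path.getScheme())
                    || !Objects.equals(authority, path.getAuthority())) {
                throw new IllegalArgumentException(
                        "Wrong FS: " + path + ", expected: " + scheme + "://"
                                + (authority == null ? "" : authority));
            }
        }
    }

    /** Models FileSystem.get(conf): always the configured default file system. */
    static ToyFileSystem defaultFs(URI defaultUri) {
        return new ToyFileSystem(defaultUri.getScheme(), defaultUri.getAuthority());
    }

    /** Models path.getFileSystem(conf): derived from the path, default only as fallback. */
    static ToyFileSystem fsForPath(URI path, URI defaultUri) {
        if (path.getScheme() == null) {
            return defaultFs(defaultUri);
        }
        return new ToyFileSystem(path.getScheme(), path.getAuthority());
    }

    public static void main(String[] args) {
        URI defaultUri = URI.create("hdfs://namenode:8020/");                // assumed default FS
        URI output = URI.create("s3a://example-bucket/crawl/segments");      // assumed output path

        // File system derived from the path: the check passes silently.
        fsForPath(output, defaultUri).checkPath(output);

        // Default file system asked about a foreign path: the check throws.
        try {
            defaultFs(defaultUri).checkPath(output);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, a tool that reads from one file system and writes to another would call fsForPath once per path instead of reusing a single instance, which mirrors the note above about input and output living on different file systems.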