[
https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258567#comment-15258567
]
Chris Nauroth commented on HDFS-10327:
--------------------------------------
It looks like in that example, myfile.csv is a directory, and its contents are
3 files: _SUCCESS, part-00000 and part-00001. Attempting to open myfile.csv
directly as a file definitely won't work. If Spark has a feature that lets you
"open" it directly, then perhaps Spark implements this at the application
layer? Maybe it does something equivalent to {{hdfs dfs -cat
myfile.csv/part*}}?
That last example demonstrates the separation of concerns I'm talking about:
the Hadoop shell command performs glob expansion to identify all files matching
a pattern, and then it opens and displays each file separately, using HDFS APIs
that operate on individual file paths.
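That two-step behavior can be sketched in Python against a local directory standing in for the HDFS output directory (the directory layout and part-file contents below are illustrative, not from the issue): the application layer first expands the glob pattern, then opens each matched file individually, just as the shell command does.

```python
import glob
import os
import tempfile

# Build a local stand-in for the Spark output directory myfile.csv/
# (names and contents are illustrative).
workdir = tempfile.mkdtemp()
outdir = os.path.join(workdir, "myfile.csv")
os.mkdir(outdir)
for name, data in [("_SUCCESS", ""), ("part-00000", "a,1\n"), ("part-00001", "b,2\n")]:
    with open(os.path.join(outdir, name), "w") as f:
        f.write(data)

# Step 1: glob expansion at the application layer, like the shell does
# for "hdfs dfs -cat myfile.csv/part*".
parts = sorted(glob.glob(os.path.join(outdir, "part*")))

# Step 2: open and concatenate each matched file separately, using
# APIs that operate on a single file path at a time.
contents = "".join(open(p).read() for p in parts)
print(contents)
```

The point of the sketch is that no single "open" call ever touches the directory itself; the directory is only listed, and each part is read through a plain per-file API.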
> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> --------------------------------------------------------------------
>
> Key: HDFS-10327
> URL: https://issues.apache.org/jira/browse/HDFS-10327
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: webhdfs
> Reporter: Thomas Hille
> Labels: features
>
> When Spark saves a file to HDFS, it creates a directory containing many
> part files. When you read it with Spark programmatically, you can read
> this directory as if it were a normal file.
> If you try to read this directory-style file in WebHDFS, it returns
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path
> is not a file: [...]
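A client could work around that FileNotFoundException at the application layer by combining two WebHDFS operations that do exist, LISTSTATUS (to enumerate the part files) and OPEN (to read each one). A minimal sketch that only constructs the REST URLs; the host, port, and path are assumptions for illustration:

```python
# Application-layer workaround over the WebHDFS REST API: list the
# directory with op=LISTSTATUS, then fetch each part file with op=OPEN.
# The NameNode host, port, and HDFS path are hypothetical.
BASE = "http://namenode.example.com:50070/webhdfs/v1"

def liststatus_url(path):
    # Enumerates the children of a directory (_SUCCESS, part-00000, ...).
    return f"{BASE}{path}?op=LISTSTATUS"

def open_url(path):
    # Opens a single file; this is the call that fails when given a directory.
    return f"{BASE}{path}?op=OPEN"

# A client would GET liststatus_url(...), filter the returned FileStatus
# entries for names starting with "part-", then GET open_url(...) for
# each one and concatenate the responses.
listing = liststatus_url("/user/example/myfile.csv")
part = open_url("/user/example/myfile.csv/part-00000")
print(listing)
print(part)
```

This mirrors what the shell does for {{hdfs dfs -cat myfile.csv/part*}}: the directory is only ever listed, and OPEN is issued per individual file path.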
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)