[ https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258734#comment-15258734 ]

Mitchell Gudmundson commented on HDFS-10327:
--------------------------------------------

Greetings,

Unless I'm mistaken, this is not a Spark-specific issue. Even simple 
MapReduce jobs produce a directory of part files named part-r-NNNNN, where 
NNNNN is the reducer number. These directories are generally meant to be 
interpreted as one logical "file". In the Spark world, writing out an RDD or 
DataFrame gives you one part file per partition (just as you get one per 
reducer in the MR framework); the concept is no different on other 
distributed processing engines. It seems one would want to be able to 
retrieve the contents of the various parts back as a whole.
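Until WebHDFS supports this natively, a client can stitch the parts together itself: list the directory, pick out the part files in order, and concatenate them. A minimal sketch using the documented LISTSTATUS and OPEN operations (the `namenode:50070` endpoint, the `user.name` value, and the default `part-` naming are assumptions to adapt for your cluster):

```python
import json
import urllib.request

# Assumed WebHDFS endpoint -- adjust host/port for your cluster.
WEBHDFS = "http://namenode:50070/webhdfs/v1"

def part_files(file_statuses):
    """Pick out the part-* entries from a LISTSTATUS response and return
    their names sorted, so the parts concatenate in the right order."""
    return sorted(
        s["pathSuffix"]
        for s in file_statuses
        if s["type"] == "FILE" and s["pathSuffix"].startswith("part-")
    )

def read_dir_as_one_file(path, user="hdfs"):
    """Read a Spark/MapReduce output directory as one logical file by
    concatenating its part files via the WebHDFS REST API."""
    url = f"{WEBHDFS}{path}?op=LISTSTATUS&user.name={user}"
    with urllib.request.urlopen(url) as resp:
        statuses = json.load(resp)["FileStatuses"]["FileStatus"]
    chunks = []
    for name in part_files(statuses):
        open_url = f"{WEBHDFS}{path}/{name}?op=OPEN&user.name={user}"
        # urllib follows the redirect WebHDFS issues to the datanode.
        with urllib.request.urlopen(open_url) as part:
            chunks.append(part.read())
    return b"".join(chunks)
```

Sorting by name works because the frameworks zero-pad the part numbers; markers like _SUCCESS are skipped by the `part-` filter.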

Regards,
-Mitch

> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> --------------------------------------------------------------------
>
>                 Key: HDFS-10327
>                 URL: https://issues.apache.org/jira/browse/HDFS-10327
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>            Reporter: Thomas Hille
>              Labels: features
>
> When Spark saves a file in HDFS it creates a directory which includes many 
> parts of the file. When you read it with Spark programmatically, you can 
> read this directory as if it were a normal file.
> If you try to read this directory-style file in webhdfs, it returns 
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path
>  is not a file: [...]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
