[jira] [Commented] (HDFS-10327) Open files in WEBHDFS which are stored in folders by Spark

Thomas Hille (JIRA) Mon, 25 Apr 2016 16:48:26 -0700

    [ https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257266#comment-15257266 ]


Thomas Hille commented on HDFS-10327:
-------------------------------------

Hej Chris Nauroth,
they don't use any special serialization. If you save a text file in HDFS with 
dataframe.write().format("com.databricks.spark.csv").option("header", "true").mode(SaveMode.Overwrite).save("/tmp/file.csv"); 
you end up with a bunch of files named part-00000, part-00001, part-00002, ... 
inside the folder /tmp/file.csv/. They are plain text and are basically just 
the CSV file split into pieces. 
Anyway, I understand now that this is not your concern.
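
For anyone who needs to read such a directory through WebHDFS anyway, one 
possible workaround is to list the directory with op=LISTSTATUS and then fetch 
each part file with op=OPEN, concatenating the results on the client. The 
following is only a minimal sketch under stated assumptions: the class name 
WebHdfsPartReader, its helper methods, and the NameNode address passed on the 
command line are hypothetical, the regex-based extraction of pathSuffix is a 
shortcut to avoid a JSON library (a real parser would be more robust), and 
authentication parameters are left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop or Spark.
public class WebHdfsPartReader {

    // Matches the "pathSuffix" field of each FileStatus entry in a LISTSTATUS response.
    private static final Pattern PATH_SUFFIX =
            Pattern.compile("\"pathSuffix\"\\s*:\\s*\"([^\"]*)\"");

    /** Builds a WebHDFS REST URL such as http://host:port/webhdfs/v1/tmp/file.csv?op=LISTSTATUS. */
    public static String webHdfsUrl(String base, String path, String op) {
        return base + "/webhdfs/v1" + path + "?op=" + op;
    }

    /** Returns the names starting with "part-" found in a LISTSTATUS JSON body, sorted. */
    public static List<String> partFiles(String listStatusJson) {
        List<String> parts = new ArrayList<>();
        Matcher m = PATH_SUFFIX.matcher(listStatusJson);
        while (m.find()) {
            String name = m.group(1);
            if (name.startsWith("part-")) {
                parts.add(name);
            }
        }
        Collections.sort(parts);
        return parts;
    }

    /** Performs a GET and returns the body as UTF-8 text; relies on the client following redirects. */
    public static String httpGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(true);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: WebHdfsPartReader <namenode-http-base> <hdfs-dir>");
            return;
        }
        String base = args[0]; // assumption: the NameNode HTTP address, e.g. http://host:port
        String dir = args[1];  // e.g. /tmp/file.csv
        String listing = httpGet(webHdfsUrl(base, dir, "LISTSTATUS"));
        // Concatenates the parts in name order; repeated header lines, if any, are not removed.
        for (String part : partFiles(listing)) {
            System.out.print(httpGet(webHdfsUrl(base, dir + "/" + part, "OPEN")));
        }
    }
}
```

Filtering on the part- prefix skips marker files such as _SUCCESS that may sit 
in the same directory.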

Thank you for taking the time to respond!

> Open files in WEBHDFS which are stored in folders by Spark
> ----------------------------------------------------------
>
>                 Key: HDFS-10327
>                 URL: https://issues.apache.org/jira/browse/HDFS-10327
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>            Reporter: Thomas Hille
>              Labels: features
>
> When Spark saves a file in HDFS, it creates a directory that contains many 
> parts of the file. When you read it with Spark programmatically, you can read 
> this directory as if it were a normal file.
> If you try to read this directory-style file in WebHDFS, it returns 
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path is not a file: [...]



