[ 
https://issues.apache.org/jira/browse/SPARK-20528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716956#comment-16716956
 ] 

Hyukjin Kwon commented on SPARK-20528:
--------------------------------------

Yes, but you can read it via a few chained API calls. Can you ask this on the 
mailing list instead of here? What you're doing is completely orthogonal to 
this issue, and questions should go there.

> Add BinaryFileReader and Writer for DataFrames
> ----------------------------------------------
>
>                 Key: SPARK-20528
>                 URL: https://issues.apache.org/jira/browse/SPARK-20528
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Joseph K. Bradley
>            Priority: Major
>         Attachments: 
> part-00000-5ae00646-8400-4b45-aa6f-d6f27068972c-c000.json, stocklist.json, 
> stocklist.pdub
>
>
> It would be very useful to have a binary data reader/writer for DataFrames, 
> presumably called via {{spark.read.binaryFiles}}, etc.
> Currently, going through RDDs is annoying since it requires different code 
> paths for Scala vs Python:
> Scala:
> {code}
> val binaryFilesRDD = sc.binaryFiles("mypath")
> val binaryFilesDF = spark.createDataFrame(binaryFilesRDD)
> {code}
> Python:
> {code}
> binaryFilesRDD = sc.binaryFiles("mypath")
> binaryFilesRDD_recast = binaryFilesRDD.map(lambda x: (x[0], bytearray(x[1])))
> binaryFilesDF = spark.createDataFrame(binaryFilesRDD_recast)
> {code}
> This is because Scala and Python {{sc.binaryFiles}} return different types, 
> which makes sense in RDD land but not DataFrame land.
> My motivation here is working with images in Spark.
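
For reference, the Python recast step from the quoted description can be isolated into a small function. This is only a sketch under stated assumptions: `recast_binary_pair` and `binary_files_to_df` are hypothetical names rather than Spark APIs, the column names "path" and "content" are illustrative, and the second helper needs a live SparkSession to run.

```python
from typing import Tuple


def recast_binary_pair(pair: Tuple[str, bytes]) -> Tuple[str, bytearray]:
    """Convert one (path, content) pair so the content is a bytearray.

    Mirrors the lambda in the quoted Python snippet.
    """
    path, content = pair
    return (path, bytearray(content))


def binary_files_to_df(spark, path: str):
    """Hypothetical helper: read binary files at `path` into a DataFrame.

    `spark` is assumed to be an existing SparkSession; the column names
    are assumptions chosen for illustration.
    """
    rdd = spark.sparkContext.binaryFiles(path)
    return spark.createDataFrame(rdd.map(recast_binary_pair), ["path", "content"])
```

With a session available, `binary_files_to_df(spark, "mypath")` would follow the same RDD route as the quoted Python code, only wrapped in one call.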



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
