Hey Bing,
There are a couple of different approaches you could take. The quickest and easiest
would be to use the existing Hadoop FileSystem API from inside foreachPartition:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

val bytes = spark.range(1000)
bytes.foreachPartition { (iter: Iterator[java.lang.Long]) =>
  // WARNING: anything used in here will need to be serializable.
  // There's some magic to serializing the hadoop conf; see the hadoop wrapper
  // class in the source.
  // In practice each partition should write to its own file to avoid collisions.
  val path = new Path("s3://...")
  val writer = path.getFileSystem(new Configuration()).create(path)
  iter.foreach(b => writer.write(b.intValue()))
  writer.close()
}
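If you need the driver's Hadoop configuration (for example S3 credentials) inside that
closure, one way is to broadcast it wrapped in something serializable. A minimal sketch,
assuming the @DeveloperApi wrapper org.apache.spark.SerializableWritable is acceptable
for your use, and reusing the toy dataset from above:

import org.apache.hadoop.fs.Path
import org.apache.spark.SerializableWritable

// Hadoop's Configuration is a Writable, so wrap it and broadcast it.
val bcConf = spark.sparkContext.broadcast(
  new SerializableWritable(spark.sparkContext.hadoopConfiguration))

bytes.foreachPartition { (iter: Iterator[java.lang.Long]) =>
  val conf = bcConf.value.value   // deserialized Configuration on the executor
  val path = new Path("s3://...")
  val writer = path.getFileSystem(conf).create(path)
  iter.foreach(b => writer.write(b.intValue()))
  writer.close()
}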
The more complicated but prettier approach would be to implement a custom data
source.
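Not a full data source, but a middle ground that stays in RDD land would be a custom
Hadoop OutputFormat that writes each record's bytes verbatim (no delimiters, no quoting),
used through saveAsNewAPIHadoopFile. A rough sketch under those assumptions (the class
name and output path are made up, and concatenated records need some framing, e.g.
length prefixes, if you want to split them apart again later):

import org.apache.hadoop.fs.FSDataOutputStream
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.hadoop.mapreduce.{RecordWriter, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Writes the raw bytes of each value, one output file per partition.
class BinaryFileOutputFormat extends FileOutputFormat[NullWritable, BytesWritable] {
  override def getRecordWriter(context: TaskAttemptContext): RecordWriter[NullWritable, BytesWritable] = {
    val file = getDefaultWorkFile(context, ".bin")
    val out: FSDataOutputStream = file.getFileSystem(context.getConfiguration).create(file, false)
    new RecordWriter[NullWritable, BytesWritable] {
      override def write(key: NullWritable, value: BytesWritable): Unit =
        out.write(value.getBytes, 0, value.getLength)   // copy only the valid bytes
      override def close(context: TaskAttemptContext): Unit = out.close()
    }
  }
}

// Usage with your RDD[Array[Byte]] (here called rdd):
rdd.map(b => (NullWritable.get(), new BytesWritable(b)))
   .saveAsNewAPIHadoopFile[BinaryFileOutputFormat]("s3://...")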
From: "Duan,Bing" <[email protected]>
Date: Thursday, January 16, 2020 at 12:35 AM
To: "[email protected]" <[email protected]>
Subject: How to implement a "saveAsBinaryFile" function?
Hi all:
I read binary data (protobuf format) from the filesystem with the binaryFiles function
into an RDD[Array[Byte]], and it works fine. But when I save it back to the filesystem
with saveAsTextFile, the quotation marks get escaped like this:
"\"201900002_1\"",1,24,0,2,"\"S66.000x001\"", which should be
"201900002_1",1,24,0,2,"S66.000x001".
Could anyone give me a tip on how to implement a function like saveAsBinaryFile to
persist the RDD[Array[Byte]]?
Bests!
Bing