What format are you writing the file to? Are you planning your own custom format, or do you plan to use a standard format like Parquet?
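If a custom binary format does turn out to be necessary, the row-to-bytes step can be sketched without any Spark dependency. A minimal sketch, assuming each row holds doubles and a fixed little-endian layout (the class and method names here are hypothetical; a Spark UDF would wrap exactly this kind of logic to produce a binary column):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class RowPacker {
    // Packs a row's numeric values into a byte array, little-endian.
    // A Spark UDF could call this per row to produce a binary column.
    public static byte[] pack(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] bytes = pack(new double[] {1.0, 2.5, -3.75});
        System.out.println(bytes.length); // 3 doubles * 8 bytes = 24
    }
}
```

Note that whoever reads the file back must know the element type, byte order, and row width, which is exactly the compatibility burden a standard format would carry for you.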
Note that Spark can write numeric data in most standard formats. If you use a custom format instead, whoever consumes the data needs to parse it, which adds complexity to your code and to your consumers' code. You will also need to worry about backward compatibility. I would suggest exploring standard formats before you write custom code. If you do have to write data in a custom format, a UDF is a good way to serialize the data into your format.

On 4/8/22, 11:14 AM, "Philipp Kraus" <philipp.kraus.flashp...@gmail.com> wrote:

> Hello,
>
> I have a data frame with numerical data in Spark 3.1.1 (Java) which should be converted to a binary file. My idea is to create a UDF that generates a byte array from the numerical values, so I can apply this function to each row of the data frame and get a new column with row-wise binary byte data. Once that is done, I would like to write this column as a continuous byte stream to a file stored in an S3 bucket.
>
> So my question is: is the UDF idea a good one, and is it possible to write this continuous byte stream directly to S3 / is there any built-in functionality? What is a good strategy for this?
>
> Thanks for help

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
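On the "continuous byte stream" part of the question: Spark's writers emit one file per partition, so producing a single stream generally means either coalescing to one partition before writing, or collecting the per-row byte arrays and concatenating them on the driver (only viable for small data). The actual S3 upload would go through the hadoop-aws s3a connector or the AWS SDK; the sketch below shows only the concatenation step, in plain Java with no Spark or AWS dependency (class and method names are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;

public class StreamConcat {
    // Concatenates row-wise byte arrays (e.g. collected from a binary
    // DataFrame column) into one continuous byte stream.
    public static byte[] concat(List<byte[]> rows) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] row : rows) {
            out.write(row);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] all = concat(List.of(new byte[] {1, 2}, new byte[] {3, 4, 5}));
        System.out.println(all.length); // 2 + 3 = 5
    }
}
```

The resulting byte array could then be uploaded as a single S3 object; for anything large, writing through an s3a:// path from Spark itself avoids pulling all data to the driver.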