You can certainly write that UDF. You'd get a column of binary type
(array<byte>) in a DataFrame, which you can write to any appropriate format.
What do you mean by a continuous byte stream? Something besides, say, Parquet
files holding the byte arrays?
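As a minimal sketch of what the body of such a UDF could look like (the Spark registration via `functions.udf` is omitted; the method name `pack` and the assumption that each row carries double values are illustrative, not from the thread), packing numbers into a big-endian byte array with `java.nio.ByteBuffer`:

```java
import java.nio.ByteBuffer;

public class DoublePacker {
    // Packs an array of doubles into a big-endian byte array.
    // This is the kind of conversion a Spark UDF returning byte[]
    // (BinaryType) would perform per row.
    public static byte[] pack(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // Two doubles -> 16 bytes
        byte[] bytes = pack(new double[] {1.0, 2.5});
        System.out.println(bytes.length);
    }
}
```

A method like this can then be wrapped in a Spark UDF returning `DataTypes.BinaryType` and applied to each row.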

On Fri, Apr 8, 2022 at 10:14 AM Philipp Kraus <
philipp.kraus.flashp...@gmail.com> wrote:

> Hello,
>
> I have a data frame with numerical data in Spark 3.1.1 (Java) which
> should be converted to a binary file.
> My idea is to create a UDF that generates a byte array based on the
> numerical values, so I can apply this function to each row of the data
> frame and then get a new column with row-wise binary byte data.
> Once this is done, I would like to write this column as a continuous
> byte stream to a file stored in an S3 bucket.
>
> So my questions are: is the UDF approach a good idea, and is it possible
> to write this continuous byte stream directly to S3 / is there any
> built-in functionality for it?
> What is a good strategy to do this?
>
> Thanks for your help
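The "continuous byte stream" part of the question amounts to concatenating the per-row byte arrays in order. A local sketch of that step (plain JDK, no Spark; the class name `ByteConcat` is illustrative): in Spark one would collect the binary column, or stream it per partition, and write the concatenated result through the Hadoop FileSystem API to an `s3a://` path, which is the usual route to S3 and is omitted here.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

public class ByteConcat {
    // Concatenates per-row byte arrays into one continuous stream,
    // preserving row order. The result could be written to a single
    // output file (locally or, via Hadoop's FileSystem API, to S3).
    public static byte[] concat(List<byte[]> rows) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] row : rows) {
            out.write(row, 0, row.length);
        }
        return out.toByteArray();
    }
}
```

Note that collecting the whole column to the driver only works for data that fits in memory; for large data, writing per partition and concatenating the part files is the more scalable pattern.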
