Hello,

This sounds great for the first step.

> On 08.04.2022 at 17:25, Sean Owen <sro...@gmail.com> wrote:
> 
> You can certainly write that UDF. You get a column in a DataFrame of 
> array<byte> type and you can write that to any appropriate format.

> What do you mean by continuous byte stream? Something besides, say, Parquet 
> files holding the byte arrays?

I don't want to write a table structure; I would like to write each row of 
my data frame as a "Point data records" entry of the LAS format 
(https://en.wikipedia.org/wiki/LAS_file_format).
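For illustration, a rough sketch of the packing UDF I have in mind (Java, 
Spark 3.1). The layout follows the 20-byte point data record format 0 from 
the LAS spec; the scale factor and the column names x/y/z are placeholders, 
not my actual schema:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.api.java.UDF3;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Pack one row into a LAS point data record (format 0 is 20 bytes).
// LAS stores coordinates as scaled little-endian signed integers.
UDF3<Double, Double, Double, byte[]> packPoint = (x, y, z) -> {
    ByteBuffer buf = ByteBuffer.allocate(20).order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt((int) Math.round(x / 0.001)); // scale/offset are placeholders
    buf.putInt((int) Math.round(y / 0.001));
    buf.putInt((int) Math.round(z / 0.001));
    buf.putShort((short) 0); // intensity; the remaining 6 bytes stay zeroed
    return buf.array();
};

spark.udf().register("packPoint", packPoint, DataTypes.BinaryType);
Dataset<Row> withRecords = df.withColumn("record",
    functions.callUDF("packPoint", df.col("x"), df.col("y"), df.col("z")));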

Thanks a lot

> 
> On Fri, Apr 8, 2022 at 10:14 AM Philipp Kraus 
> <philipp.kraus.flashp...@gmail.com> wrote:
> Hello,
> 
> I have got a data frame with numerical data in Spark 3.1.1 (Java) which 
> should be converted to a binary file. 
> My idea is to create a UDF that generates a byte array based on the 
> numerical values, so I can apply this function on each row of the data 
> frame and then get a new column with row-wise binary byte data.
> Once this is done, I would like to write this column as a continuous byte 
> stream to a file stored in an S3 bucket.
> 
> So my question is: is the UDF a good idea, and is it possible to write 
> this continuous byte stream directly to S3? Is there any built-in 
> functionality for that?
> What is a good strategy to do this?
> 
> Thanks for help
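
And for the second step quoted above (the continuous byte stream to S3), a 
minimal sketch of what I am considering: collect the packed records from the 
withRecords frame in the sketch above and write them back-to-back through 
Hadoop's s3a FileSystem. Bucket and key are placeholders, and the LAS header 
block is left out:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.sql.Encoders;
import java.util.List;

// Collect the packed records and write them back-to-back as one S3 object.
// This pulls everything to the driver, so it only works while the point
// cloud fits into driver memory; a real LAS file also needs its header
// written before the point records.
List<byte[]> records = withRecords.select("record")
    .as(Encoders.BINARY()).collectAsList();

Path out = new Path("s3a://my-bucket/points.las"); // bucket/key placeholders
Configuration conf = spark.sparkContext().hadoopConfiguration();
try (FSDataOutputStream os = FileSystem.get(out.toUri(), conf).create(out, true)) {
    for (byte[] record : records) {
        os.write(record);
    }
}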
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
> <mailto:user-unsubscr...@spark.apache.org>
>> 
> 

Reply via email to