Hello,

This sounds great for the first step. I have put a rough sketch of what I have in mind below the quoted messages.
> On 08.04.2022 at 17:25, Sean Owen <sro...@gmail.com> wrote:
>
> You can certainly write that UDF. You get a column in a DataFrame of
> array<byte> type and you can write that to any appropriate format.
> What do you mean by continuous byte stream? Something besides, say, Parquet
> files holding the byte arrays?

I do not want to write a table structure; I would like to write each row of my data frame as an entry in the „Point data records“ section of the LAS format, see https://en.wikipedia.org/wiki/LAS_file_format

Thanks a lot

> On Fri, Apr 8, 2022 at 10:14 AM Philipp Kraus <philipp.kraus.flashp...@gmail.com> wrote:
>
> Hello,
>
> I have got a data frame with numerical data in Spark 3.1.1 (Java) which
> should be converted to a binary file.
> My idea is to create a UDF that generates a byte array from the numerical
> values, so I can apply this function to each row of the data frame and then
> get a new column with row-wise binary data.
> Once this is done, I would like to write this column as a continuous byte
> stream to a file stored in an S3 bucket.
>
> So my questions are: is the UDF a good idea, is it possible to write this
> continuous byte stream directly to S3, and is there any built-in
> functionality for this? What is a good strategy to do this?
>
> Thanks for help
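Just for completeness, here is a rough and untested sketch of what I have in mind. The column names x, y and z, the s3a paths, and the fixed 20-byte point data record (format 0) layout are only assumptions on my side, and the LAS public header block is not written at all in this sketch:

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

import java.io.OutputStream;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF3;
import org.apache.spark.sql.types.DataTypes;

public class LasPointExport {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder().appName("las-point-export").getOrCreate();

    // Assumed input: numeric columns x, y, z already scaled to the integer
    // representation that the LAS point record expects.
    Dataset<Row> df = spark.read().parquet("s3a://my-bucket/points/");

    // UDF packing one row into a 20-byte point data record (format 0).
    // LAS is little-endian; all fields after x/y/z are left as zero here.
    spark.udf().register("toLasRecord",
        (UDF3<Integer, Integer, Integer, byte[]>) (x, y, z) -> {
          ByteBuffer buf = ByteBuffer.allocate(20).order(ByteOrder.LITTLE_ENDIAN);
          buf.putInt(x);
          buf.putInt(y);
          buf.putInt(z);
          buf.putShort((short) 0); // intensity
          buf.put((byte) 0);       // return / flag bits
          buf.put((byte) 0);       // classification
          buf.put((byte) 0);       // scan angle rank
          buf.put((byte) 0);       // user data
          buf.putShort((short) 0); // point source id
          return buf.array();
        },
        DataTypes.BinaryType);

    Dataset<Row> records =
        df.withColumn("record", callUDF("toLasRecord", col("x"), col("y"), col("z")));

    // LAS needs one contiguous file (header + records), so the records are
    // pulled to the driver and written as a single S3 object via the s3a
    // filesystem. This only works while the data fits into driver memory.
    List<Row> rows = records.select("record").collectAsList();

    Configuration conf = spark.sparkContext().hadoopConfiguration();
    FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket"), conf);
    try (OutputStream out = fs.create(new Path("s3a://my-bucket/out/points.las"))) {
      // The public header block (and any VLRs) would have to be written
      // here first; omitted in this sketch.
      for (Row row : rows) {
        byte[] rec = row.getAs("record");
        out.write(rec);
      }
    }

    spark.stop();
  }
}

For data that does not fit into driver memory the final collectAsList/write step would of course not work; in that case I would probably have to stream the partitions to S3 myself (e.g. a multipart upload from foreachPartition), since as far as I know there is no built-in Spark writer that produces a single raw byte-stream file.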