No worries! I wasn't expecting any replies during the weekend.

I agree, option 2 is the better choice, and reusing the existing option
for the filename column is the most natural approach. I think that could
be done.
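
To make that concrete, here is a rough sketch in plain Python of how the
writer could pick its columns under option 2. Nothing here is existing
Sedona code: the schema representation, the "raster" type name and both
function names are made up for illustration, and the fallback filename just
follows the pattern from my earlier mail.

```python
import uuid


def resolve_raster_column(schema):
    """Return the name of the single raster column in `schema`.

    `schema` is a list of (column_name, type_name) pairs. The type name
    "raster" is a placeholder for the real raster type (an assumption).
    """
    raster_columns = [name for name, type_name in schema if type_name == "raster"]
    if len(raster_columns) != 1:
        # Option 2 only works when the choice is unambiguous.
        raise ValueError(
            f"expected exactly one raster column, found {len(raster_columns)}"
        )
    return raster_columns[0]


def default_filename(upper_left_x, upper_left_y):
    """Build a fallback name from the envelope's upper-left corner and a UUID."""
    return f"ul_{int(upper_left_x)}_{int(upper_left_y)}_{uuid.uuid4()}.tiff"
```

With a schema like [("id", "int"), ("tile", "raster")] the first function
returns "tile" regardless of what the column is called, and a data frame
with zero or several raster columns is rejected early.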

Just to get all ideas on the table: the built-in binaryFile data source is
very useful for reading all kinds of rasters. For each format, we just need
to implement an RS_FromXXX function. If we had a similar data source for
writing binary columns to files, we could easily support writing many
different raster formats.
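
For illustration, the per-row behaviour of such a sink would be roughly the
following. This is plain Python rather than a Spark data source, and
write_binary_rows is a hypothetical name; a real implementation would have
to go through Spark's data source writer API instead.

```python
from pathlib import Path


def write_binary_rows(rows, destination):
    """Write each (filename, content) pair in `rows` to a file under `destination`.

    `rows` stands in for a data frame with a filename column and a binary
    column (an assumption made for this sketch).
    """
    out_dir = Path(destination)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, content in rows:
        # Keep only the final path component so a row cannot write
        # outside the destination directory.
        target = out_dir / Path(filename).name
        target.write_bytes(content)
        written.append(target)
    return written
```

The sink itself would not need to know anything about raster formats. Any
function that turns a raster into bytes could feed it.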

Br,
Martin Andersson

On Mon, Feb 27, 2023 at 09:14, Jia Yu <[email protected]> wrote:

> Hi Martin,
>
> Sorry for the late reply. I was totally swamped by my other commitments
> over the past few days.
>
> Option 2 makes more sense to me, since we can assume there is only one
> RasterUDT column. We will allow users to specify the file name column. In
> the current GeoTiff writer, there is an option reserved for this purpose:
>
> https://github.com/apache/sedona/blob/master/sql/src/main/scala/org/apache/spark/sql/sedona_sql/io/ImageWriteOptions.scala#L29
>
> Do you think you can re-use it? We use the same GeoTiff writer. If the DF
> has a RasterUDT column, it will save to GeoTiff. If the DF has a DOUBLE
> array column, it will also save to GeoTiff.
>
> Thanks,
> Jia
>
>
>
>
> On Fri, Feb 24, 2023 at 8:03 AM Martin Andersson <
> [email protected]> wrote:
>
> > Hi,
> >
> > The recent merge of the new raster type in
> > https://github.com/apache/sedona/pull/773 opens up the possibility of
> > adapting the geotiff data source to write rasters. To achieve this, we can
> > modify the data source to include two modes - classic (the current mode)
> > and raster. The mode selection can be automatic, with the data source
> > switching to raster mode if the data frame does not meet the requirements
> > of classic mode.
> >
> > In raster mode, the writer would require a raster column and an optional
> > filename column. If a filename is not provided, it could be generated using
> > the upper-left corner of the envelope and a uuid. For instance,
> > "ul_6490550_1338130_e89b4567-e89b-12d3-a456-426614174000.tiff". The
> > challenge is to inform the writer about the relevant columns to use.
> >
> > Option 1. Use columns with specific names like classic mode. Columns should
> > be named "filename" and "raster". If those are not found in the data frame
> > the writer will throw an exception.
> >
> > Option 2. If there is exactly one column of type raster, use that,
> > regardless of its name. Add a parameter to set the filename column.
> >
> > This would work for any data frame containing a raster column:
> > df.write.format("geotiff").save("DESTINATION_PATH")
> >
> > If you want to provide a filename:
> > df.write.format("geotiff").option("filenameColumn",
> > "my_filename_column").save("DESTINATION_PATH")
> >
> > Option 3. Other?
> >
> > What are your thoughts?
> >
>
