Re: [DISCUSSION] Geotiff data source for writing rasters

Jia Yu Mon, 27 Feb 2023 10:08:12 -0800

Hi Martin,

Thanks!


Got it. So you want something like "df.write.format("binary")....", right?

I am OK with it if this is more extensible in the future.

Thanks,
Jia

On Mon, Feb 27, 2023 at 8:07 AM Martin Andersson <
[email protected]> wrote:

> No worries! I wasn't expecting any replies during the weekend.
>
> I agree. I think option 2 is the better option. And using the existing
> parameter for filename is the most natural thing. I think that could be
> done.
>
> Just to get all ideas on the table: The built-in binaryFile data source is
> very useful for reading all kinds of rasters. We just need to implement a
> RS_FromXXX function. If we had a similar data source for writing binary
> columns to files we could easily support writing many different raster
> formats.
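To make the binary-writer idea concrete, here is a minimal sketch in plain Python rather than Spark code. It assumes each row has already been reduced to a (filename, content) pair, where the content is the encoded raster bytes. The function name `write_binary_rows` and the row shape are hypothetical, not part of Spark or Sedona.

```python
from pathlib import Path


def write_binary_rows(rows, destination):
    """Write each (filename, content) pair as one file under destination.

    Hypothetical helper: rows is an iterable of (str, bytes) pairs.
    """
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, content in rows:
        # Keep only the final path component so a row cannot write
        # outside the destination directory.
        target = dest / Path(filename).name
        target.write_bytes(content)
        written.append(target)
    return written
```

Because the writer only sees bytes, the raster format is decided entirely by whatever function produced the binary column.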
>
> Br,
> Martin Andersson
>
> On Mon, Feb 27, 2023 at 09:14, Jia Yu <[email protected]> wrote:
>
> > Hi Martin,
> >
> > Sorry for the late reply. I was totally swamped by my other commitments in
> > the past few days.
> >
> > I think Option 2 makes more sense to me since we can assume we only have 1
> > RasterUDT. We will allow users to specify the file name column. In the
> > current GeoTiff writer, there is an option reserved for this purpose:
> >
> > https://github.com/apache/sedona/blob/master/sql/src/main/scala/org/apache/spark/sql/sedona_sql/io/ImageWriteOptions.scala#L29
> >
> > Do you think you can re-use it? We use the same GeoTiff writer. If the DF
> > is RasterUDT, it will save to GeoTiff. If the DF is a DOUBLE array, it will
> > also save to GeoTiff.
> >
> > Thanks,
> > Jia
> >
> >
> >
> >
> > On Fri, Feb 24, 2023 at 8:03 AM Martin Andersson <
> > [email protected]> wrote:
> >
> > > Hi,
> > >
> > > The recent merge of the new raster type in
> > > https://github.com/apache/sedona/pull/773 opens up the possibility of
> > > adapting the geotiff data source to write rasters. To achieve this, we can
> > > modify the data source to include two modes - classic (the current mode)
> > > and raster. The mode selection can be automatic, with the data source
> > > switching to raster mode if the data frame does not meet the requirements
> > > of classic mode.
> > >
> > > In raster mode, the writer would require a raster column and an optional
> > > filename column. If a filename is not provided, it could be generated using
> > > the upper-left corner of the envelope and a uuid. For instance,
> > > "ul_6490550_1338130_e89b4567-e89b-12d3-a456-426614174000.tiff". The
> > > challenge is to inform the writer about the relevant columns to use.
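The fallback naming scheme described above could look roughly like the following. This is only a sketch in plain Python: the function name `default_raster_filename` is hypothetical, and it assumes the two numbers in the example are the x and y coordinates of the upper-left corner, in that order, truncated to integers.

```python
import uuid


def default_raster_filename(upper_left_x, upper_left_y, unique_id=None):
    """Build a fallback file name from the envelope corner and a UUID.

    Assumption: the name encodes x first, then y, both truncated to
    integers, which matches the shape of the example in the thread.
    """
    if unique_id is None:
        # A random UUID keeps names unique when several rasters
        # share the same upper-left corner.
        unique_id = uuid.uuid4()
    return f"ul_{int(upper_left_x)}_{int(upper_left_y)}_{unique_id}.tiff"
```

Passing the UUID explicitly, as in `default_raster_filename(6490550.0, 1338130.0, "e89b4567-e89b-12d3-a456-426614174000")`, reproduces the example name from the thread.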
> > >
> > > Option 1. Use columns with specific names, as in classic mode. Columns
> > > should be named "filename" and "raster". If those are not found in the data
> > > frame, the writer will throw an exception.
> > >
> > > Option 2. If there is exactly one column of type raster, use that,
> > > regardless of its name. Add a parameter to set the filename column.
> > >
> > > This would work for any data frame containing a raster column:
> > > df.write.format("geotiff").save("DESTINATION_PATH")
> > >
> > > If you want to provide a filename:
> > > df.write.format("geotiff").option("filenameColumn",
> > > "my_filename_column").save("DESTINATION_PATH")
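The column lookup that Option 2 implies can be sketched in plain Python, without Spark. The schema is modelled here as a list of (column name, type name) pairs, and the type name "raster" and both function names are assumptions made for illustration. Only the "filenameColumn" option name comes from the example above.

```python
def resolve_raster_column(schema, raster_type="raster"):
    """Return the name of the single raster column, whatever it is called."""
    candidates = [name for name, type_name in schema if type_name == raster_type]
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one {raster_type} column, found {len(candidates)}"
        )
    return candidates[0]


def resolve_columns(schema, options):
    """Pick the raster column by type and the filename column by option."""
    raster_column = resolve_raster_column(schema)
    # The filename column is optional; None means the writer has to
    # generate file names itself.
    filename_column = options.get("filenameColumn")
    column_names = {name for name, _ in schema}
    if filename_column is not None and filename_column not in column_names:
        raise ValueError(f"filename column not found: {filename_column}")
    return raster_column, filename_column
```

With zero or several raster columns the lookup fails early, which is the point where a real writer would have to reject the data frame or fall back to another mode.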
> > >
> > > Option 3. Other?
> > >
> > > What are your thoughts?
> > >
> >
>
