No worries! I wasn't expecting any replies during the weekend. I agree. I think option 2 is the better option. And using the existing parameter for filename is the most natural thing. I think that could be done.
Just to get all ideas on the table: The built-in binaryFile data source is very useful for reading all kinds of rasters. We just need to implement an RS_FromXXX function. If we had a similar data source for writing binary columns to files, we could easily support writing many different raster formats.

Br,
Martin Andersson

On Mon, 27 Feb 2023 at 09:14, Jia Yu <[email protected]> wrote:

> Hi Martin,
>
> Sorry for the late reply. I was totally swarmed by my other commitments in
> the past few days.
>
> I think Option 2 makes more sense to me since we can assume we only have 1
> RasterUDT. We will allow users to specify the file name column. In the
> current GeoTiff writer, there is an option reserved for this purpose:
>
> https://github.com/apache/sedona/blob/master/sql/src/main/scala/org/apache/spark/sql/sedona_sql/io/ImageWriteOptions.scala#L29
>
> Do you think you can re-use it? We use the same GeoTiff writer. If the DF
> is RasterUDT, it will save to GeoTiff. If the DF is a DOUBLE array, it will
> also save to GeoTiff.
>
> Thanks,
> Jia
>
> On Fri, Feb 24, 2023 at 8:03 AM Martin Andersson <[email protected]> wrote:
>
> > Hi,
> >
> > The recent merge of the new raster type in
> > https://github.com/apache/sedona/pull/773 opens up the possibility of
> > adapting the geotiff data source to write rasters. To achieve this, we can
> > modify the data source to include two modes - classic (the current mode)
> > and raster. The mode selection can be automatic, with the data source
> > switching to raster mode if the data frame does not meet the requirements
> > of classic mode.
> >
> > In raster mode, the writer would require a raster column and an optional
> > filename column. If a filename is not provided, it could be generated using
> > the upper-left corner of the envelope and a uuid. For instance,
> > "ul_6490550_1338130_e89b4567-e89b-12d3-a456-426614174000.tiff". The
> > challenge is to inform the writer about the relevant columns to use.
> >
> > Option 1. Use columns with specific names like classic mode. Columns should
> > be named "filename" and "raster". If those are not found in the data frame,
> > the writer will throw an exception.
> >
> > Option 2. If there is exactly one column of type raster, use that,
> > regardless of its name. Add a parameter to set the filename column.
> >
> > This would work for any data frame containing a raster column:
> > df.write.format("geotiff").save("DESTINATION_PATH")
> >
> > If you want to provide a filename:
> > df.write.format("geotiff").option("filenameColumn",
> > "my_filename_column").save("DESTINATION_PATH")
> >
> > Option 3. Other?
> >
> > What are your thoughts?
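
To make Option 2 and the fallback filename concrete, here is a rough sketch in plain Python rather than actual Sedona code. It assumes the schema is available as (column name, type name) pairs and that upper-left coordinates are truncated to integers. The helper names find_raster_column and default_filename, and the "raster" type label, are hypothetical and only for illustration.

```python
import uuid
from typing import List, Optional, Tuple

# Hypothetical label for the raster column type; the real type is Sedona's RasterUDT.
RASTER_TYPE = "raster"


def find_raster_column(schema: List[Tuple[str, str]]) -> str:
    """Option 2: pick the single raster column regardless of its name."""
    raster_cols = [name for name, dtype in schema if dtype == RASTER_TYPE]
    if len(raster_cols) != 1:
        # Zero or several raster columns: the writer cannot choose on its own.
        raise ValueError(
            f"expected exactly one raster column, found {len(raster_cols)}"
        )
    return raster_cols[0]


def default_filename(ul_x: float, ul_y: float, token: Optional[str] = None) -> str:
    """Build a fallback name from the envelope's upper-left corner and a uuid.

    Coordinates are truncated to integers here, which is an assumption;
    the proposal does not say how they should be rounded.
    """
    if token is None:
        token = str(uuid.uuid4())
    return f"ul_{int(ul_x)}_{int(ul_y)}_{token}.tiff"
```

For example, find_raster_column([("id", "long"), ("rast", "raster")]) picks "rast", and default_filename(6490550.0, 1338130.0) yields a name of the form "ul_6490550_1338130_<uuid>.tiff". Raising an error when there is not exactly one raster column mirrors the exception behaviour described for Option 1.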
