Thanks for trying it out, Pedro!

Unfortunately, I found a bug in ST_Values, but there is a workaround.
I'm working on a fix.

https://issues.apache.org/jira/browse/SEDONA-266

Br,
Martin Andersson

On Tue, 21 Mar 2023 at 18:39, Jia Yu <[email protected]> wrote:

> Hi Pedro,
>
> You should use sedona.apache.org instead of sedona.staged.apache.org.
> The `staged` website is only for us to test the website template. We
> haven't updated that website in more than a year.
>
> Here is the doc for Martin's RasterUDT:
> https://sedona.apache.org/1.4.0/api/sql/Raster-loader/
>
> Thanks,
> Jia
>
> On Tue, Mar 21, 2023 at 8:30 AM Pedro Mano Fernandes
> <[email protected]> wrote:
> >
> > Hi Martin,
> >
> > It's odd that I don't see your new Raster features in the docs at
> > https://sedona.staged.apache.org/api/sql/Raster-loader/. I thought the
> > documentation was already up-to-date after the release of sedona-1.4.0.
> >
> > Best regards,
> >
> > On Wed, 1 Mar 2023 at 10:29, Pedro Mano Fernandes <[email protected]>
> > wrote:
> >
> > > Hi Martin,
> > >
> > > Great news! I'll give it a go and will let you know.
> > >
> > > Thanks for letting me know.
> > > Best regards,
> > >
> > > On Tue, 28 Feb 2023 at 14:53, Martin Andersson <
> > > [email protected]> wrote:
> > >
> > >> Hi again Pedro,
> > >>
> > >> Since https://github.com/apache/sedona/pull/773 got merged you should
> > >> now be able to use Apache Sedona for your GeoTiff processing needs.
> > >> It will be included in the next Sedona release.
> > >>
> > >> All feedback is welcome!
> > >>
> > >> Br
> > >> Martin Andersson
> > >>
> > >>
> > >> On Mon, 23 Jan 2023 at 10:45, Pedro Mano Fernandes <
> > >> [email protected]> wrote:
> > >>
> > >>> Hi Martin,
> > >>>
> > >>> I've tested your proposal (reading binary and UDF getValue) and it
> > >>> works fine. I've actually converted the code to Scala easily. Now
> > >>> it's a matter of building/optimizing around it (spatial join,
> > >>> aggregate points per geotiff).
> > >>>
> > >>> Best,
> > >>>
> > >>> On Fri, 20 Jan 2023 at 13:47, Martin Andersson <
> > >>> [email protected]> wrote:
> > >>>
> > >>>> Yes, there are lots of things to consider when processing large
> > >>>> blobs in Spark. What I have come to learn:
> > >>>>  - Do the spatial join (points and the geotiff extent) with as few
> > >>>> columns as possible. Ideally an id only for the geotiff. After that
> > >>>> join you can join back the geotiff using the id.
> > >>>>  - Aggregate the points to an array of points per geotiff. Your
> > >>>> getValue udf should take an array of points and return an array of
> > >>>> values. That way each geotiff is only loaded once.
> > >>>>  - Parquet in Spark is not very good at handling large blobs. If
> > >>>> reading parquet with geotiffs is slow you can repartition() with a
> > >>>> very large number to force smaller row groups when writing or use
> > >>>> Avro instead.
> > >>>> https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
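The "aggregate the points per geotiff" advice above can be sketched in plain Java, outside Spark. This is only an illustration under stated assumptions: `MatchedPoint`, `Sampler`, and `sampleGrouped` are hypothetical names that do not come from Sedona or Spark, and the loader stands in for whatever decodes the geotiff bytes. The point is that the expensive decode step runs once per geotiff id rather than once per point.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class GeotiffBatching {

    /** Hypothetical output row of the spatial join: a point plus the id of its geotiff. */
    static final class MatchedPoint {
        final String geotiffId;
        final double x;
        final double y;

        MatchedPoint(String geotiffId, double x, double y) {
            this.geotiffId = geotiffId;
            this.x = x;
            this.y = y;
        }
    }

    /** Hypothetical handle on a decoded geotiff that can be sampled at world coordinates. */
    interface Sampler {
        int valueAt(double x, double y);
    }

    /** Collects the points of each geotiff into one list, keeping first-seen order. */
    static Map<String, List<MatchedPoint>> groupByGeotiff(List<MatchedPoint> matches) {
        Map<String, List<MatchedPoint>> groups = new LinkedHashMap<>();
        for (MatchedPoint m : matches) {
            groups.computeIfAbsent(m.geotiffId, id -> new ArrayList<>()).add(m);
        }
        return groups;
    }

    /** Decodes each geotiff once via the loader, then samples all of its points. */
    static Map<String, List<Integer>> sampleGrouped(
            List<MatchedPoint> matches, Function<String, Sampler> loader) {
        Map<String, List<Integer>> values = new LinkedHashMap<>();
        for (Map.Entry<String, List<MatchedPoint>> group : groupByGeotiff(matches).entrySet()) {
            Sampler sampler = loader.apply(group.getKey()); // expensive step, once per geotiff
            List<Integer> sampled = new ArrayList<>();
            for (MatchedPoint m : group.getValue()) {
                sampled.add(sampler.valueAt(m.x, m.y));
            }
            values.put(group.getKey(), sampled);
        }
        return values;
    }

    public static void main(String[] args) {
        List<MatchedPoint> matches = List.of(
                new MatchedPoint("a.tif", 1.0, 2.0),
                new MatchedPoint("b.tif", 5.0, 5.0),
                new MatchedPoint("a.tif", 3.0, 4.0));
        // Stand-in loader: a real one would decode the geotiff bytes here.
        Map<String, List<Integer>> values =
                sampleGrouped(matches, id -> (x, y) -> (int) (x + y));
        System.out.println(values);
    }
}
```

In Spark the same shape would presumably come from grouping on the geotiff id and collecting the points into an array column before calling the UDF, but that wiring is not shown here.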
> > >>>>
> > >>>> Good luck!
> > >>>>
> > >>>> Br,
> > >>>> Martin Andersson
> > >>>>
> > >>>>
> > >>>> On Fri, 20 Jan 2023 at 13:08, Pedro Mano Fernandes <
> > >>>> [email protected]> wrote:
> > >>>>
> > >>>>> Thanks Martin, it sounds promising. I'll actually give it a try
> > >>>>> before going with geotiff conversions.
> > >>>>>
> > >>>>> I'm foreseeing some concerns, though:
> > >>>>>
> > >>>>>    - I'm afraid it won't be optimal for a big geotiff - I may have
> > >>>>>    to split the geotiff into smaller geotiffs
> > >>>>>    - I wonder how the spatial partitioning optimization will behave
> > >>>>>    in such an approach - I may have to load smaller geotiffs and use
> > >>>>>    their geometry to join (my coordinates against envelope
> > >>>>>    boundaries) before calculating the getValue for my coordinates
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <
> > >>>>> [email protected]> wrote:
> > >>>>>
> > >>>>>> I would read the geotiff files as binary:
> > >>>>>> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
> > >>>>>>
> > >>>>>> Then you can define a udf to extract values directly from the
> > >>>>>> geotiffs. If you're on Python you can use rasterio to do that.
> > >>>>>>
> > >>>>>> In Java it would look something like this:
> > >>>>>>
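> > >>>>>>   // Imports assume a GeoTools release that still uses org.opengis
> > >>>>>>   import java.awt.image.Raster;
> > >>>>>>   import java.io.ByteArrayInputStream;
> > >>>>>>   import java.io.IOException;
> > >>>>>>   import org.geotools.coverage.grid.GridCoordinates2D;
> > >>>>>>   import org.geotools.coverage.grid.GridCoverage2D;
> > >>>>>>   import org.geotools.coverage.grid.GridGeometry2D;
> > >>>>>>   import org.geotools.gce.geotiff.GeoTiffReader;
> > >>>>>>   import org.geotools.geometry.DirectPosition2D;
> > >>>>>>   import org.opengis.referencing.operation.TransformException;
> > >>>>>>
> > >>>>>>   Integer getValue(byte[] geotiff, double x, double y)
> > >>>>>>       throws IOException, TransformException {
> > >>>>>>     try (ByteArrayInputStream inputStream = new ByteArrayInputStream(geotiff)) {
> > >>>>>>       GeoTiffReader geoTiffReader = new GeoTiffReader(inputStream);
> > >>>>>>       GridCoverage2D grid = geoTiffReader.read(null);
> > >>>>>>       Raster raster = grid.getRenderedImage().getData();
> > >>>>>>       GridGeometry2D gridGeometry = grid.getGridGeometry();
> > >>>>>>
> > >>>>>>       // Convert the world coordinates to pixel coordinates
> > >>>>>>       DirectPosition2D directPosition2D = new DirectPosition2D(x, y);
> > >>>>>>       GridCoordinates2D gridCoordinates2D = gridGeometry.worldToGrid(directPosition2D);
> > >>>>>>       try {
> > >>>>>>         int[] pixel = raster.getPixel(gridCoordinates2D.x, gridCoordinates2D.y, new int[1]);
> > >>>>>>         return pixel[0];
> > >>>>>>       } catch (ArrayIndexOutOfBoundsException exc) {
> > >>>>>>         // point is outside the extent
> > >>>>>>         return null;
> > >>>>>>       }
> > >>>>>>     }
> > >>>>>>   }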
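To make the coordinate conversion in the method above more concrete, here is a rough standard-library-only sketch of what the world-to-grid step amounts to for a simple north-up raster with square pixels. It is not how GeoTools implements `worldToGrid`. The class `PixelLookup`, the method `valueAt`, and the parameters `originX`, `originY`, and `pixelSize` are assumptions made for illustration, and it uses an explicit bounds check instead of catching an exception.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;

final class PixelLookup {

    /**
     * Returns the band-0 value under world coordinate (x, y), or null when the
     * point falls outside the raster. Assumes a north-up raster where
     * (originX, originY) is the world position of the upper-left corner and
     * pixelSize is the side length of a square pixel in world units.
     */
    static Integer valueAt(Raster raster, double originX, double originY,
                           double pixelSize, double x, double y) {
        int col = (int) Math.floor((x - originX) / pixelSize);
        int row = (int) Math.floor((originY - y) / pixelSize); // world y grows up, rows grow down
        if (col < 0 || row < 0 || col >= raster.getWidth() || row >= raster.getHeight()) {
            return null; // point is outside the extent
        }
        return raster.getSample(raster.getMinX() + col, raster.getMinY() + row, 0);
    }

    public static void main(String[] args) {
        // 4x3 single-band test image with one known pixel value.
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_BYTE_GRAY);
        image.getRaster().setSample(2, 1, 0, 231);
        Raster raster = image.getData();

        // Upper-left corner at (100, 50), 10 world units per pixel.
        System.out.println(valueAt(raster, 100.0, 50.0, 10.0, 125.0, 35.0)); // prints 231
        System.out.println(valueAt(raster, 100.0, 50.0, 10.0, 99.0, 35.0));  // prints null
    }
}
```

Rotated or sheared rasters need the full affine transform, which is the reason to rely on the grid geometry of the coverage in real code.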
> > >>>>>>
> > >>>>>> Br,
> > >>>>>> Martin Andersson
> > >>>>>>
> > >>>>> On Wed, 18 Jan 2023 at 17:59, Pedro Mano Fernandes <
> > >>>>> [email protected]> wrote:
> > >>>>>>
> > >>>>>>> Thanks for the update, guys.
> > >>>>>>>
> > >>>>>>> I'm not ready to contribute yet.
> > >>>>>>>
> > >>>>>>> In the meantime, the solution could perhaps be to convert GeoTiff
> > >>>>>>> to another format supported by Sedona. If anyone has had this use
> > >>>>>>> case before or has any idea, please share.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>>
> > >>>>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <
> > >>>>>>> [email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> I think you are looking for something like this:
> > >>>>>>>> https://postgis.net/docs/RT_ST_Value.html
> > >>>>>>>>
> > >>>>>>>> The raster support in Sedona is very limited at the moment. The
> > >>>>>>>> lack of a proper raster type makes implementing st_value
> > >>>>>>>> impossible. We had a brief discussion about that recently.
> > >>>>>>>> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
> > >>>>>>>>
> > >>>>>>>> If you want to make a contribution and need some guidance,
> > >>>>>>>> please let me know!
> > >>>>>>>>
> > >>>>>>>> Br,
> > >>>>>>>> Martin Andersson
> > >>>>>>>>
> > >>>>>>>> On Wed, 18 Jan 2023 at 05:45, Jia Yu <[email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Pedro,
> > >>>>>>>>>
> > >>>>>>>>> I got your point. Unfortunately, we don't have this function
> > >>>>>>>>> yet in Sedona.
> > >>>>>>>>> But we welcome anyone who wants to contribute this to Sedona!
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Jia
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <
> > >>>>>>>>> [email protected]>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> > Hi all,
> > >>>>>>>>> >
> > >>>>>>>>> > Any clue? Or any documentation I can refer to?
> > >>>>>>>>> >
> > >>>>>>>>> > Here goes a dummy example to better explain myself: in QGIS I
> > >>>>>>>>> > can click a point (coordinates) of the geotiff and get the
> > >>>>>>>>> > value in that point (in this case 231 of Band 1).
> > >>>>>>>>> >
> > >>>>>>>>> > [image: image.png]
> > >>>>>>>>> >
> > >>>>>>>>> > Thanks,
> > >>>>>>>>> >
> > >>>>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <
> > >>>>>>>>> > [email protected]> wrote:
> > >>>>>>>>> >
> > >>>>>>>>> >> Hi Jia,
> > >>>>>>>>> >>
> > >>>>>>>>> >> Thanks for the fast response.
> > >>>>>>>>> >>
> > >>>>>>>>> >> With the regular spatial join I’ll get the array of data of
> > >>>>>>>>> >> the whole geotiff polygon. I was hoping to get the data
> > >>>>>>>>> >> element for specific coordinates inside that polygon. In
> > >>>>>>>>> >> other words: I guess the array of data corresponds to all
> > >>>>>>>>> >> the positions in the polygon, but I want to fetch specific
> > >>>>>>>>> >> positions.
> > >>>>>>>>> >>
> > >>>>>>>>> >> Thanks,
> > >>>>>>>>> >>
> > >>>>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <[email protected]>
> > >>>>>>>>> >> wrote:
> > >>>>>>>>> >>
> > >>>>>>>>> >>> Hi Pedro,
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> Once you use Sedona geotiff reader to read those geotiffs,
> > >>>>>>>>> >>> you will get a dataframe with the following schema:
> > >>>>>>>>> >>>
> > >>>>>>>>> >>>  |-- image: struct (nullable = true)
> > >>>>>>>>> >>>  |    |-- origin: string (nullable = true)
> > >>>>>>>>> >>>  |    |-- Geometry: string (nullable = true)
> > >>>>>>>>> >>>  |    |-- height: integer (nullable = true)
> > >>>>>>>>> >>>  |    |-- width: integer (nullable = true)
> > >>>>>>>>> >>>  |    |-- nBands: integer (nullable = true)
> > >>>>>>>>> >>>  |    |-- data: array (nullable = true)
> > >>>>>>>>> >>>  |    |    |-- element: double (containsNull = true)
> > >>>>>>>>> >>>
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> You can use the following way to fetch the geometry column
> > >>>>>>>>> >>> and perform the spatial join:
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> geotiffDF = geotiffDF.selectExpr("image.origin as origin",
> > >>>>>>>>> >>>     "ST_GeomFromWkt(image.geometry) as Geom",
> > >>>>>>>>> >>>     "image.height as height", "image.width as width",
> > >>>>>>>>> >>>     "image.data as data", "image.nBands as bands")
> > >>>>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
> > >>>>>>>>> >>> geotiffDF.show()
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> More info can be found here:
> > >>>>>>>>> >>> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> Thanks,
> > >>>>>>>>> >>> Jia
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes <
> > >>>>>>>>> >>> [email protected]> wrote:
> > >>>>>>>>> >>>
> > >>>>>>>>> >>>> Hi everyone!
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> I'm trying to use elevation data in GeoTiff format. I
> > >>>>>>>>> >>>> understand how to load the dataset, as described in
> > >>>>>>>>> >>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader
> > >>>>>>>>> >>>> .
> > >>>>>>>>> >>>> Now I'm wondering how to join this dataframe with another
> > >>>>>>>>> >>>> one that contains coordinates, in order to get the
> > >>>>>>>>> >>>> elevation data for those coordinates.
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> Something along these lines:
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> pointsDF
> > >>>>>>>>> >>>>   .join(geotiffDF, ...)
> > >>>>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> Are there any examples or documentation I can follow to
> > >>>>>>>>> >>>> accomplish this?
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> Thanks,
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> --
> > >>>>>>>>> >>>> Pedro Mano Fernandes
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>> --
> > >>>>>>>>> >> Pedro Mano Fernandes
> > >>>>>>>>> >>
> > >>>>>>>>> >
> > >>>>>>>>> >
> > >>>>>>>>> > --
> > >>>>>>>>> > Pedro Mano Fernandes
> > >>>>>>>>> >
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Regards,
> > >>>>>>>> Martin
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Pedro Mano Fernandes
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Pedro Mano Fernandes
> > >>>>>
> > >>>>
> > >>>
> > >>> --
> > >>> Pedro Mano Fernandes
> > >>>
> > >>
> > >
> > > --
> > > Pedro Mano Fernandes
> > >
> >
> >
> > --
> > Pedro Mano Fernandes
>
