Thanks for trying it out, Pedro! Unfortunately, I found a bug in ST_Values, but there is a workaround. I'm working on a fix.
https://issues.apache.org/jira/browse/SEDONA-266

Br,
Martin Andersson

On Tue, 21 Mar 2023 at 18:39, Jia Yu <[email protected]> wrote:

> Hi Pedro,
>
> You should use sedona.apache.org instead of sedona.staged.apache.org.
> The `staged` website is for us to test the website template. We haven't
> updated that website for more than a year.
>
> Here is the doc for Martin's RasterUDT:
> https://sedona.apache.org/1.4.0/api/sql/Raster-loader/
>
> Thanks,
> Jia
>
> On Tue, Mar 21, 2023 at 8:30 AM Pedro Mano Fernandes
> <[email protected]> wrote:
> >
> > Hi Martin,
> >
> > It's weird that I don't see your new Raster features in the docs at
> > https://sedona.staged.apache.org/api/sql/Raster-loader/. I thought the
> > documentation was already up to date after the release of sedona-1.4.0.
> >
> > Best regards,
> >
> > On Wed, 1 Mar 2023 at 10:29, Pedro Mano Fernandes <[email protected]>
> > wrote:
> >
> > > Hi Martin,
> > >
> > > Great news! I'll give it a go and will let you know.
> > >
> > > Thanks for letting me know.
> > > Best regards,
> > >
> > > On Tue, 28 Feb 2023 at 14:53, Martin Andersson <[email protected]> wrote:
> > >
> > >> Hi again Pedro,
> > >>
> > >> Since https://github.com/apache/sedona/pull/773 got merged, you should
> > >> now be able to use Apache Sedona for your GeoTiff processing needs. It
> > >> will be included in the next Sedona release.
> > >>
> > >> All feedback is welcome!
> > >>
> > >> Br
> > >> Martin Andersson
> > >>
> > >> On Mon, 23 Jan 2023 at 10:45, Pedro Mano Fernandes <[email protected]> wrote:
> > >>
> > >>> Hi Martin,
> > >>>
> > >>> I've tested your proposal (reading binary and UDF getValue) and it works
> > >>> fine. I've actually converted the code to Scala easily. Now it's a matter
> > >>> of building/optimizing around it (spatial join, aggregate points per
> > >>> geotiff).
> > >>>
> > >>> Best,
> > >>>
> > >>> On Fri, 20 Jan 2023 at 13:47, Martin Andersson <[email protected]> wrote:
> > >>>
> > >>>> Yes, there are lots of things to consider when processing large blobs
> > >>>> in Spark. What I have come to learn:
> > >>>> - Do the spatial join (points and the geotiff extent) with as few
> > >>>>   columns as possible. Ideally an id only for the geotiff. After that
> > >>>>   join you can join back the geotiff using the id.
> > >>>> - Aggregate the points to an array of points per geotiff. Your
> > >>>>   getValue udf should take an array of points and return an array of
> > >>>>   values. That way each geotiff is only loaded once.
> > >>>> - Parquet in Spark is not very good at handling large blobs. If
> > >>>>   reading parquet with geotiffs is slow, you can repartition() with a
> > >>>>   very large number to force smaller row groups when writing, or use
> > >>>>   Avro instead.
> > >>>>   https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
> > >>>>
> > >>>> Good luck!
> > >>>>
> > >>>> Br,
> > >>>> Martin Andersson
> > >>>>
> > >>>> On Fri, 20 Jan 2023 at 13:08, Pedro Mano Fernandes <[email protected]> wrote:
> > >>>>
> > >>>>> Thanks Martin, it sounds promising. I'll actually give it a try before
> > >>>>> going with geotiff conversions.
> > >>>>>
> > >>>>> I'm foreseeing some concerns, though:
> > >>>>>
> > >>>>> - I'm afraid it won't be optimal for a big geotiff - I may have to
> > >>>>>   split the geotiff into smaller geotiffs
> > >>>>> - I wonder how the spatial partitioning optimization will behave
> > >>>>>   in such an approach - I may have to load smaller geotiffs and use
> > >>>>>   their geometry to join (my coordinates against envelope boundaries)
> > >>>>>   before calculating the getValue for my coordinates
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <[email protected]> wrote:
> > >>>>>
> > >>>>>> I would read the geotiff files as binary:
> > >>>>>> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
> > >>>>>>
> > >>>>>> Then you can define a udf to extract values directly from the
> > >>>>>> geotiffs. If you're on python you can use rasterio to do that.
> > >>>>>>
> > >>>>>> In java it would look something like this:
> > >>>>>>
> > >>>>>> Integer getValue(byte[] geotiff, double x, double y)
> > >>>>>>     throws IOException, TransformException {
> > >>>>>>   try (ByteArrayInputStream inputStream = new ByteArrayInputStream(geotiff)) {
> > >>>>>>     GeoTiffReader geoTiffReader = new GeoTiffReader(inputStream);
> > >>>>>>     GridCoverage2D grid = geoTiffReader.read(null);
> > >>>>>>     Raster raster = grid.getRenderedImage().getData();
> > >>>>>>     GridGeometry2D gridGeometry = grid.getGridGeometry();
> > >>>>>>
> > >>>>>>     // convert the world coordinate to a pixel position
> > >>>>>>     DirectPosition2D directPosition2D = new DirectPosition2D(x, y);
> > >>>>>>     GridCoordinates2D gridCoordinates2D = gridGeometry.worldToGrid(directPosition2D);
> > >>>>>>     try {
> > >>>>>>       int[] pixel = raster.getPixel(gridCoordinates2D.x, gridCoordinates2D.y, new int[1]);
> > >>>>>>       return pixel[0];
> > >>>>>>     } catch (ArrayIndexOutOfBoundsException exc) {
> > >>>>>>       // point is outside the extent
> > >>>>>>       return null;
> > >>>>>>     }
> > >>>>>>   }
> > >>>>>> }
> > >>>>>>
> > >>>>>> Br,
> > >>>>>> Martin Andersson
> > >>>>>>
> > >>>>>> On Wed, 18 Jan 2023 at 17:59, Pedro Mano Fernandes <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Thanks for the update, guys.
> > >>>>>>>
> > >>>>>>> I'm not ready to contribute yet.
> > >>>>>>>
> > >>>>>>> In the meantime, the solution could perhaps be to convert GeoTiff
> > >>>>>>> to another format supported by Sedona. If anyone has had this use
> > >>>>>>> case before or has any idea, please share.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>>
> > >>>>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> I think you are looking for something like this:
> > >>>>>>>> https://postgis.net/docs/RT_ST_Value.html
> > >>>>>>>>
> > >>>>>>>> The raster support in Sedona is very limited at the moment. The
> > >>>>>>>> lack of a proper raster type makes implementing st_value
> > >>>>>>>> impossible. We had a brief discussion about that recently.
> > >>>>>>>> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
> > >>>>>>>>
> > >>>>>>>> If you want to make a contribution and need some guidance, please
> > >>>>>>>> let me know!
> > >>>>>>>>
> > >>>>>>>> Br,
> > >>>>>>>> Martin Andersson
> > >>>>>>>>
> > >>>>>>>> On Wed, 18 Jan 2023 at 05:45, Jia Yu <[email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Pedro,
> > >>>>>>>>>
> > >>>>>>>>> I got your point. Unfortunately, we don't have this function yet
> > >>>>>>>>> in Sedona.
> > >>>>>>>>> But we welcome anyone who wants to contribute this to Sedona!
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Jia
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <[email protected]>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> > Hi all,
> > >>>>>>>>> >
> > >>>>>>>>> > Any clue? Or any documentation I can refer to?
> > >>>>>>>>> >
> > >>>>>>>>> > Here goes a dummy example to better explain myself: in QGIS I
> > >>>>>>>>> > can click a point (coordinates) of the geotiff and get the value
> > >>>>>>>>> > in that point (in this case 231 of Band 1).
> > >>>>>>>>> >
> > >>>>>>>>> > [image: image.png]
> > >>>>>>>>> >
> > >>>>>>>>> > Thanks,
> > >>>>>>>>> >
> > >>>>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <[email protected]>
> > >>>>>>>>> > wrote:
> > >>>>>>>>> >
> > >>>>>>>>> >> Hi Jia,
> > >>>>>>>>> >>
> > >>>>>>>>> >> Thanks for the fast response.
> > >>>>>>>>> >>
> > >>>>>>>>> >> With the regular spatial join I’ll get the array of data of the
> > >>>>>>>>> >> whole geotiff polygon. I was hoping to get the data element for
> > >>>>>>>>> >> specific coordinates inside that polygon. In other words: I guess
> > >>>>>>>>> >> the array of data corresponds to all the positions in the polygon,
> > >>>>>>>>> >> but I want to fetch specific positions.
> > >>>>>>>>> >>
> > >>>>>>>>> >> Thanks,
> > >>>>>>>>> >>
> > >>>>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <[email protected]> wrote:
> > >>>>>>>>> >>
> > >>>>>>>>> >>> Hi Pedro,
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> Once you use the Sedona geotiff reader to read those geotiffs, you
> > >>>>>>>>> >>> will get a dataframe with the following schema:
> > >>>>>>>>> >>>
> > >>>>>>>>> >>>  |-- image: struct (nullable = true)
> > >>>>>>>>> >>>  |    |-- origin: string (nullable = true)
> > >>>>>>>>> >>>  |    |-- Geometry: string (nullable = true)
> > >>>>>>>>> >>>  |    |-- height: integer (nullable = true)
> > >>>>>>>>> >>>  |    |-- width: integer (nullable = true)
> > >>>>>>>>> >>>  |    |-- nBands: integer (nullable = true)
> > >>>>>>>>> >>>  |    |-- data: array (nullable = true)
> > >>>>>>>>> >>>  |    |    |-- element: double (containsNull = true)
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> You can fetch the geometry column in the following way and then
> > >>>>>>>>> >>> perform the spatial join:
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> geotiffDF = geotiffDF.selectExpr(
> > >>>>>>>>> >>>     "image.origin as origin",
> > >>>>>>>>> >>>     "ST_GeomFromWkt(image.geometry) as Geom",
> > >>>>>>>>> >>>     "image.height as height",
> > >>>>>>>>> >>>     "image.width as width",
> > >>>>>>>>> >>>     "image.data as data",
> > >>>>>>>>> >>>     "image.nBands as bands")
> > >>>>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
> > >>>>>>>>> >>> geotiffDF.show()
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> More info can be found here:
> > >>>>>>>>> >>> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> Thanks,
> > >>>>>>>>> >>> Jia
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes
> > >>>>>>>>> >>> <[email protected]> wrote:
> > >>>>>>>>> >>>
> > >>>>>>>>> >>>> Hi everyone!
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> I'm trying to use elevation data in GeoTiff format.
> > >>>>>>>>> >>>> I understand how to load the dataset, as described in the
> > >>>>>>>>> >>>> GeoTiff dataframe loader docs:
> > >>>>>>>>> >>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader
> > >>>>>>>>> >>>> Now I'm wondering how to join this dataframe with another one
> > >>>>>>>>> >>>> that contains coordinates, in order to get the elevation data
> > >>>>>>>>> >>>> for those coordinates.
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> Something along these lines:
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> pointsDF
> > >>>>>>>>> >>>>   .join(geotiffDF, ...)
> > >>>>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> Are there any examples or documentation I can follow to
> > >>>>>>>>> >>>> accomplish this?
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> Thanks,
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> --
> > >>>>>>>>> >>>> Pedro Mano Fernandes
> > >>>>>>>>> >>
> > >>>>>>>>> >> --
> > >>>>>>>>> >> Pedro Mano Fernandes
> > >>>>>>>>> >
> > >>>>>>>>> > --
> > >>>>>>>>> > Pedro Mano Fernandes
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Regards,
> > >>>>>>>> Martin
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Pedro Mano Fernandes
> > >>>>>
> > >>>>> --
> > >>>>> Pedro Mano Fernandes
> > >>>
> > >>> --
> > >>> Pedro Mano Fernandes
> > >
> > > --
> > > Pedro Mano Fernandes
> >
> > --
> > Pedro Mano Fernandes
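
For readers following the thread: the suggestion quoted above (aggregate the points per geotiff and let the lookup take an array of points and return an array of values, so each geotiff is decoded only once) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from Sedona or GeoTools. The class `BatchRasterLookup`, its fields, and the north-up, square-pixel grid are all hypothetical stand-ins for a decoded single-band raster; in a real job the grid would come from the geotiff reader and the world-to-grid step from the raster's own grid geometry.

```java
import java.util.Arrays;

/**
 * Hypothetical in-memory stand-in for one decoded, single-band raster.
 * Assumes a north-up grid with square pixels and no rotation.
 */
class BatchRasterLookup {
    final double originX;   // world x of the upper-left corner
    final double originY;   // world y of the upper-left corner
    final double pixelSize; // edge length of one pixel in world units
    final int[][] band;     // band[row][col], row 0 is the northernmost row

    BatchRasterLookup(double originX, double originY, double pixelSize, int[][] band) {
        this.originX = originX;
        this.originY = originY;
        this.pixelSize = pixelSize;
        this.band = band;
    }

    /**
     * Looks up every point against the same already-decoded raster.
     * Each point is {x, y}; a null entry marks a point outside the extent.
     */
    Integer[] getValues(double[][] points) {
        Integer[] values = new Integer[points.length];
        for (int i = 0; i < points.length; i++) {
            // world-to-grid conversion for a simple north-up raster
            int col = (int) Math.floor((points[i][0] - originX) / pixelSize);
            int row = (int) Math.floor((originY - points[i][1]) / pixelSize);
            boolean inside = row >= 0 && row < band.length
                    && col >= 0 && col < band[row].length;
            if (inside) {
                values[i] = band[row][col];
            } else {
                values[i] = null; // outside the extent
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // 2x2 raster covering x in [0, 2) and y in (0, 2]
        int[][] band = { {10, 20}, {30, 231} };
        BatchRasterLookup raster = new BatchRasterLookup(0.0, 2.0, 1.0, band);
        double[][] points = { {0.5, 1.5}, {1.5, 0.5}, {5.0, 5.0} };
        System.out.println(Arrays.toString(raster.getValues(points)));
        // prints [10, 231, null]
    }
}
```

The point of the batch signature is that the expensive step (decoding the raster) happens once per geotiff, while the per-point work is just index arithmetic; out-of-extent points are checked explicitly rather than by catching an exception.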
