Hi Martin,

Great news! I'll give it a go and will let you know.
Thanks for letting me know.

Best regards,

On Tue, 28 Feb 2023 at 14:53, Martin Andersson <[email protected]> wrote:

> Hi again Pedro,
>
> Since https://github.com/apache/sedona/pull/773 got merged you should now
> be able to use Apache Sedona for your GeoTiff processing needs. It will be
> included in the next Sedona release.
>
> All feedback is welcome!
>
> Br
> Martin Andersson
>
>
> On Mon, 23 Jan 2023 at 10:45, Pedro Mano Fernandes <[email protected]> wrote:
>
>> Hi Martin,
>>
>> I've tested your proposal (reading binary and UDF getValue) and it works
>> fine. I've actually converted the code to Scala easily. Now it's a matter
>> of building/optimizing around it (spatial join, aggregate points per
>> geotiff).
>>
>> Best,
>>
>> On Fri, 20 Jan 2023 at 13:47, Martin Andersson <[email protected]> wrote:
>>
>>> Yes, there are lots of things to consider when processing large blobs in
>>> Spark. What I have come to learn:
>>> - Do the spatial join (points and the geotiff extent) with as few
>>> columns as possible. Ideally an id only for the geotiff. After that join
>>> you can join back the geotiff using the id.
>>> - Aggregate the points to an array of points per geotiff. Your getValue
>>> udf should take an array of points and return an array of values. That way
>>> each geotiff is only loaded once.
>>> - Parquet in Spark is not very good at handling large blobs. If reading
>>> parquet with geotiffs is slow you can repartition() with a very large
>>> number to force smaller row groups when writing, or use Avro instead.
>>> https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
>>>
>>> Good luck!
>>>
>>> Br,
>>> Martin Andersson
>>>
>>>
>>> On Fri, 20 Jan 2023 at 13:08, Pedro Mano Fernandes <[email protected]> wrote:
>>>
>>>> Thanks Martin, it sounds promising. I'll actually give it a try before
>>>> going with geotiff conversions.
>>>>
>>>> I'm foreseeing some concerns, though:
>>>>
>>>> - I'm afraid it won't be optimal for a big geotiff - I may have to
>>>> split the geotiff into smaller geotiffs
>>>> - I wonder how the spatial partitioning optimization will behave in
>>>> such an approach - I may have to load smaller geotiffs and use their
>>>> geometry to join (my coordinates against envelope boundaries) before
>>>> calculating the getValue for my coordinates
>>>>
>>>> Best,
>>>>
>>>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <[email protected]> wrote:
>>>>
>>>>> I would read the geotiff files as binary:
>>>>> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
>>>>>
>>>>> Then you can define a udf to extract values directly from the
>>>>> geotiffs. If you're on python you can use rasterio to do that.
>>>>>
>>>>> In java it would look something like this:
>>>>>
>>>>> Integer getValue(byte[] geotiff, double x, double y)
>>>>>         throws IOException, TransformException {
>>>>>     try (ByteArrayInputStream inputStream = new ByteArrayInputStream(geotiff)) {
>>>>>         GeoTiffReader geoTiffReader = new GeoTiffReader(inputStream);
>>>>>         GridCoverage2D grid = geoTiffReader.read(null);
>>>>>         Raster raster = grid.getRenderedImage().getData();
>>>>>         GridGeometry2D gridGeometry = grid.getGridGeometry();
>>>>>
>>>>>         // convert world coordinates to pixel coordinates
>>>>>         DirectPosition2D directPosition2D = new DirectPosition2D(x, y);
>>>>>         GridCoordinates2D gridCoordinates2D = gridGeometry.worldToGrid(directPosition2D);
>>>>>         try {
>>>>>             int[] pixel = raster.getPixel(gridCoordinates2D.x, gridCoordinates2D.y, new int[1]);
>>>>>             return pixel[0];
>>>>>         } catch (ArrayIndexOutOfBoundsException exc) {
>>>>>             // point is outside the extent
>>>>>             return null;
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> Br,
>>>>> Martin Andersson
>>>>>
>>>>> On Wed, 18 Jan 2023 at 17:59, Pedro Mano Fernandes <[email protected]> wrote:
>>>>>
>>>>>> Thanks for the update, guys.
>>>>>>
>>>>>> I'm not ready to contribute yet.
>>>>>>
>>>>>> In the meantime, the solution could perhaps be to convert GeoTiff to
>>>>>> another format supported by Sedona. If anyone has had this use case
>>>>>> before or has any idea, please share.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I think you are looking for something like this:
>>>>>>> https://postgis.net/docs/RT_ST_Value.html
>>>>>>>
>>>>>>> The raster support in Sedona is very limited at the moment. The lack
>>>>>>> of a proper raster type makes implementing st_value impossible. We had a
>>>>>>> brief discussion about that recently.
>>>>>>> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
>>>>>>>
>>>>>>> If you want to make a contribution and need some guidance, please
>>>>>>> let me know!
>>>>>>>
>>>>>>> Br,
>>>>>>> Martin Andersson
>>>>>>>
>>>>>>> On Wed, 18 Jan 2023 at 05:45, Jia Yu <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Pedro,
>>>>>>>>
>>>>>>>> I got your point. Unfortunately, we don't have this function yet in
>>>>>>>> Sedona. But we welcome anyone who wants to contribute this to Sedona!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jia
>>>>>>>>
>>>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <[email protected]> wrote:
>>>>>>>>
>>>>>>>> > Hi all,
>>>>>>>> >
>>>>>>>> > Any clue? Or any documentation I can refer to?
>>>>>>>> >
>>>>>>>> > Here goes a dummy example to better explain myself: in QGIS I can click a
>>>>>>>> > point (coordinates) of the geotiff and get the value in that point (in this
>>>>>>>> > case 231 of Band 1).
>>>>>>>> >
>>>>>>>> > [image: image.png]
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> >
>>>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> >> Hi Jia,
>>>>>>>> >>
>>>>>>>> >> Thanks for the fast response.
>>>>>>>> >>
>>>>>>>> >> With the regular spatial join I’ll get the array of data of the whole
>>>>>>>> >> geotiff polygon. I was hoping to get the data element for specific
>>>>>>>> >> coordinates inside that polygon. In other words: I guess the array of data
>>>>>>>> >> corresponds to all the positions in the polygon, but I want to fetch
>>>>>>>> >> specific positions.
>>>>>>>> >>
>>>>>>>> >> Thanks,
>>>>>>>> >>
>>>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <[email protected]> wrote:
>>>>>>>> >>
>>>>>>>> >>> Hi Pedro,
>>>>>>>> >>>
>>>>>>>> >>> Once you use the Sedona geotiff reader to read those geotiffs, you will get
>>>>>>>> >>> a dataframe with the following schema:
>>>>>>>> >>>
>>>>>>>> >>>  |-- image: struct (nullable = true)
>>>>>>>> >>>  |    |-- origin: string (nullable = true)
>>>>>>>> >>>  |    |-- Geometry: string (nullable = true)
>>>>>>>> >>>  |    |-- height: integer (nullable = true)
>>>>>>>> >>>  |    |-- width: integer (nullable = true)
>>>>>>>> >>>  |    |-- nBands: integer (nullable = true)
>>>>>>>> >>>  |    |-- data: array (nullable = true)
>>>>>>>> >>>  |    |    |-- element: double (containsNull = true)
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> You can fetch the geometry column in the following way and then perform
>>>>>>>> >>> the spatial join:
>>>>>>>> >>>
>>>>>>>> >>> geotiffDF = geotiffDF.selectExpr("image.origin as origin",
>>>>>>>> >>>     "ST_GeomFromWkt(image.geometry) as Geom", "image.height as height",
>>>>>>>> >>>     "image.width as width", "image.data as data", "image.nBands as bands")
>>>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
>>>>>>>> >>> geotiffDF.show()
>>>>>>>> >>>
>>>>>>>> >>> More info can be found here:
>>>>>>>> >>> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>>> >>>
>>>>>>>> >>> Thanks,
>>>>>>>> >>> Jia
>>>>>>>> >>>
>>>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes <[email protected]>
>>>>>>>> >>> wrote:
>>>>>>>> >>>
>>>>>>>> >>>> Hi everyone!
>>>>>>>> >>>>
>>>>>>>> >>>> I'm trying to use elevation data in GeoTiff format. I understand how to
>>>>>>>> >>>> load the dataset, as described in
>>>>>>>> >>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>>> >>>> .
>>>>>>>> >>>> Now I'm wondering how to join this dataframe with another one that
>>>>>>>> >>>> contains coordinates, in order to get the elevation data for those
>>>>>>>> >>>> coordinates.
>>>>>>>> >>>>
>>>>>>>> >>>> Something along these lines:
>>>>>>>> >>>>
>>>>>>>> >>>> pointsDF
>>>>>>>> >>>>   .join(geotiffDF, ...)
>>>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
>>>>>>>> >>>>
>>>>>>>> >>>> Are there any examples or documentation I can follow to accomplish this?
>>>>>>>> >>>>
>>>>>>>> >>>> Thanks,
>>>>>>>> >>>>
>>>>>>>> >>>> --
>>>>>>>> >>>> Pedro Mano Fernandes
>>>>>>>> >>>>
>>>>>>>> >>> --
>>>>>>>> >> Pedro Mano Fernandes
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Pedro Mano Fernandes
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Martin
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Pedro Mano Fernandes
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Pedro Mano Fernandes
>>>>
>>>
>>
>> --
>> Pedro Mano Fernandes
>>
>
--
Pedro Mano Fernandes
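
For anyone finding this thread later, here is a rough sketch of the batching advice discussed above: join points to geotiff extents keeping only an id, then look up all points of one geotiff in a single call so that each geotiff is handled once. It is plain Python with no Spark, Sedona or rasterio involved. The `Tile` class, `get_values` and `sample_points` are hypothetical names made up for illustration, and the sketch assumes north-up, single-band rasters with square pixels that are already decoded into nested lists.

```python
import math
from collections import defaultdict


class Tile:
    """Hypothetical stand-in for one decoded single-band, north-up GeoTIFF."""

    def __init__(self, tile_id, origin_x, origin_y, pixel_size, rows):
        self.tile_id = tile_id
        self.origin_x = origin_x      # world x of the top-left corner
        self.origin_y = origin_y      # world y of the top-left corner
        self.pixel_size = pixel_size  # square pixels, in world units
        self.rows = rows              # rows[row][col]; row 0 is the top row

    def _to_grid(self, x, y):
        # World -> pixel coordinates (y grows downwards in the grid).
        col = math.floor((x - self.origin_x) / self.pixel_size)
        row = math.floor((self.origin_y - y) / self.pixel_size)
        return col, row

    def contains(self, x, y):
        col, row = self._to_grid(x, y)
        return 0 <= row < len(self.rows) and 0 <= col < len(self.rows[0])

    def get_values(self, points):
        """Return one value per point, or None if the point is outside the extent."""
        values = []
        for x, y in points:
            col, row = self._to_grid(x, y)
            values.append(self.rows[row][col] if self.contains(x, y) else None)
        return values


def sample_points(tiles, points):
    # Step 1: match points to extents, carrying only the tile id.
    points_by_tile = defaultdict(list)
    for point in points:
        for tile in tiles:
            if tile.contains(*point):
                points_by_tile[tile.tile_id].append(point)
                break

    # Step 2: join the tile back by id and resolve all of its points at once.
    tiles_by_id = {tile.tile_id: tile for tile in tiles}
    result = {}
    for tile_id, tile_points in points_by_tile.items():
        values = tiles_by_id[tile_id].get_values(tile_points)
        result.update(zip(tile_points, values))
    return result
```

In Spark the first step would be the spatial join and the second an aggregation followed by a UDF call per geotiff; the nested loop here only stands in for that join and would not scale as written.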
