Hi Martin,

It's odd that I don't see your new raster features in the docs at https://sedona.staged.apache.org/api/sql/Raster-loader/. I thought the documentation had already been updated after the release of sedona-1.4.0.

Best regards,

On Wed, 1 Mar 2023 at 10:29, Pedro Mano Fernandes <[email protected]> wrote:

> Hi Martin,
>
> Great news! I'll give it a go and will let you know.
>
> Thanks for letting me know.
> Best regards,
>
> On Tue, 28 Feb 2023 at 14:53, Martin Andersson <[email protected]> wrote:
>
>> Hi again Pedro,
>>
>> Since https://github.com/apache/sedona/pull/773 got merged you should
>> now be able to use Apache Sedona for your GeoTiff processing needs. It
>> will be included in the next Sedona release.
>>
>> All feedback is welcome!
>>
>> Br
>> Martin Andersson
>>
>> On Mon, 23 Jan 2023 at 10:45, Pedro Mano Fernandes <[email protected]> wrote:
>>
>>> Hi Martin,
>>>
>>> I've tested your proposal (reading binary and UDF getValue) and it
>>> works fine. I've actually converted the code to Scala easily. Now it's
>>> a matter of building/optimizing around it (spatial join, aggregate
>>> points per geotiff).
>>>
>>> Best,
>>>
>>> On Fri, 20 Jan 2023 at 13:47, Martin Andersson <[email protected]> wrote:
>>>
>>>> Yes, there are lots of things to consider when processing large blobs
>>>> in Spark. What I have come to learn:
>>>> - Do the spatial join (points and the geotiff extent) with as few
>>>>   columns as possible. Ideally an id only for the geotiff. After that
>>>>   join you can join back the geotiff using the id.
>>>> - Aggregate the points to an array of points per geotiff. Your
>>>>   getValue udf should take an array of points and return an array of
>>>>   values. That way each geotiff is only loaded once.
>>>> - Parquet in Spark is not very good at handling large blobs. If
>>>>   reading parquet with geotiffs is slow you can repartition() with a
>>>>   very large number to force smaller row groups when writing, or use
>>>>   Avro instead.
>>>>   https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
>>>>
>>>> Good luck!
>>>>
>>>> Br,
>>>> Martin Andersson
>>>>
>>>> On Fri, 20 Jan 2023 at 13:08, Pedro Mano Fernandes <[email protected]> wrote:
>>>>
>>>>> Thanks Martin, it sounds promising. I'll actually give it a try
>>>>> before going with geotiff conversions.
>>>>>
>>>>> I'm foreseeing some concerns, though:
>>>>>
>>>>> - I'm afraid it won't be optimal for a big geotiff - I may have to
>>>>>   split the geotiff into smaller geotiffs
>>>>> - I wonder how the spatial partitioning optimization will behave
>>>>>   with such an approach - I may have to load smaller geotiffs and use
>>>>>   their geometry to join (my coordinates against envelope boundaries)
>>>>>   before calculating the getValue for my coordinates
>>>>>
>>>>> Best,
>>>>>
>>>>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <[email protected]> wrote:
>>>>>
>>>>>> I would read the geotiff files as binary:
>>>>>> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
>>>>>>
>>>>>> Then you can define a udf to extract values directly from the
>>>>>> geotiffs. If you're on Python you can use rasterio to do that.
>>>>>>
>>>>>> In Java it would look something like this:
>>>>>>
>>>>>> Integer getValue(byte[] geotiff, double x, double y)
>>>>>>         throws IOException, TransformException {
>>>>>>     try (ByteArrayInputStream inputStream = new ByteArrayInputStream(geotiff)) {
>>>>>>         GeoTiffReader geoTiffReader = new GeoTiffReader(inputStream);
>>>>>>         GridCoverage2D grid = geoTiffReader.read(null);
>>>>>>         Raster raster = grid.getRenderedImage().getData();
>>>>>>         GridGeometry2D gridGeometry = grid.getGridGeometry();
>>>>>>
>>>>>>         // Convert the world coordinates to pixel (grid) coordinates.
>>>>>>         DirectPosition2D directPosition2D = new DirectPosition2D(x, y);
>>>>>>         GridCoordinates2D gridCoordinates2D = gridGeometry.worldToGrid(directPosition2D);
>>>>>>         try {
>>>>>>             int[] pixel = raster.getPixel(gridCoordinates2D.x, gridCoordinates2D.y, new int[1]);
>>>>>>             return pixel[0];
>>>>>>         } catch (ArrayIndexOutOfBoundsException exc) {
>>>>>>             // point is outside the extent
>>>>>>             return null;
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Br,
>>>>>> Martin Andersson
>>>>>>
>>>>>> On Wed, 18 Jan 2023 at 17:59, Pedro Mano Fernandes <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for the update, guys.
>>>>>>>
>>>>>>> I'm not ready to contribute yet.
>>>>>>>
>>>>>>> In the meantime, the solution could perhaps be to convert GeoTiff
>>>>>>> to another format supported by Sedona. If anyone has had this use
>>>>>>> case before or has any idea, please share.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I think you are looking for something like this:
>>>>>>>> https://postgis.net/docs/RT_ST_Value.html
>>>>>>>>
>>>>>>>> The raster support in Sedona is very limited at the moment. The
>>>>>>>> lack of a proper raster type makes implementing st_value
>>>>>>>> impossible. We had a brief discussion about that recently.
>>>>>>>> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
>>>>>>>>
>>>>>>>> If you want to make a contribution and need some guidance, please
>>>>>>>> let me know!
>>>>>>>>
>>>>>>>> Br,
>>>>>>>> Martin Andersson
>>>>>>>>
>>>>>>>> On Wed, 18 Jan 2023 at 05:45, Jia Yu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Pedro,
>>>>>>>>>
>>>>>>>>> I got your point. Unfortunately, we don't have this function yet
>>>>>>>>> in Sedona. But we welcome anyone who wants to contribute this to
>>>>>>>>> Sedona!
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jia
>>>>>>>>>
>>>>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> > Hi all,
>>>>>>>>> >
>>>>>>>>> > Any clue? Or any documentation I can refer to?
>>>>>>>>> >
>>>>>>>>> > Here is a dummy example to better explain what I mean: in QGIS
>>>>>>>>> > I can click a point (coordinates) of the geotiff and get the
>>>>>>>>> > value at that point (in this case 231 of Band 1).
>>>>>>>>> >
>>>>>>>>> > [image: image.png]
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> >
>>>>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <[email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> >> Hi Jia,
>>>>>>>>> >>
>>>>>>>>> >> Thanks for the fast response.
>>>>>>>>> >>
>>>>>>>>> >> With the regular spatial join I'll get the array of data of
>>>>>>>>> >> the whole geotiff polygon. I was hoping to get the data
>>>>>>>>> >> element for specific coordinates inside that polygon. In other
>>>>>>>>> >> words: I guess the array of data corresponds to all the
>>>>>>>>> >> positions in the polygon, but I want to fetch specific
>>>>>>>>> >> positions.
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >>
>>>>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <[email protected]> wrote:
>>>>>>>>> >>
>>>>>>>>> >>> Hi Pedro,
>>>>>>>>> >>>
>>>>>>>>> >>> Once you use the Sedona geotiff reader to read those
>>>>>>>>> >>> geotiffs, you will get a dataframe with the following schema:
>>>>>>>>> >>>
>>>>>>>>> >>>  |-- image: struct (nullable = true)
>>>>>>>>> >>>  |    |-- origin: string (nullable = true)
>>>>>>>>> >>>  |    |-- Geometry: string (nullable = true)
>>>>>>>>> >>>  |    |-- height: integer (nullable = true)
>>>>>>>>> >>>  |    |-- width: integer (nullable = true)
>>>>>>>>> >>>  |    |-- nBands: integer (nullable = true)
>>>>>>>>> >>>  |    |-- data: array (nullable = true)
>>>>>>>>> >>>  |    |    |-- element: double (containsNull = true)
>>>>>>>>> >>>
>>>>>>>>> >>> You can fetch the geometry column and perform the spatial
>>>>>>>>> >>> join in the following way:
>>>>>>>>> >>>
>>>>>>>>> >>> geotiffDF = geotiffDF.selectExpr(
>>>>>>>>> >>>     "image.origin as origin",
>>>>>>>>> >>>     "ST_GeomFromWkt(image.geometry) as Geom",
>>>>>>>>> >>>     "image.height as height",
>>>>>>>>> >>>     "image.width as width",
>>>>>>>>> >>>     "image.data as data",
>>>>>>>>> >>>     "image.nBands as bands")
>>>>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
>>>>>>>>> >>> geotiffDF.show()
>>>>>>>>> >>>
>>>>>>>>> >>> More info can be found at:
>>>>>>>>> >>> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>>>> >>>
>>>>>>>>> >>> Thanks,
>>>>>>>>> >>> Jia
>>>>>>>>> >>>
>>>>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes <[email protected]> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>>> Hi everyone!
>>>>>>>>> >>>>
>>>>>>>>> >>>> I'm trying to use elevation data in GeoTiff format. I
>>>>>>>>> >>>> understand how to load the dataset, as described in
>>>>>>>>> >>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader.
>>>>>>>>> >>>> Now I'm wondering how to join this dataframe with another
>>>>>>>>> >>>> one that contains coordinates, in order to get the elevation
>>>>>>>>> >>>> data for those coordinates.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Something along these lines:
>>>>>>>>> >>>>
>>>>>>>>> >>>> pointsDF
>>>>>>>>> >>>>   .join(geotiffDF, ...)
>>>>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
>>>>>>>>> >>>>
>>>>>>>>> >>>> Are there any examples or documentation I can follow to
>>>>>>>>> >>>> accomplish this?
>>>>>>>>> >>>>
>>>>>>>>> >>>> Thanks,
>>>>>>>>> >>>>
>>>>>>>>> >>>> --
>>>>>>>>> >>>> Pedro Mano Fernandes
>>>>>>>>> >>>>
>>>>>>>>> >> --
>>>>>>>>> >> Pedro Mano Fernandes
>>>>>>>>> >>
>>>>>>>>> > --
>>>>>>>>> > Pedro Mano Fernandes
>>>>>>>>> >
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Martin
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Pedro Mano Fernandes
>>>>>>>
>>>>>
>>>>> --
>>>>> Pedro Mano Fernandes
>>>>>
>>>
>>> --
>>> Pedro Mano Fernandes
>>>
>
> --
> Pedro Mano Fernandes
>

--
Pedro Mano Fernandes
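
A minimal sketch of the batching advice from Martin's 20 Jan message (a getValue that takes an array of points and returns an array of values, so each geotiff is only loaded once). It is not the Sedona or GeoTools API: the Grid class, its fields, and getValues are hypothetical names, and the raster is assumed to be a single-band, north-up grid with square pixels that has already been decoded into memory. In a real job the decoding step would be done once per geotiff, for example with the GeoTools reader shown earlier in the thread.

```java
/**
 * Illustrative only: batched point lookups against one decoded raster.
 * All names here are hypothetical and not part of Sedona or GeoTools.
 */
final class BatchedRasterLookup {

    /** Hypothetical stand-in for a decoded single-band, north-up raster. */
    static final class Grid {
        final int width;        // number of columns
        final int height;       // number of rows
        final double originX;   // world x of the upper-left corner
        final double originY;   // world y of the upper-left corner
        final double pixelSize; // world units per pixel (square pixels assumed)
        final int[] data;       // row-major band values, length = width * height

        Grid(int width, int height, double originX, double originY,
             double pixelSize, int[] data) {
            if (data.length != width * height) {
                throw new IllegalArgumentException("data length must be width * height");
            }
            this.width = width;
            this.height = height;
            this.originX = originX;
            this.originY = originY;
            this.pixelSize = pixelSize;
            this.data = data;
        }
    }

    /**
     * Looks up every point against the same grid; points[i] is {x, y} in
     * world coordinates. Points outside the extent yield null.
     */
    static Integer[] getValues(Grid grid, double[][] points) {
        Integer[] values = new Integer[points.length];
        for (int i = 0; i < points.length; i++) {
            // World-to-grid conversion for a north-up raster: x grows to the
            // right, y decreases as the row index grows.
            int col = (int) Math.floor((points[i][0] - grid.originX) / grid.pixelSize);
            int row = (int) Math.floor((grid.originY - points[i][1]) / grid.pixelSize);
            boolean inside = col >= 0 && col < grid.width && row >= 0 && row < grid.height;
            if (inside) {
                values[i] = grid.data[row * grid.width + col];
            } else {
                values[i] = null;
            }
        }
        return values;
    }
}
```

After the spatial join and the aggregation of points per geotiff id, a UDF built along these lines would be called once per geotiff with that geotiff's whole array of points, instead of once per point.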
