Hi again Pedro,

Since https://github.com/apache/sedona/pull/773 got merged you should now
be able to use Apache Sedona for your GeoTiff processing needs. It will be
included in the next Sedona release.

All feedback is welcome!

Br
Martin Andersson


On Mon, 23 Jan 2023 at 10:45, Pedro Mano Fernandes <
[email protected]> wrote:

> Hi Martin,
>
> I've tested your proposal (reading binary and UDF getValue) and it works
> fine. I've actually converted the code to Scala easily. Now it's a matter
> of building/optimizing around it (spatial join, aggregate points per
> geotiff).
>
> Best,
>
> On Fri, 20 Jan 2023 at 13:47, Martin Andersson <
> [email protected]> wrote:
>
>> Yes, there are lots of things to consider when processing large blobs in
>> Spark. What I have come to learn:
>>  - Do the spatial join (points and the geotiff extent) with as few
>> columns as possible. Ideally an id only for the geotiff. After that join
>> you can join back the geotiff using the id.
>>  - Aggregate the points to an array of points per geotiff. Your getValue
>> udf should take an array of points and return an array of values. That way
>> each geotiff is only loaded once.
>>  - Parquet in Spark is not very good at handling large blobs. If reading
>> parquet with geotiffs is slow you can repartition() with a very large
>> number of partitions to force smaller row groups when writing, or use
>> Avro instead.
>> https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
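
A minimal sketch of the second tip above (aggregate the points per geotiff
so each geotiff is only loaded once), written in plain Python rather than
Spark. Every name here (group_points_by_raster, lookup_all, load_raster,
get_values) is hypothetical and only illustrates the idea; in Spark the
grouping step would presumably be a groupBy on the geotiff id with
collect_list on the points.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple

Point = Tuple[float, float]


def group_points_by_raster(
    matches: Iterable[Tuple[Hashable, Point]],
) -> Dict[Hashable, List[Point]]:
    """Turn (raster_id, point) pairs from the spatial join into raster_id -> points."""
    grouped: Dict[Hashable, List[Point]] = defaultdict(list)
    for raster_id, point in matches:
        grouped[raster_id].append(point)
    return dict(grouped)


def lookup_all(
    matches: Iterable[Tuple[Hashable, Point]],
    load_raster: Callable[[Hashable], Any],
    get_values: Callable[[Any, List[Point]], List[Any]],
) -> Dict[Hashable, List[Tuple[Point, Any]]]:
    """Load each raster once and look up all of its points in a single call."""
    results: Dict[Hashable, List[Tuple[Point, Any]]] = {}
    for raster_id, points in group_points_by_raster(matches).items():
        raster = load_raster(raster_id)  # the expensive step, done once per raster
        results[raster_id] = list(zip(points, get_values(raster, points)))
    return results
```

The point of the design is that load_raster runs once per distinct geotiff
id, no matter how many points fall inside that geotiff.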
>>
>> Good luck!
>>
>> Br,
>> Martin Andersson
>>
>>
>> On Fri, 20 Jan 2023 at 13:08, Pedro Mano Fernandes <
>> [email protected]> wrote:
>>
>>> Thanks Martin, it sounds promising. I'll actually give it a try before
>>> going with geotiff conversions.
>>>
>>> I'm foreseeing some concerns, though:
>>>
>>>    - I'm afraid it won't be optimal for a big geotiff - I may have to
>>>    split the geotiff into smaller geotiffs
>>>    - I wonder how the spatial partitioning optimization will behave in
>>>    such an approach - I may have to load smaller geotiffs and use their
>>>    geometry to join (my coordinates against envelope boundaries) before
>>>    calculating the getValue for my coordinates
>>>
>>> Best,
>>>
>>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <
>>> [email protected]> wrote:
>>>
>>>> I would read the geotiff files as binary:
>>>> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
>>>>
>>>> Then you can define a udf to extract values directly from the geotiffs.
>>>> If you're on Python you can use rasterio to do that.
>>>>
>>>> In Java it would look something like this:
>>>>
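>>>>   import java.awt.image.Raster;
>>>>   import java.io.ByteArrayInputStream;
>>>>   import java.io.IOException;
>>>>   import org.geotools.coverage.grid.GridCoordinates2D;
>>>>   import org.geotools.coverage.grid.GridCoverage2D;
>>>>   import org.geotools.coverage.grid.GridGeometry2D;
>>>>   import org.geotools.gce.geotiff.GeoTiffReader;
>>>>   import org.geotools.geometry.DirectPosition2D;
>>>>   import org.opengis.referencing.operation.TransformException;
>>>>
>>>>   // Returns the first-band value at world coordinate (x, y), or null if
>>>>   // the point is outside the raster extent.
>>>>   Integer getValue(byte[] geotiff, double x, double y)
>>>>       throws IOException, TransformException {
>>>>     try (ByteArrayInputStream inputStream = new ByteArrayInputStream(geotiff)) {
>>>>       GeoTiffReader geoTiffReader = new GeoTiffReader(inputStream);
>>>>       GridCoverage2D grid = geoTiffReader.read(null);
>>>>       Raster raster = grid.getRenderedImage().getData();
>>>>       GridGeometry2D gridGeometry = grid.getGridGeometry();
>>>>
>>>>       // Convert the world coordinate to pixel (column, row) indices.
>>>>       DirectPosition2D directPosition2D = new DirectPosition2D(x, y);
>>>>       GridCoordinates2D gridCoordinates2D =
>>>>           gridGeometry.worldToGrid(directPosition2D);
>>>>       try {
>>>>         int[] pixel = raster.getPixel(gridCoordinates2D.x,
>>>>             gridCoordinates2D.y, new int[1]);
>>>>         return pixel[0];
>>>>       } catch (ArrayIndexOutOfBoundsException exc) {
>>>>         // The point is outside the extent.
>>>>         return null;
>>>>       }
>>>>     }
>>>>   }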
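
To make the world-to-grid step in the Java snippet above more concrete, here
is a simplified sketch in plain Python. It is not the GeoTools or rasterio
API: it assumes a north-up, single-band raster with no rotation, described
by a hypothetical top-left origin and pixel size, and the band is just a
list of rows. The names world_to_grid and get_values are made up for this
illustration. It also takes a list of points, so one decoded raster can
serve many lookups.

```python
import math
from typing import List, Optional, Sequence, Tuple


def world_to_grid(
    x: float, y: float,
    origin_x: float, origin_y: float,
    pixel_w: float, pixel_h: float,
) -> Tuple[int, int]:
    """Map a world coordinate to (col, row), assuming a north-up raster.

    origin_x/origin_y is the top-left corner of the raster; pixel_w and
    pixel_h are positive pixel sizes in world units (assumption: no rotation).
    """
    col = math.floor((x - origin_x) / pixel_w)
    row = math.floor((origin_y - y) / pixel_h)  # rows grow downwards
    return col, row


def get_values(
    band: Sequence[Sequence[int]],
    origin_x: float, origin_y: float,
    pixel_w: float, pixel_h: float,
    points: Sequence[Tuple[float, float]],
) -> List[Optional[int]]:
    """Return one value per point, or None for points outside the extent."""
    height = len(band)
    width = len(band[0]) if height else 0
    values: List[Optional[int]] = []
    for x, y in points:
        col, row = world_to_grid(x, y, origin_x, origin_y, pixel_w, pixel_h)
        if 0 <= row < height and 0 <= col < width:
            values.append(band[row][col])
        else:
            values.append(None)  # same role as the out-of-bounds catch in Java
    return values


# A 2x2 raster whose top-left corner is at (100.0, 50.0) with 1x1 pixels.
band = [[10, 20], [30, 40]]
print(get_values(band, 100.0, 50.0, 1.0, 1.0,
                 [(100.5, 49.5), (101.5, 48.5), (99.0, 49.5)]))
# [10, 40, None]
```

For a real GeoTIFF the transform should come from the file itself (via
GeoTools or rasterio) rather than from hand-written origin and pixel sizes.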
>>>>
>>>> Br,
>>>> Martin Andersson
>>>>
>>>> On Wed, 18 Jan 2023 at 17:59, Pedro Mano Fernandes <
>>>> [email protected]> wrote:
>>>>
>>>>> Thanks for the update, guys.
>>>>>
>>>>> I'm not ready to contribute yet.
>>>>>
>>>>> In the meantime, one solution could be to convert GeoTiff to another
>>>>> format supported by Sedona. If anyone has had this use case before or
>>>>> has any ideas, please share.
>>>>>
>>>>> Best,
>>>>>
>>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I think you are looking for something like this:
>>>>>> https://postgis.net/docs/RT_ST_Value.html
>>>>>>
>>>>>> The raster support in Sedona is very limited at the moment. The lack
>>>>>> of a proper raster type makes implementing st_value impossible. We had a
>>>>>> brief discussion about that recently.
>>>>>> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
>>>>>>
>>>>>> If you want to make a contribution and need some guidance, please let
>>>>>> me know!
>>>>>>
>>>>>> Br,
>>>>>> Martin Andersson
>>>>>>
>>>>>> On Wed, 18 Jan 2023 at 05:45, Jia Yu <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Pedro,
>>>>>>>
>>>>>>> I got your point. Unfortunately, we don't have this function yet in
>>>>>>> Sedona.
>>>>>>> But we welcome anyone who wants to contribute this to Sedona!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jia
>>>>>>>
>>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <
>>>>>>> [email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Hi all,
>>>>>>> >
>>>>>>> > Any clue? Or any documentation I can refer to?
>>>>>>> >
>>>>>>> > Here goes a dummy example to better explain myself: in QGIS I can
>>>>>>> > click a point (coordinates) of the geotiff and get the value in that
>>>>>>> > point (in this case 231 of Band 1).
>>>>>>> >
>>>>>>> > [image: image.png]
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> >
>>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <
>>>>>>> [email protected]>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> >> Hi Jia,
>>>>>>> >>
>>>>>>> >> Thanks for the fast response.
>>>>>>> >>
>>>>>>> >> With the regular spatial join I’ll get the array of data of the
>>>>>>> >> whole geotiff polygon. I was hoping to get the data element for
>>>>>>> >> specific coordinates inside that polygon. In other words: I guess the
>>>>>>> >> array of data corresponds to all the positions in the polygon, but I
>>>>>>> >> want to fetch specific positions.
>>>>>>> >>
>>>>>>> >> Thanks,
>>>>>>> >>
>>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <[email protected]> wrote:
>>>>>>> >>
>>>>>>> >>> Hi Pedro,
>>>>>>> >>>
>>>>>>> >>> Once you use the Sedona geotiff reader to read those geotiffs, you
>>>>>>> >>> will get a dataframe with the following schema:
>>>>>>> >>>
>>>>>>> >>>  |-- image: struct (nullable = true)
>>>>>>> >>>  |    |-- origin: string (nullable = true)
>>>>>>> >>>  |    |-- Geometry: string (nullable = true)
>>>>>>> >>>  |    |-- height: integer (nullable = true)
>>>>>>> >>>  |    |-- width: integer (nullable = true)
>>>>>>> >>>  |    |-- nBands: integer (nullable = true)
>>>>>>> >>>  |    |-- data: array (nullable = true)
>>>>>>> >>>  |    |    |-- element: double (containsNull = true)
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> You can fetch the geometry column as follows and then perform
>>>>>>> >>> the spatial join:
>>>>>>> >>>
>>>>>>> >>> geotiffDF = geotiffDF.selectExpr(
>>>>>>> >>>     "image.origin as origin",
>>>>>>> >>>     "ST_GeomFromWkt(image.geometry) as Geom",
>>>>>>> >>>     "image.height as height",
>>>>>>> >>>     "image.width as width",
>>>>>>> >>>     "image.data as data",
>>>>>>> >>>     "image.nBands as bands")
>>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
>>>>>>> >>> geotiffDF.show()
>>>>>>> >>>
>>>>>>> >>> More info can be found at:
>>>>>>> >>> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>> >>>
>>>>>>> >>> Thanks,
>>>>>>> >>> Jia
>>>>>>> >>>
>>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes <
>>>>>>> >>> [email protected]> wrote:
>>>>>>> >>>
>>>>>>> >>>> Hi everyone!
>>>>>>> >>>>
>>>>>>> >>>> I'm trying to use elevation data in GeoTiff format. I understand how
>>>>>>> >>>> to load the dataset, as described in
>>>>>>> >>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>> >>>>
>>>>>>> >>>> Now I'm wondering how to join this dataframe with another one that
>>>>>>> >>>> contains coordinates, in order to get the elevation data for those
>>>>>>> >>>> coordinates.
>>>>>>> >>>>
>>>>>>> >>>> Something along these lines:
>>>>>>> >>>>
>>>>>>> >>>> pointsDF
>>>>>>> >>>>   .join(geotiffDF, ...)
>>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
>>>>>>> >>>>
>>>>>>> >>>> Are there any examples or documentation I can follow to accomplish
>>>>>>> >>>> this?
>>>>>>> >>>>
>>>>>>> >>>> Thanks,
>>>>>>> >>>>
>>>>>>> >>>> --
>>>>>>> >>>> Pedro Mano Fernandes
>>>>>>> >>>>
>>>>>>> >>> --
>>>>>>> >> Pedro Mano Fernandes
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Pedro Mano Fernandes
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Martin
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pedro Mano Fernandes
>>>>>
>>>>
>>>
>>> --
>>> Pedro Mano Fernandes
>>>
>>
>
> --
> Pedro Mano Fernandes
>
