Hi Martin,

I've tested your proposal (reading binary and UDF getValue) and it works
fine. I've actually converted the code to Scala easily. Now it's a matter
of building/optimizing around it (spatial join, aggregate points per
geotiff).

Best,

On Fri, 20 Jan 2023 at 13:47, Martin Andersson <[email protected]>
wrote:

> Yes, there are lots of things to consider when processing large blobs in
> Spark. What I have come to learn:
>  - Do the spatial join (points and the geotiff extent) with as few columns
> as possible. Ideally an id only for the geotiff. After that join you can
> join back the geotiff using the id.
>  - Aggregate the points to an array of points per geotiff. Your getValue
> udf should take an array of points and return an array of values. That way
> each geotiff is only loaded once.
>  - Parquet in Spark is not very good at handling large blobs. If reading
> parquet with geotiffs is slow you can repartition() with a very large
> number to force smaller row groups when writing or use Avro instead.
> https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
>
> Good luck!
>
> Br,
> Martin Andersson
>
>
> Den fre 20 jan. 2023 kl 13:08 skrev Pedro Mano Fernandes <
> [email protected]>:
>
>> Thanks Martin, it sounds promising. I'll actually give it a try before
>> going with geotiff conversions.
>>
>> I'm foreseeing some concerns, though:
>>
>>    - I'm afraid it won't be optimal for a big geotiff - I may have to
>>    split the geotiff into smaller geotiffs
>>    - I wonder how the spatial partitioning optimization will behave in
>>    such approach - I may have to load smaller geotiffs and use their geometry
>>    to join (my coordinates against envelope boundaries) before calculating 
>> the
>>    getValue for my coordinates
>>
>> Best,
>>
>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <
>> [email protected]> wrote:
>>
>>> I would read the geotiff files as binary:
>>> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
>>>
>>> Then you can define a udf to extract values directly from the geotiffs.
>>> If you're on python you can use raster.io to do that.
>>>
>>> In java it would look some thing like this:
>>>
>>>   Integer getValue(byte[] geotiff, double x, double y)
>>>       throws IOException, TransformException {
>>>     try (ByteArrayInputStream inputStream = new
>>> ByteArrayInputStream(geotiff)) {
>>>       GeoTiffReader geoTiffReader = new GeoTiffReader(inputStream);
>>>       GridCoverage2D grid = geoTiffReader.read(null);
>>>       Raster raster = grid.getRenderedImage().getData();
>>>       GridGeometry2D gridGeometry = grid.getGridGeometry();
>>>
>>>       DirectPosition2D directPosition2D = new DirectPosition2D(x, y);
>>>       GridCoordinates2D gridCoordinates2D =
>>> gridGeometry.worldToGrid(directPosition2D);
>>>       try {
>>>           int[] pixel = raster.getPixel(gridCoordinates2D.x,
>>> gridCoordinates2D.y, new int[1]);
>>>           return pixel[0];
>>>       } catch (ArrayIndexOutOfBoundsException exc) {
>>>           // point is outside the extentent
>>>           result.add(null);
>>>       }
>>>     }
>>> }
>>>
>>> Br,
>>> Martin Andersson
>>>
>>> Den ons 18 jan. 2023 kl 17:59 skrev Pedro Mano Fernandes <
>>> [email protected]>:
>>>
>>>> Thanks for the update, guys.
>>>>
>>>> I'm not ready to contribute yet.
>>>>
>>>> In the meanwhile, the solution could be perhaps to convert GeoTiff to
>>>> another format supported by Sedona. If anyone has had this use case before
>>>> or has any idea, please share.
>>>>
>>>> Best,
>>>>
>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think you are looking for something like this:
>>>>> https://postgis.net/docs/RT_ST_Value.html
>>>>>
>>>>> The raster support in Sedona is very limited at the moment. The lack
>>>>> of a proper raster type makes implementing st_value impossible. We had a
>>>>> brief discussion about that recently.
>>>>> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
>>>>>
>>>>> If you want to make a contribution and need some guidance, please let
>>>>> me know!
>>>>>
>>>>> Br,
>>>>> Martin Andersson
>>>>>
>>>>> Den ons 18 jan. 2023 kl 05:45 skrev Jia Yu <[email protected]>:
>>>>>
>>>>>> Hi Pedro,
>>>>>>
>>>>>> I got your point. Unfortunately, we don't have this function yet in
>>>>>> Sedona.
>>>>>> But we welcome anyone who want to contribute this to Sedona!
>>>>>>
>>>>>> Thanks,
>>>>>> Jia
>>>>>>
>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <
>>>>>> [email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> > Hi all,
>>>>>> >
>>>>>> > Any clue? Or any documentation I can refer to?
>>>>>> >
>>>>>> > Here goes a dummy example to better explain myself: in QGIS I can
>>>>>> click a
>>>>>> > point (coordinates) of the geotiff and get the value in that point
>>>>>> (in this
>>>>>> > case 231 of Band 1).
>>>>>> >
>>>>>> > [image: image.png]
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <
>>>>>> [email protected]>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> Hi Jia,
>>>>>> >>
>>>>>> >> Thanks for the fast response.
>>>>>> >>
>>>>>> >> With the regular spatial join I’ll get the array of data of the
>>>>>> whole
>>>>>> >> geotiff polygon. I was hoping to get the data element for specific
>>>>>> >> coordinates inside that polygon. In other words: I guess the array
>>>>>> of data
>>>>>> >> corresponds to all the positions in the polygon, but I want to
>>>>>> fetch
>>>>>> >> specific positions.
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >>
>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <[email protected]> wrote:
>>>>>> >>
>>>>>> >>> Hi Pedro,
>>>>>> >>>
>>>>>> >>> Once you use Sedona geotiff reader to read those geotiffs, you
>>>>>> will get
>>>>>> >>> a dataframe with the following schema:
>>>>>> >>>
>>>>>> >>>  |-- image: struct (nullable = true)
>>>>>> >>>  |    |-- origin: string (nullable = true)
>>>>>> >>>  |    |-- Geometry: string (nullable = true)
>>>>>> >>>  |    |-- height: integer (nullable = true)
>>>>>> >>>  |    |-- width: integer (nullable = true)
>>>>>> >>>  |    |-- nBands: integer (nullable = true)
>>>>>> >>>  |    |-- data: array (nullable = true)
>>>>>> >>>  |    |    |-- element: double (containsNull = true)
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> You can use the following way to fetch the geometry column and
>>>>>> perform
>>>>>> >>> the spatial join;
>>>>>> >>>
>>>>>> >>> geotiffDF = geotiffDF.selectExpr("image.origin as
>>>>>> >>> origin","ST_GeomFromWkt(image.geometry) as Geom", "image.height
>>>>>> as height",
>>>>>> >>> "image.width as width", "image.data as data", "image.nBands as
>>>>>> bands")
>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
>>>>>> >>> geotiffDF.show()
>>>>>> >>>
>>>>>> >>> More info can be found:
>>>>>> >>>
>>>>>> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>> >>>
>>>>>> >>> Thanks,
>>>>>> >>> Jia
>>>>>> >>>
>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes <
>>>>>> >>> [email protected]> wrote:
>>>>>> >>>
>>>>>> >>>> Hi everyone!
>>>>>> >>>>
>>>>>> >>>> I'm trying to use elevation data in GeoTiff format. I understand
>>>>>> how to
>>>>>> >>>> load the dataset, as described in
>>>>>> >>>>
>>>>>> >>>>
>>>>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>> >>>> .
>>>>>> >>>> Now I'm wondering how to join this dataframe with another one
>>>>>> that
>>>>>> >>>> contains
>>>>>> >>>> coordinates, in order to get the elevation data for those
>>>>>> coordinates.
>>>>>> >>>>
>>>>>> >>>> Something along these lines:
>>>>>> >>>>
>>>>>> >>>> pointsDF
>>>>>> >>>>   .join(geotiffDF, ...)
>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
>>>>>> >>>>
>>>>>> >>>> Are there any examples or documentation I can follow to
>>>>>> accomplish this?
>>>>>> >>>>
>>>>>> >>>> Thanks,
>>>>>> >>>>
>>>>>> >>>> --
>>>>>> >>>> Pedro Mano Fernandes
>>>>>> >>>>
>>>>>> >>> --
>>>>>> >> Pedro Mano Fernandes
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Pedro Mano Fernandes
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Hälsningar,
>>>>> Martin
>>>>>
>>>>
>>>>
>>>> --
>>>> Pedro Mano Fernandes
>>>>
>>>
>>
>> --
>> Pedro Mano Fernandes
>>
>

-- 
Pedro Mano Fernandes

Reply via email to