Hi Martin,

Great news! I'll give it a go and will let you know.

Thanks for letting me know.
Best regards,

On Tue, 28 Feb 2023 at 14:53, Martin Andersson <[email protected]>
wrote:

> Hi again Pedro,
>
> Since https://github.com/apache/sedona/pull/773 got merged you should now
> be able to use Apache Sedona for your GeoTiff processing needs. It will be
> included in the next Sedona release.
>
> All feedback is welcome!
>
> Br
> Martin Andersson
>
>
> Den mån 23 jan. 2023 kl 10:45 skrev Pedro Mano Fernandes <
> [email protected]>:
>
>> Hi Martin,
>>
>> I've tested your proposal (reading binary and UDF getValue) and it works
>> fine. I've actually converted the code to Scala easily. Now it's a matter
>> of building/optimizing around it (spatial join, aggregate points per
>> geotiff).
>>
>> Best,
>>
>> On Fri, 20 Jan 2023 at 13:47, Martin Andersson <
>> [email protected]> wrote:
>>
>>> Yes, there are lots of things to consider when processing large blobs in
>>> Spark. What I have come to learn:
>>>  - Do the spatial join (points and the geotiff extent) with as few
>>> columns as possible. Ideally an id only for the geotiff. After that join
>>> you can join back the geotiff using the id.
>>>  - Aggregate the points to an array of points per geotiff. Your getValue
>>> udf should take an array of points and return an array of values. That way
>>> each geotiff is only loaded once.
>>>  - Parquet in Spark is not very good at handling large blobs. If reading
>>> parquet with geotiffs is slow you can repartition() with a very large
>>> number to force smaller row groups when writing or use Avro instead.
>>> https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
>>>
>>> Good luck!
>>>
>>> Br,
>>> Martin Andersson
>>>
>>>
>>> Den fre 20 jan. 2023 kl 13:08 skrev Pedro Mano Fernandes <
>>> [email protected]>:
>>>
>>>> Thanks Martin, it sounds promising. I'll actually give it a try before
>>>> going with geotiff conversions.
>>>>
>>>> I'm foreseeing some concerns, though:
>>>>
>>>>    - I'm afraid it won't be optimal for a big geotiff - I may have to
>>>>    split the geotiff into smaller geotiffs
>>>>    - I wonder how the spatial partitioning optimization will behave in
>>>>    such approach - I may have to load smaller geotiffs and use their 
>>>> geometry
>>>>    to join (my coordinates against envelope boundaries) before calculating 
>>>> the
>>>>    getValue for my coordinates
>>>>
>>>> Best,
>>>>
>>>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <
>>>> [email protected]> wrote:
>>>>
>>>>> I would read the geotiff files as binary:
>>>>> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
>>>>>
>>>>> Then you can define a udf to extract values directly from the
>>>>> geotiffs. If you're on python you can use raster.io to do that.
>>>>>
>>>>> In java it would look some thing like this:
>>>>>
>>>>>   Integer getValue(byte[] geotiff, double x, double y)
>>>>>       throws IOException, TransformException {
>>>>>     try (ByteArrayInputStream inputStream = new
>>>>> ByteArrayInputStream(geotiff)) {
>>>>>       GeoTiffReader geoTiffReader = new GeoTiffReader(inputStream);
>>>>>       GridCoverage2D grid = geoTiffReader.read(null);
>>>>>       Raster raster = grid.getRenderedImage().getData();
>>>>>       GridGeometry2D gridGeometry = grid.getGridGeometry();
>>>>>
>>>>>       DirectPosition2D directPosition2D = new DirectPosition2D(x, y);
>>>>>       GridCoordinates2D gridCoordinates2D =
>>>>> gridGeometry.worldToGrid(directPosition2D);
>>>>>       try {
>>>>>           int[] pixel = raster.getPixel(gridCoordinates2D.x,
>>>>> gridCoordinates2D.y, new int[1]);
>>>>>           return pixel[0];
>>>>>       } catch (ArrayIndexOutOfBoundsException exc) {
>>>>>           // point is outside the extentent
>>>>>           result.add(null);
>>>>>       }
>>>>>     }
>>>>> }
>>>>>
>>>>> Br,
>>>>> Martin Andersson
>>>>>
>>>>> Den ons 18 jan. 2023 kl 17:59 skrev Pedro Mano Fernandes <
>>>>> [email protected]>:
>>>>>
>>>>>> Thanks for the update, guys.
>>>>>>
>>>>>> I'm not ready to contribute yet.
>>>>>>
>>>>>> In the meanwhile, the solution could be perhaps to convert GeoTiff to
>>>>>> another format supported by Sedona. If anyone has had this use case 
>>>>>> before
>>>>>> or has any idea, please share.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I think you are looking for something like this:
>>>>>>> https://postgis.net/docs/RT_ST_Value.html
>>>>>>>
>>>>>>> The raster support in Sedona is very limited at the moment. The lack
>>>>>>> of a proper raster type makes implementing st_value impossible. We had a
>>>>>>> brief discussion about that recently.
>>>>>>> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
>>>>>>>
>>>>>>> If you want to make a contribution and need some guidance, please
>>>>>>> let me know!
>>>>>>>
>>>>>>> Br,
>>>>>>> Martin Andersson
>>>>>>>
>>>>>>> Den ons 18 jan. 2023 kl 05:45 skrev Jia Yu <[email protected]>:
>>>>>>>
>>>>>>>> Hi Pedro,
>>>>>>>>
>>>>>>>> I got your point. Unfortunately, we don't have this function yet in
>>>>>>>> Sedona.
>>>>>>>> But we welcome anyone who want to contribute this to Sedona!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jia
>>>>>>>>
>>>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <
>>>>>>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> > Hi all,
>>>>>>>> >
>>>>>>>> > Any clue? Or any documentation I can refer to?
>>>>>>>> >
>>>>>>>> > Here goes a dummy example to better explain myself: in QGIS I can
>>>>>>>> click a
>>>>>>>> > point (coordinates) of the geotiff and get the value in that
>>>>>>>> point (in this
>>>>>>>> > case 231 of Band 1).
>>>>>>>> >
>>>>>>>> > [image: image.png]
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> >
>>>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <
>>>>>>>> [email protected]>
>>>>>>>> > wrote:
>>>>>>>> >
>>>>>>>> >> Hi Jia,
>>>>>>>> >>
>>>>>>>> >> Thanks for the fast response.
>>>>>>>> >>
>>>>>>>> >> With the regular spatial join I’ll get the array of data of the
>>>>>>>> whole
>>>>>>>> >> geotiff polygon. I was hoping to get the data element for
>>>>>>>> specific
>>>>>>>> >> coordinates inside that polygon. In other words: I guess the
>>>>>>>> array of data
>>>>>>>> >> corresponds to all the positions in the polygon, but I want to
>>>>>>>> fetch
>>>>>>>> >> specific positions.
>>>>>>>> >>
>>>>>>>> >> Thanks,
>>>>>>>> >>
>>>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <[email protected]> wrote:
>>>>>>>> >>
>>>>>>>> >>> Hi Pedro,
>>>>>>>> >>>
>>>>>>>> >>> Once you use Sedona geotiff reader to read those geotiffs, you
>>>>>>>> will get
>>>>>>>> >>> a dataframe with the following schema:
>>>>>>>> >>>
>>>>>>>> >>>  |-- image: struct (nullable = true)
>>>>>>>> >>>  |    |-- origin: string (nullable = true)
>>>>>>>> >>>  |    |-- Geometry: string (nullable = true)
>>>>>>>> >>>  |    |-- height: integer (nullable = true)
>>>>>>>> >>>  |    |-- width: integer (nullable = true)
>>>>>>>> >>>  |    |-- nBands: integer (nullable = true)
>>>>>>>> >>>  |    |-- data: array (nullable = true)
>>>>>>>> >>>  |    |    |-- element: double (containsNull = true)
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> You can use the following way to fetch the geometry column and
>>>>>>>> perform
>>>>>>>> >>> the spatial join;
>>>>>>>> >>>
>>>>>>>> >>> geotiffDF = geotiffDF.selectExpr("image.origin as
>>>>>>>> >>> origin","ST_GeomFromWkt(image.geometry) as Geom", "image.height
>>>>>>>> as height",
>>>>>>>> >>> "image.width as width", "image.data as data", "image.nBands as
>>>>>>>> bands")
>>>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
>>>>>>>> >>> geotiffDF.show()
>>>>>>>> >>>
>>>>>>>> >>> More info can be found:
>>>>>>>> >>>
>>>>>>>> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>>> >>>
>>>>>>>> >>> Thanks,
>>>>>>>> >>> Jia
>>>>>>>> >>>
>>>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes <
>>>>>>>> >>> [email protected]> wrote:
>>>>>>>> >>>
>>>>>>>> >>>> Hi everyone!
>>>>>>>> >>>>
>>>>>>>> >>>> I'm trying to use elevation data in GeoTiff format. I
>>>>>>>> understand how to
>>>>>>>> >>>> load the dataset, as described in
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>>> >>>> .
>>>>>>>> >>>> Now I'm wondering how to join this dataframe with another one
>>>>>>>> that
>>>>>>>> >>>> contains
>>>>>>>> >>>> coordinates, in order to get the elevation data for those
>>>>>>>> coordinates.
>>>>>>>> >>>>
>>>>>>>> >>>> Something along these lines:
>>>>>>>> >>>>
>>>>>>>> >>>> pointsDF
>>>>>>>> >>>>   .join(geotiffDF, ...)
>>>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
>>>>>>>> >>>>
>>>>>>>> >>>> Are there any examples or documentation I can follow to
>>>>>>>> accomplish this?
>>>>>>>> >>>>
>>>>>>>> >>>> Thanks,
>>>>>>>> >>>>
>>>>>>>> >>>> --
>>>>>>>> >>>> Pedro Mano Fernandes
>>>>>>>> >>>>
>>>>>>>> >>> --
>>>>>>>> >> Pedro Mano Fernandes
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Pedro Mano Fernandes
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Hälsningar,
>>>>>>> Martin
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Pedro Mano Fernandes
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Pedro Mano Fernandes
>>>>
>>>
>>
>> --
>> Pedro Mano Fernandes
>>
>

-- 
Pedro Mano Fernandes

Reply via email to