Hi Martin,

Strangely, I can't find your new Raster features in the docs at
https://sedona.staged.apache.org/api/sql/Raster-loader/. I thought the
documentation was already up to date following the sedona-1.4.0 release.

Best regards,

On Wed, 1 Mar 2023 at 10:29, Pedro Mano Fernandes <[email protected]>
wrote:

> Hi Martin,
>
> Great news! I'll give it a go and will let you know.
>
> Thanks for letting me know.
> Best regards,
>
> On Tue, 28 Feb 2023 at 14:53, Martin Andersson <
> [email protected]> wrote:
>
>> Hi again Pedro,
>>
>> Since https://github.com/apache/sedona/pull/773 got merged you should
>> now be able to use Apache Sedona for your GeoTiff processing needs. It will
>> be included in the next Sedona release.
>>
>> All feedback is welcome!
>>
>> Br
>> Martin Andersson
>>
>>
>> On Mon, 23 Jan 2023 at 10:45, Pedro Mano Fernandes <
>> [email protected]> wrote:
>>
>>> Hi Martin,
>>>
>>> I've tested your proposal (reading binary and UDF getValue) and it works
>>> fine. I've actually converted the code to Scala easily. Now it's a matter
>>> of building/optimizing around it (spatial join, aggregate points per
>>> geotiff).
>>>
>>> Best,
>>>
>>> On Fri, 20 Jan 2023 at 13:47, Martin Andersson <
>>> [email protected]> wrote:
>>>
>>>> Yes, there are lots of things to consider when processing large blobs
>>>> in Spark. What I have come to learn:
>>>>  - Do the spatial join (points and the geotiff extent) with as few
>>>> columns as possible. Ideally an id only for the geotiff. After that join
>>>> you can join back the geotiff using the id.
>>>>  - Aggregate the points to an array of points per geotiff. Your
>>>> getValue udf should take an array of points and return an array of values.
>>>> That way each geotiff is only loaded once.
>>>>  - Parquet in Spark is not very good at handling large blobs. If
>>>> reading parquet with geotiffs is slow you can repartition() with a very
>>>> large number to force smaller row groups when writing or use Avro instead.
>>>> https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
>>>>
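The second tip above (aggregate the points per geotiff so that each file is decoded only once) can be sketched in plain Java, outside Spark. This is a minimal illustration under stated assumptions, not the actual UDF: `MatchedPoint`, `Decoded`, and `lookupAll` are hypothetical names, and the `decoder` function stands in for whatever really parses the geotiff bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Sketch: look up many points while decoding each geotiff only once. */
final class BatchedLookup {

    /** Hypothetical row produced by the spatial join: a point plus its geotiff id. */
    static final class MatchedPoint {
        final String geotiffId;
        final double x;
        final double y;

        MatchedPoint(String geotiffId, double x, double y) {
            this.geotiffId = geotiffId;
            this.x = x;
            this.y = y;
        }
    }

    /** Hypothetical stand-in for a decoded raster that can answer point lookups. */
    interface Decoded {
        Integer valueAt(double x, double y);
    }

    /** Returns, per geotiff id, the values for its points in input order. */
    static Map<String, List<Integer>> lookupAll(
            List<MatchedPoint> points, Function<String, Decoded> decoder) {
        // Step 1: aggregate the points into one list per geotiff id.
        Map<String, List<MatchedPoint>> grouped = new LinkedHashMap<>();
        for (MatchedPoint p : points) {
            grouped.computeIfAbsent(p.geotiffId, id -> new ArrayList<>()).add(p);
        }
        // Step 2: decode each geotiff once and answer all of its points.
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<MatchedPoint>> entry : grouped.entrySet()) {
            Decoded raster = decoder.apply(entry.getKey()); // the expensive call
            List<Integer> values = new ArrayList<>();
            for (MatchedPoint p : entry.getValue()) {
                values.add(raster.valueAt(p.x, p.y));
            }
            result.put(entry.getKey(), values);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] decodeCount = {0};
        List<MatchedPoint> points = Arrays.asList(
                new MatchedPoint("a.tif", 1, 2),
                new MatchedPoint("b.tif", 3, 4),
                new MatchedPoint("a.tif", 5, 6));
        // Fake decoder: counts calls and returns x + y instead of a real pixel value.
        Map<String, List<Integer>> values = lookupAll(points, id -> {
            decodeCount[0]++;
            return (x, y) -> (int) (x + y);
        });
        System.out.println(values + ", decodes = " + decodeCount[0]);
    }
}
```

In Spark the grouping step would correspond to collecting the points into an array column per geotiff id, and the loop body to a UDF that takes the geotiff bytes plus that array.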
>>>> Good luck!
>>>>
>>>> Br,
>>>> Martin Andersson
>>>>
>>>>
>>>> On Fri, 20 Jan 2023 at 13:08, Pedro Mano Fernandes <
>>>> [email protected]> wrote:
>>>>
>>>>> Thanks Martin, it sounds promising. I'll actually give it a try before
>>>>> going with geotiff conversions.
>>>>>
>>>>> I'm foreseeing some concerns, though:
>>>>>
>>>>>    - I'm afraid it won't be optimal for a big geotiff - I may have to
>>>>>    split the geotiff into smaller geotiffs
>>>>>    - I wonder how the spatial partitioning optimization will behave
>>>>>    in such approach - I may have to load smaller geotiffs and use their
>>>>>    geometry to join (my coordinates against envelope boundaries) before
>>>>>    calculating the getValue for my coordinates
>>>>>
>>>>> Best,
>>>>>
>>>>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I would read the geotiff files as binary:
>>>>>> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
>>>>>>
>>>>>> Then you can define a udf to extract values directly from the
>>>>>> geotiffs. If you're on python you can use rasterio to do that.
>>>>>>
>>>>>> In java it would look something like this:
>>>>>>
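>>>>>>   // Imports, assuming a GeoTools release that still uses the
>>>>>>   // org.opengis packages.
>>>>>>   import java.awt.image.Raster;
>>>>>>   import java.io.ByteArrayInputStream;
>>>>>>   import java.io.IOException;
>>>>>>   import org.geotools.coverage.grid.GridCoordinates2D;
>>>>>>   import org.geotools.coverage.grid.GridCoverage2D;
>>>>>>   import org.geotools.coverage.grid.GridGeometry2D;
>>>>>>   import org.geotools.gce.geotiff.GeoTiffReader;
>>>>>>   import org.geotools.geometry.DirectPosition2D;
>>>>>>   import org.opengis.referencing.operation.TransformException;
>>>>>>
>>>>>>   // Returns the first sample at world position (x, y), or null if
>>>>>>   // the point lies outside the raster.
>>>>>>   Integer getValue(byte[] geotiff, double x, double y)
>>>>>>       throws IOException, TransformException {
>>>>>>     try (ByteArrayInputStream inputStream =
>>>>>>         new ByteArrayInputStream(geotiff)) {
>>>>>>       GeoTiffReader geoTiffReader = new GeoTiffReader(inputStream);
>>>>>>       GridCoverage2D grid = geoTiffReader.read(null);
>>>>>>       Raster raster = grid.getRenderedImage().getData();
>>>>>>       GridGeometry2D gridGeometry = grid.getGridGeometry();
>>>>>>
>>>>>>       // Convert world coordinates to pixel (grid) coordinates.
>>>>>>       DirectPosition2D directPosition2D = new DirectPosition2D(x, y);
>>>>>>       GridCoordinates2D gridCoordinates2D =
>>>>>>           gridGeometry.worldToGrid(directPosition2D);
>>>>>>       try {
>>>>>>         int[] pixel = raster.getPixel(
>>>>>>             gridCoordinates2D.x, gridCoordinates2D.y, new int[1]);
>>>>>>         return pixel[0];
>>>>>>       } catch (ArrayIndexOutOfBoundsException exc) {
>>>>>>         // Point is outside the extent.
>>>>>>         return null;
>>>>>>       }
>>>>>>     }
>>>>>>   }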
>>>>>>
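To make the world-to-grid step in getValue more concrete, here is a minimal, dependency-free sketch of the same idea. It assumes a single-band, north-up raster with square pixels and no rotation, which a real GeoTiff does not guarantee; `SimpleRaster` and its fields are hypothetical names used only for illustration, not part of GeoTools or Sedona.

```java
import java.util.OptionalInt;

/** Simplified stand-in for a single-band, north-up raster (no rotation or skew). */
final class SimpleRaster {
    private final double originX;   // world x of the upper-left corner
    private final double originY;   // world y of the upper-left corner
    private final double pixelSize; // width and height of one pixel in world units
    private final int width;        // number of columns
    private final int height;       // number of rows
    private final int[] data;       // row-major pixel values, length = width * height

    SimpleRaster(double originX, double originY, double pixelSize,
                 int width, int height, int[] data) {
        if (data.length != width * height) {
            throw new IllegalArgumentException("data length must equal width * height");
        }
        this.originX = originX;
        this.originY = originY;
        this.pixelSize = pixelSize;
        this.width = width;
        this.height = height;
        this.data = data;
    }

    /** Returns the pixel value at world position (x, y), or empty if outside the extent. */
    OptionalInt valueAt(double x, double y) {
        int col = (int) Math.floor((x - originX) / pixelSize);
        int row = (int) Math.floor((originY - y) / pixelSize); // rows grow downward
        if (col < 0 || col >= width || row < 0 || row >= height) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(data[row * width + col]);
    }

    public static void main(String[] args) {
        // 3 x 2 raster whose upper-left corner is at (10.0, 50.0), 1 unit per pixel.
        SimpleRaster raster = new SimpleRaster(10.0, 50.0, 1.0, 3, 2,
                new int[] {1, 2, 3,
                           4, 5, 231});
        System.out.println(raster.valueAt(12.5, 48.5)); // last column, second row
        System.out.println(raster.valueAt(9.0, 49.5));  // left of the extent
    }
}
```

Checking the bounds explicitly, rather than catching an exception, also makes the "point outside the extent" case visible in the return type.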
>>>>>> Br,
>>>>>> Martin Andersson
>>>>>>
>>>>>> On Wed, 18 Jan 2023 at 17:59, Pedro Mano Fernandes <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Thanks for the update, guys.
>>>>>>>
>>>>>>> I'm not ready to contribute yet.
>>>>>>>
>>>>>>> In the meantime, the solution could perhaps be to convert GeoTiff
>>>>>>> to another format supported by Sedona. If anyone has had this use case
>>>>>>> before or has any ideas, please share.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I think you are looking for something like this:
>>>>>>>> https://postgis.net/docs/RT_ST_Value.html
>>>>>>>>
>>>>>>>> The raster support in Sedona is very limited at the moment. The
>>>>>>>> lack of a proper raster type makes implementing st_value impossible.
>>>>>>>> We had a brief discussion about that recently:
>>>>>>>> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
>>>>>>>>
>>>>>>>> If you want to make a contribution and need some guidance, please
>>>>>>>> let me know!
>>>>>>>>
>>>>>>>> Br,
>>>>>>>> Martin Andersson
>>>>>>>>
>>>>>>>> On Wed, 18 Jan 2023 at 05:45, Jia Yu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Pedro,
>>>>>>>>>
>>>>>>>>> I got your point. Unfortunately, we don't have this function yet
>>>>>>>>> in Sedona, but we welcome anyone who wants to contribute it!
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jia
>>>>>>>>>
>>>>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>> > Hi all,
>>>>>>>>> >
>>>>>>>>> > Any clue? Or any documentation I can refer to?
>>>>>>>>> >
>>>>>>>>> > Here is a dummy example to better explain what I mean: in QGIS I
>>>>>>>>> > can click a point (coordinates) of the geotiff and get the value at
>>>>>>>>> > that point (in this case 231 of Band 1).
>>>>>>>>> >
>>>>>>>>> > [image: image.png]
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> >
>>>>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <
>>>>>>>>> > [email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> >> Hi Jia,
>>>>>>>>> >>
>>>>>>>>> >> Thanks for the fast response.
>>>>>>>>> >>
>>>>>>>>> >> With the regular spatial join I’ll get the array of data of the
>>>>>>>>> >> whole geotiff polygon. I was hoping to get the data element for
>>>>>>>>> >> specific coordinates inside that polygon. In other words: I guess
>>>>>>>>> >> the array of data corresponds to all the positions in the polygon,
>>>>>>>>> >> but I want to fetch specific positions.
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >>
>>>>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <[email protected]> wrote:
>>>>>>>>> >>
>>>>>>>>> >>> Hi Pedro,
>>>>>>>>> >>>
>>>>>>>>> >>> Once you use the Sedona geotiff reader to read those geotiffs, you
>>>>>>>>> >>> will get a dataframe with the following schema:
>>>>>>>>> >>>
>>>>>>>>> >>>  |-- image: struct (nullable = true)
>>>>>>>>> >>>  |    |-- origin: string (nullable = true)
>>>>>>>>> >>>  |    |-- Geometry: string (nullable = true)
>>>>>>>>> >>>  |    |-- height: integer (nullable = true)
>>>>>>>>> >>>  |    |-- width: integer (nullable = true)
>>>>>>>>> >>>  |    |-- nBands: integer (nullable = true)
>>>>>>>>> >>>  |    |-- data: array (nullable = true)
>>>>>>>>> >>>  |    |    |-- element: double (containsNull = true)
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> You can fetch the geometry column as follows and then perform
>>>>>>>>> >>> the spatial join:
>>>>>>>>> >>>
>>>>>>>>> >>> geotiffDF = geotiffDF.selectExpr(
>>>>>>>>> >>>     "image.origin as origin",
>>>>>>>>> >>>     "ST_GeomFromWkt(image.geometry) as Geom",
>>>>>>>>> >>>     "image.height as height",
>>>>>>>>> >>>     "image.width as width",
>>>>>>>>> >>>     "image.data as data",
>>>>>>>>> >>>     "image.nBands as bands")
>>>>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
>>>>>>>>> >>> geotiffDF.show()
>>>>>>>>> >>>
>>>>>>>>> >>> More info can be found here:
>>>>>>>>> >>> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>>>> >>>
>>>>>>>>> >>> Thanks,
>>>>>>>>> >>> Jia
>>>>>>>>> >>>
>>>>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes <
>>>>>>>>> >>> [email protected]> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>>> Hi everyone!
>>>>>>>>> >>>>
>>>>>>>>> >>>> I'm trying to use elevation data in GeoTiff format. I understand
>>>>>>>>> >>>> how to load the dataset, as described in
>>>>>>>>> >>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>>>> >>>>
>>>>>>>>> >>>> Now I'm wondering how to join this dataframe with another one that
>>>>>>>>> >>>> contains coordinates, in order to get the elevation data for those
>>>>>>>>> >>>> coordinates.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Something along these lines:
>>>>>>>>> >>>>
>>>>>>>>> >>>> pointsDF
>>>>>>>>> >>>>   .join(geotiffDF, ...)
>>>>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
>>>>>>>>> >>>>
>>>>>>>>> >>>> Are there any examples or documentation I can follow to
>>>>>>>>> >>>> accomplish this?
>>>>>>>>> >>>>
>>>>>>>>> >>>> Thanks,
>>>>>>>>> >>>>
>>>>>>>>> >>>> --
>>>>>>>>> >>>> Pedro Mano Fernandes
>>>>>>>>> >>>>
>>>>>>>>> >>> --
>>>>>>>>> >> Pedro Mano Fernandes
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > Pedro Mano Fernandes
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Martin
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Pedro Mano Fernandes
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Pedro Mano Fernandes
>>>>>
>>>>
>>>
>>> --
>>> Pedro Mano Fernandes
>>>
>>
>
> --
> Pedro Mano Fernandes
>


-- 
Pedro Mano Fernandes
