Re: Apache Sedona

Netanel Malka Mon, 24 Aug 2020 01:25:44 -0700

Hi,
Great features.

One note:
- geohash spatial join
If we are going to implement it, I think it would be nice to also
implement ST_GeomFromGeohash and ST_asGeohash. It shouldn't be hard,
and it would let us read and write geohash strings in order to
integrate with other systems.
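
To make the idea concrete, here is a minimal Python sketch of the standard
geohash encoding and decoding that such functions would wrap. The names
geohash_encode and geohash_bbox are hypothetical and are not Sedona APIs;
an actual ST_GeomFromGeohash would presumably return a polygon built from
the bounding box rather than a tuple.

```python
# Standard geohash base32 alphabet (no a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a point as a geohash string (hypothetical helper)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def geohash_bbox(geohash):
    """Decode a geohash into (min_lon, min_lat, max_lon, max_lat)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    use_lon = True
    for ch in geohash:
        idx = _BASE32.index(ch)  # raises ValueError on invalid characters
        for shift in range(4, -1, -1):
            bit = (idx >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return (lon_lo, lat_lo, lon_hi, lat_hi)
```

The decoder returns the cell's bounding box instead of a single point
because a geohash identifies an area; that is also what would make it
usable as a join key.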



On Fri, 21 Aug 2020 at 23:50, Paweł Kociński <[email protected]>
wrote:

> - Grant support for Scala 2.12 and Spark 3.0
> I meant Python here.
>
> - Implement loading geospatial data sources (geojson, shapefile, osm, wkb,
> wkt) from Dataframe API like
> -- spark.read.format("geojson").load(path)
>
> It is possible, and I think it will make loading the data easier for users
> (I also agree that it is not a priority).
>
> - Add broadcast join for joining big and small dataframe
>
> Agree
>
> - Fix issue with 3D geometries while loading shapefile
>
> Exactly
>
> - Add support for multiline geojson (I have some code on my local branch)
>
>
> We have to write our own in that case; it will require some amount of work
> but is doable.
>
> - Add direct writing to geospatial databases like PostgreSQL
>
> I have to analyze the Spark code and will come back with a solution.
>
> - Remove NullPointer exception when there is null value within data or
> data is wrong within some rows.
>
> I meant the SQL functions; sometimes they should return null/Option
> instead of raising a NullPointerException.
>
> - geohash spatial join
>
> I think in some cases it can be more suitable for users. It should not be
> tough to implement but it brings additional value.
>
> On Fri, 21 Aug 2020 at 11:08, Jia Yu <[email protected]> wrote:
>
>> Hi Paweł and CCed sedona-dev and other committers,
>>
>> Please find my opinion below.
>>
>> - Grant support for Scala 2.12 and Spark 3.0
>> Jia: The Scala and Java code in the master branch already supports Spark
>> 3.0 with Scala 2.12. We still need the following: Scala 2.12 support in
>> Sedona for other Spark versions and Scala 2.12 support in all Python APIs.
>>
>> - Implement loading geospatial data sources (geojson, shapefile, osm,
>> wkb, wkt) from Dataframe API like
>> -- spark.read.format("geojson").load(path)
>> Jia: Direct DataFrame API support requires a bit more coding effort. I am
>> actually not sure whether this func in DF is extensible. But if so, I am
>> not against it. But it is not the top priority.
>>
>> - Add broadcast join for joining big and small dataframe
>> Jia: Yes, we should have it here:
>> https://github.com/DataSystemsLab/GeoSpark/blob/master/sql/src/main/scala/org/apache/spark/sql/geosparksql/strategy/join/TraitJoinQueryExec.scala#L67
>>
>> - Fix issue with 3D geometries while loading shapefile
>> Jia: How do we fix it? Convert them to 2D geometries and discard the Z
>> or M dimension?
>>
>> - Add support for multiline geojson (I have some code on my local branch)
>> Jia: This is not easy. In Spark, its DF has a readjson API:
>> https://spark.apache.org/docs/latest/sql-data-sources-json.html Not sure
>> whether we can leverage this.
>>
>> - Add direct writing to geospatial databases like PostgreSQL
>> Jia: Good point. Any particular challenge on this?
>>
>> - Add more geospatial functions
>> Jia: Agree.
>>
>> - Remove NullPointer exception when there is null value within data or
>> data is wrong within some rows
>> Jia: I believe this has been solved by "allowTopologyInvalidGeometries"
>> and "skipSyntaxInvalidGeometries"
>> https://datasystemslab.github.io/GeoSpark/tutorial/rdd/#create-a-generic-spatialrdd-behavoir-changed-in-v120
>>
>> - geohash spatial join
>> Jia: Yes, we can do that. But will it bring in any benefit as opposed to
>> the existing spatial join algorithm?
>>
>> Thanks,
>> Jia
>>
>> On Wed, Aug 19, 2020 at 10:22 AM Paweł Kociński <
>> [email protected]> wrote:
>>
>>> Hi Jia,
>>> I hope you are doing well. Do we have some features to add to Apache
>>> Sedona after the code is merged?
>>> My ideas of tasks:
>>> - Grant support for Scala 2.12 and Spark 3.0
>>> - Implement loading geospatial data sources (geojson, shapefile, osm,
>>> wkb, wkt) from Dataframe API like
>>> -- spark.read.format("geojson").load(path)
>>> I have some code, but code migration is holding me back
>>>
>>> [image: image.png]
>>> - Add broadcast join for joining big and small dataframe
>>> - Fix issue with 3D geometries while loading shapefile
>>> - Add support for multiline geojson (I have some code on my local branch)
>>> - Add direct writing to geospatial databases like PostgreSQL
>>> - Add more geospatial functions
>>> - Remove NullPointer exception when there is null value within data or
>>> data is wrong within some rows
>>> - geohash spatial join
>>>
>>> What do you think?
>>>
>>> Regards,
>>> Paweł
>>>
>>>
>>> On Mon, 17 Aug 2020 at 07:45, Jia Yu <[email protected]> wrote:
>>>
>>>> Hello Paweł,
>>>>
>>>> I just posted the current situation into [email protected].
>>>> The current problem is I have made everything ready to be imported to ASF
>>>> GitHub repo (https://github.com/apache/incubator-sedona). But one
>>>> committer (Masha from Facebook) who made thousands of lines of contribution
>>>> to GeoSpark still has not submitted her CLA. The entire process is currently
>>>> blocked by this.
>>>>
>>>> Mohamed and I have been trying to reach her a couple of times in the
>>>> past 3 weeks but got no reply. I have asked the champion about how we can
>>>> proceed in this case. Let's see what will happen.
>>>>
>>>> Thanks,
>>>> Jia
>>>>
>>>>
>>>> On Sun, Aug 16, 2020 at 9:06 AM Paweł Kociński <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Jia,
>>>>> Do we know when the first release of Apache Sedona will occur? Can I
>>>>> help with something to make it happen? I have a few ideas and some code
>>>>> which will be useful in the future.
>>>>>
>>>>> Regards,
>>>>> Pawel
>>>>>
>>>>

-- 
Best regards,
Netanel Malka.
