Hi Paweł (CCing sedona-dev and the other committers),
Please find my comments inline below.
- Grant support for Scala 2.12 and Spark 3.0
Jia: The Scala and Java code in the master branch already supports Spark 3.0
with Scala 2.12. What is still missing is Scala 2.12 support for the other
Spark versions and Scala 2.12 support in all Python APIs.
- Implement loading geospatial data sources (geojson, shapefile, osm, wkb,
wkt) from Dataframe API like
-- spark.read.format("geojson").load(path)
Jia: Direct DataFrame API support requires a bit more coding effort. I am
actually not sure whether spark.read.format(...) can be extended with custom
formats. If it can, I am not against it, but it is not the top priority.
- Add broadcast join for joining big and small dataframe
Jia: Yes, we should have it here:
https://github.com/DataSystemsLab/GeoSpark/blob/master/sql/src/main/scala/org/apache/spark/sql/geosparksql/strategy/join/TraitJoinQueryExec.scala#L67
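To make the idea concrete, here is a minimal, Spark-free sketch in Python of what a broadcast join does: the small side is copied to every partition of the big side and each partition joins locally, so the big side is never shuffled. All names (broadcast_join, contains, the sample data) are hypothetical, and this is not the GeoSpark implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[str, float, float, float, float]  # (name, min_x, min_y, max_x, max_y)


def contains(box: Box, p: Point) -> bool:
    """Return True if point p lies inside the box (borders included)."""
    _, min_x, min_y, max_x, max_y = box
    return min_x <= p[0] <= max_x and min_y <= p[1] <= max_y


def broadcast_join(big_partitions: List[List[Point]],
                   small_side: List[Box]) -> List[Tuple[str, Point]]:
    # One read-only copy of the small side, visible to every partition.
    broadcast = list(small_side)
    results = []
    # In a real cluster each partition would be processed on its own executor.
    for partition in big_partitions:
        for p in partition:
            for box in broadcast:
                if contains(box, p):
                    results.append((box[0], p))
    return results


zones = [("A", 0, 0, 10, 10), ("B", 20, 20, 30, 30)]
partitions = [[(1.0, 1.0), (25.0, 25.0)], [(50.0, 50.0), (5.0, 9.0)]]
pairs = broadcast_join(partitions, zones)
```

The inner loop over the broadcast side is a plain scan here; a spatial index on the small side would be the natural refinement.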
- Fix issue with 3D geometries while loading shapefile
Jia: How do we fix it? Convert them to 2D geometries and discard the Z or M
dimension?
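For reference, the "discard Z/M" option could look like the sketch below, written in plain Python over GeoJSON-style nested coordinate lists. The helper name drop_extra_dims is hypothetical, and whether dropping the extra dimensions is the right fix is exactly the open question here.

```python
def drop_extra_dims(coords):
    """Recursively keep only x and y from nested coordinate lists."""
    # A single position looks like [x, y, z] or [x, y, z, m].
    if coords and isinstance(coords[0], (int, float)):
        return list(coords[:2])
    # Otherwise this is a list of positions, rings, or parts: recurse.
    return [drop_extra_dims(c) for c in coords]


line_3d = [[0.0, 0.0, 5.0], [1.0, 2.0, 7.5]]
polygon_3d = [[[0, 0, 1], [4, 0, 1], [4, 3, 1], [0, 0, 1]]]

line_2d = drop_extra_dims(line_3d)
polygon_2d = drop_extra_dims(polygon_3d)
```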
- Add support for multiline geojson (I have some code on my local branch)
Jia: This is not easy. Spark's DataFrame reader has a JSON API:
https://spark.apache.org/docs/latest/sql-data-sources-json.html Not sure
whether we can leverage it.
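A small plain-Python illustration of why multiline GeoJSON is awkward for a line-oriented reader: parsing a pretty-printed FeatureCollection one line at a time finds nothing, while parsing the whole document works. Spark's JSON source has a multiLine option for whole-document parsing; whether it fits our reader is not verified here. The function names and the sample document are made up for illustration.

```python
import json

pretty_geojson = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
     "properties": {"name": "a"}}
  ]
}"""


def parse_line_by_line(text):
    """Treat every line as a standalone JSON record (line-delimited style)."""
    features = []
    for line in text.splitlines():
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # a fragment of a larger document, not a full record
        if isinstance(obj, dict) and obj.get("type") == "Feature":
            features.append(obj)
    return features


def parse_whole_document(text):
    """Parse the file as one JSON document and return its features."""
    doc = json.loads(text)
    if doc.get("type") == "FeatureCollection":
        return doc["features"]
    return [doc]


by_line = parse_line_by_line(pretty_geojson)
whole = parse_whole_document(pretty_geojson)
```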
- Add direct writing to geospatial databases like PostgreSQL
Jia: Good point. Any particular challenge on this?
- Add more geospatial functions
Jia: Agree.
- Remove the NullPointerException thrown when the data contains null values
or some rows are malformed
Jia: I believe this has been solved by "allowTopologyInvalidGeometries" and
"skipSyntaxInvalidGeometries"
https://datasystemslab.github.io/GeoSpark/tutorial/rdd/#create-a-generic-spatialrdd-behavoir-changed-in-v120
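As a rough picture of the "skip instead of fail" behaviour being discussed, the plain-Python sketch below returns None for null or malformed rows and filters them out rather than raising. The parse_point helper is hypothetical, handles only "POINT (x y)", and is not how the flags above are implemented.

```python
import re
from typing import Optional, Tuple

# Matches "POINT (x y)" with arbitrary spacing; anything else is rejected.
_POINT = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)


def parse_point(wkt: Optional[str]) -> Optional[Tuple[float, float]]:
    """Return (x, y), or None when the row is null or not a valid point."""
    if wkt is None:
        return None
    m = _POINT.match(wkt)
    if m is None:
        return None
    try:
        return float(m.group(1)), float(m.group(2))
    except ValueError:
        return None  # tokens were present but not numeric


rows = ["POINT (1 2)", None, "POINT (oops)", "POINT (3.5 -4)"]
parsed = [p for p in map(parse_point, rows) if p is not None]
```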
- geohash spatial join
Jia: Yes, we can do that. But will it bring any benefit over the existing
spatial join algorithm?
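To ground the comparison, here is a minimal plain-Python sketch of the idea as I understand it: encode each point as a geohash and pair up points that fall in the same cell. The function names and sample points are hypothetical. Note that equal-cell matching only yields candidates: nearby points on opposite sides of a cell border are missed unless neighbouring cells are checked too, so this sketch does not settle the benefit question.

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=5):
    """Standard geohash: interleave longitude and latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # the first bit refines longitude, then they alternate
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def candidate_pairs(left, right, precision=5):
    """Pair up (name, lat, lon) records that share a geohash cell."""
    buckets = defaultdict(list)
    for name, lat, lon in right:
        buckets[geohash_encode(lat, lon, precision)].append(name)
    pairs = []
    for name, lat, lon in left:
        for other in buckets.get(geohash_encode(lat, lon, precision), []):
            pairs.append((name, other))
    return pairs


left = [("p1", 57.64911, 10.40744)]
right = [("q1", 57.64920, 10.40750), ("q2", 40.0, -74.0)]
pairs = candidate_pairs(left, right)
```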
Thanks,
Jia
On Wed, Aug 19, 2020 at 10:22 AM Paweł Kociński <[email protected]>
wrote:
> Hi Jia,
> I hope you are fine. Are there features we should add to Apache Sedona
> once the code is merged?
> My ideas of tasks:
> - Grant support for Scala 2.12 and Spark 3.0
> - Implement loading geospatial data sources (geojson, shapefile, osm, wkb,
> wkt) from Dataframe API like
> -- spark.read.format("geojson").load(path)
> I have some code, but code migration is holding me back
>
> - Add broadcast join for joining big and small dataframe
> - Fix issue with 3D geometries while loading shapefile
> - Add support for multiline geojson (I have some code on my local branch)
> - Add direct writing to geospatial databases like PostgreSQL
> - Add more geospatial functions
> - Remove the NullPointerException thrown when the data contains null values
> or some rows are malformed
> - geohash spatial join
>
> What do you think?
>
> Regards,
> Paweł
>
>
> pon., 17 sie 2020 o 07:45 Jia Yu <[email protected]> napisał(a):
>
>> Hello Paweł,
>>
>> I just posted the current situation to [email protected]. The
>> current problem is that I have made everything ready to be imported to the
>> ASF GitHub repo (https://github.com/apache/incubator-sedona), but one
>> committer (Masha from Facebook), who contributed thousands of lines to
>> GeoSpark, still has not submitted her CLA. The entire process is currently
>> blocked by this.
>>
>> Mohamed and I have tried to reach her a couple of times in the past
>> 3 weeks but got no reply. I have asked the champion how we can
>> proceed in this case. Let's see what happens.
>>
>> Thanks,
>> Jia
>>
>>
>> On Sun, Aug 16, 2020 at 9:06 AM Paweł Kociński <[email protected]>
>> wrote:
>>
>>> Hi Jia,
>>> Do we know when the first release of Apache Sedona will happen? Can I
>>> help with something to make it happen? I have a few ideas and some code
>>> that will be useful in the future.
>>>
>>> Regards,
>>> Pawel
>>>
>>