Hi,

Great features. One note:

- geohash spatial join

If we are going to implement it, I think it would be nice to also implement ST_GeomFromGeohash and ST_asGeohash. It shouldn't be hard, and it would let us read and write geohash strings so that we can integrate with other systems.
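To make the idea concrete, here is a rough sketch of the logic the two functions would need. It is plain Python with no Spark or Sedona dependency. The helper names `encode_geohash` and `decode_geohash_bbox` are hypothetical, and I am assuming that ST_asGeohash would encode a point and that ST_GeomFromGeohash would return the bounding box of the cell; the real signatures are still open for discussion.

```python
# Standard geohash base-32 alphabet (no "a", "i", "l", "o").
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode_geohash(lon, lat, precision=9):
    """Encode a lon/lat point as a geohash string (hypothetical ST_asGeohash core)."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def decode_geohash_bbox(geohash):
    """Decode a geohash into (min_lon, min_lat, max_lon, max_lat).

    Hypothetical ST_GeomFromGeohash core: the caller would turn the
    bounding box into a polygon geometry.
    """
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    use_lon = True
    for ch in geohash.lower():
        value = BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_range[0], lat_range[0], lon_range[1], lat_range[1]
```

For example, `encode_geohash(lon, lat, 5)` gives a 5-character cell id, and `decode_geohash_bbox` on that id gives back a box that contains the original point. Wrapping these as SQL functions, and deciding what to do with non-point geometries, is left out of the sketch.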
On Fri, 21 Aug 2020 at 23:50, Paweł Kociński <[email protected]> wrote:

> - Grant support for Scala 2.12 and Spark 3.0
> I meant here Python.
>
> - Implement loading geospatial data sources (geojson, shapefile, osm, wkb,
> wkt) from Dataframe API like
> -- spark.read.format("geojson").load(path)
>
> It is possible and I think it will be easier for users to load the data
> (Also agree that is not priority).
>
> - Add broadcast join for joining big and small dataframe
>
> Agree
>
> - Fix issue with 3D geometries while loading shapefile
>
> Exactly
>
> - Add support for multiline geojson (I have some code on my local branch)
>
> We have to write our own in that case, it will require some amount of work
> but is doable.
>
> - Add direct writing to geospatial databases like PostgreSQL
>
> I have to analyze spark code and will be back with a solution
>
> - Remove NullPointer exception when there is null value within data or
> data is wrong within some rows.
>
> I meant SQL functions, sometimes they should replace the value with
> null/Option instead of raising null pointer exception.
>
> - geohash spatial join
>
> I think in some cases it can be more suitable for users. It should not be
> tough to implement but it brings additional value.
>
> On Fri, 21 Aug 2020 at 11:08, Jia Yu <[email protected]> wrote:
>
>> Hi Paweł and CCed sedona-dev and other committers,
>>
>> Please find my opinion below.
>>
>> - Grant support for Scala 2.12 and Spark 3.0
>> Jia: the Scala and Java code in the master branch has supported Spark
>> 3.0+ 2.12. We need to support the following: Sedona Scala 2.12 support for
>> other Spark versions and Scala 2.12 support in all Python APIs.
>>
>> - Implement loading geospatial data sources (geojson, shapefile, osm,
>> wkb, wkt) from Dataframe API like
>> -- spark.read.format("geojson").load(path)
>> Jia: Direct DataFrame API support requires a bit more coding effort. I am
>> actually not sure whether this func in DF is extensible. But if so, I am
>> not against it. But it is not the top priority.
>>
>> - Add broadcast join for joining big and small dataframe
>> Jia: Yes, we should have it here:
>> https://github.com/DataSystemsLab/GeoSpark/blob/master/sql/src/main/scala/org/apache/spark/sql/geosparksql/strategy/join/TraitJoinQueryExec.scala#L67
>>
>> - Fix issue with 3D geometries while loading shapefile
>> Jia: How do we fix it? Convert it to a 2D geoms and discard the Z
>> dimension or M dimension?
>>
>> - Add support for multiline geojson (I have some code on my local branch)
>> Jia: This is not easy. In Spark, its DF has a readjson API:
>> https://spark.apache.org/docs/latest/sql-data-sources-json.html Not sure
>> whether we can leverage this.
>>
>> - Add direct writing to geospatial databases like PostgreSQL
>> Jia: Good point. Any particular challenge on this?
>>
>> - Add more geospatial functions
>> Jia: Agree.
>>
>> - Remove NullPointer exception when there is null value within data or
>> data is wrong within some rows
>> Jia: I believe this has been solved by "allowTopologyInvalidGeometries"
>> and "skipSyntaxInvalidGeometries"
>> https://datasystemslab.github.io/GeoSpark/tutorial/rdd/#create-a-generic-spatialrdd-behavoir-changed-in-v120
>>
>> - geohash spatial join
>> Jia: Yes, we can do that. But will it bring in any benefit as opposed to
>> the existing spatial join algorithm?
>>
>> Thanks,
>> Jia
>>
>> On Wed, Aug 19, 2020 at 10:22 AM Paweł Kociński <[email protected]> wrote:
>>
>>> Hi Jia,
>>> I hope you are fine. Do we have some features to add to Apache Sedona
>>> after the code will be merged?
>>> My ideas of tasks:
>>> - Grant support for Scala 2.12 and Spark 3.0
>>> - Implement loading geospatial data sources (geojson, shapefile, osm,
>>> wkb, wkt) from Dataframe API like
>>> -- spark.read.format("geojson").load(path)
>>> I have some code, but code migration is holding me back
>>>
>>> [image: image.png]
>>> - Add broadcast join for joining big and small dataframe
>>> - Fix issue with 3D geometries while loading shapefile
>>> - Add support for multiline geojson (I have some code on my local branch)
>>> - Add direct writing to geospatial databases like PostgreSQL
>>> - Add more geospatial functions
>>> - Remove NullPointer exception when there is null value within data or
>>> data is wrong within some rows
>>> - geohash spatial join
>>>
>>> What do you think?
>>>
>>> Regards,
>>> Paweł
>>>
>>> On Mon, 17 Aug 2020 at 07:45, Jia Yu <[email protected]> wrote:
>>>
>>>> Hello Paweł,
>>>>
>>>> I just posted the current situation into [email protected].
>>>> The current problem is I have made everything ready to be imported to ASF
>>>> GitHub repo (https://github.com/apache/incubator-sedona). But one
>>>> committer (Masha from Facebook) who made thousands of lines of contribution
>>>> to GeoSpark still didn't submit her CLA. The entire process is currently
>>>> blocked by this.
>>>>
>>>> Mohamed and I have been trying to reach her a couple of times in the
>>>> past 3 weeks but got no reply. I have asked the champion about how we can
>>>> proceed in this case. Let's see what will happen.
>>>>
>>>> Thanks,
>>>> Jia
>>>>
>>>> On Sun, Aug 16, 2020 at 9:06 AM Paweł Kociński <[email protected]> wrote:
>>>>
>>>>> Hi Jia,
>>>>> Do we know when the first release of Apache Sedona will occur? Can I
>>>>> help with something to make it happen? I have a few ideas and some code
>>>>> which will be useful in the future.
>>>>>
>>>>> Regards,
>>>>> Pawel

--
Best regards,
Netanel Malka.
