[I] Save / Load indexed spatial & partitioned Rdd [sedona]

via GitHub Mon, 22 Jan 2024 07:01:44 -0800


vbmacher opened a new issue, #1213:
URL: https://github.com/apache/sedona/issues/1213


   ## Expected behavior
   
   Maybe this is already possible, but I haven't found it anywhere. I'm relatively new to Sedona and geo-processing.
   I'd like to be able to save and later load a spatial RDD that has already been analyzed and partitioned, ideally together with its index. In our use case the same dataset is used in many jobs, and it's time-consuming to redo the partitioning and rebuild the index every time.
   I'm not sure whether this is possible, though.
   
   For example:
   
   ```
   // save:
   val spatialRdd = Adapter.toSpatialRdd(df, ...)
   spatialRdd.analyze()
   // IllegalArgumentException: [Sedona] Number of partitions cannot be larger than half of total records num
   spatialRdd.spatialPartitioning(GridType.KDBTREE, math.min(Integer.MAX_VALUE, df.count() / 2).toInt)
   spatialRdd.buildIndex(IndexType.RTREE, true)
   SomeSedonaUtility.saveSpatialRdd(spatialRdd, path) // <-- save with index and partitioned
   
   // load:
   val rdd = SomeSedonaUtility.loadSpatialRdd(path)
   
   // and usage:
   val otherRdd = Adapter.toSpatialRdd(otherDs, ...)
   otherRdd.spatialPartitioning(rdd.getPartitioner)
   
   val useIndex = true
   val considerBoundaryIntersection = SpatialPredicate.COVERS
   val params = new JoinQuery.JoinParams(useIndex, considerBoundaryIntersection,
     IndexType.RTREE, JoinBuildSide.LEFT)
   
   val joined = JoinQuery.spatialJoin(rdd, otherRdd, params)
   ```
   
   
   ## Actual behavior
   
   To my knowledge, the index and partitioning must be built at runtime in every job.
   
   ## Steps to reproduce the problem
   
   The feature is missing, so it's not possible to reproduce it.
   
   ## Settings
   
   Sedona version = 1.5.1
   
   Apache Spark version = 3.5
   
   API type = Scala
   
   Scala version = 2.12
   
   JRE version = 1.8
   
   Environment = EMR


