Re: [PR] [SEDONA-705] Add unique partitioner wrapper to enable partitioned writes with Sedona [sedona]

via GitHub Wed, 05 Feb 2025 19:45:27 -0800


jiayuasu commented on code in PR #1778:
URL: https://github.com/apache/sedona/pull/1778#discussion_r1944047188



##########
docs/tutorial/sql.md:
##########
@@ -1638,6 +1638,37 @@ Use SedonaSQL DataFrame-RDD Adapter to convert a DataFrame to an SpatialRDD. Ple
        spatialDf = StructuredAdapter.toDf(spatialRDD, sedona)
        ```
 
+### SpatialRDD to DataFrame with spatial partitioning
+
+By default, `StructuredAdapter.toDf()` does not preserve spatial partitions because doing so
+may introduce duplicate features for most types of spatial data. These duplicates
+are introduced on purpose to ensure correctness when performing a spatial join;
+however, when using Sedona to prepare a dataset for distribution this is not typically
+desired.
+
+=== "Scala"
+
+       ```scala
+       spatialRDD.spatialPartitioning(GridType.KDBTREE)
+       var spatialDf = StructuredAdapter.toSpatialPartitionedDf(spatialRDD, sedona)
+       ```
+
+=== "Java"
+
+       ```java
+       spatialRDD.spatialPartitioning(GridType.KDBTREE);
+       Dataset<Row> spatialDf = StructuredAdapter.toSpatialPartitionedDf(spatialRDD, sedona);
+       ```
+
+=== "Python"
+
+       ```python
+       from sedona.utils.structured_adapter import StructuredAdapter
+
+       spatialRDD.spatialPartitioning(GridType.KDBTREE)

Review Comment:
   ```suggestion
        spatialRDD.spatialPartitioningWithoutDuplicates(GridType.KDBTREE)
        # Specify the desired number of partitions as 10, though the actual number may vary
        # spatialRDD.spatialPartitioningWithoutDuplicates(GridType.KDBTREE, 10)
   ```
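To make the duplicate issue concrete, here is a toy model in plain Python (not Sedona code) contrasting the two assignment strategies. `partition_with_duplicates` copies a feature into every grid cell its envelope touches, which is what a spatial join needs; `partition_without_duplicates` keeps each feature in exactly one cell. The center-point rule used for the unique assignment, and all function names here, are hypothetical choices for illustration; Sedona's actual rule may differ.

```python
from typing import Dict, List, Tuple

# Axis-aligned envelope: (min_x, min_y, max_x, max_y)
Envelope = Tuple[float, float, float, float]


def intersects(a: Envelope, b: Envelope) -> bool:
    """True if two envelopes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partition_with_duplicates(
    features: Dict[str, Envelope], cells: List[Envelope]
) -> Dict[int, List[str]]:
    """Join-style: copy each feature into every cell its envelope intersects."""
    out: Dict[int, List[str]] = {i: [] for i in range(len(cells))}
    for fid, env in features.items():
        for i, cell in enumerate(cells):
            if intersects(env, cell):
                out[i].append(fid)
    return out


def partition_without_duplicates(
    features: Dict[str, Envelope], cells: List[Envelope]
) -> Dict[int, List[str]]:
    """Write-style (hypothetical rule): put each feature in the first cell
    containing its envelope's center. Assumes the cells cover the data extent."""
    out: Dict[int, List[str]] = {i: [] for i in range(len(cells))}
    for fid, env in features.items():
        cx = (env[0] + env[2]) / 2
        cy = (env[1] + env[3]) / 2
        for i, cell in enumerate(cells):
            if cell[0] <= cx <= cell[2] and cell[1] <= cy <= cell[3]:
                out[i].append(fid)
                break  # stop at the first match so no feature is copied
    return out


# Two grid cells splitting a 10x10 extent at x = 5
cells = [(0.0, 0.0, 5.0, 10.0), (5.0, 0.0, 10.0, 10.0)]
features = {
    "a": (1.0, 1.0, 2.0, 2.0),  # entirely in the left cell
    "b": (8.0, 8.0, 9.0, 9.0),  # entirely in the right cell
    "c": (4.0, 4.0, 6.0, 6.0),  # straddles the boundary
}

print(partition_with_duplicates(features, cells))     # {0: ['a', 'c'], 1: ['b', 'c']}
print(partition_without_duplicates(features, cells))  # {0: ['a', 'c'], 1: ['b']}
```

The straddling feature `c` is stored twice in the join-style layout and once in the write-style layout, so writing the first layout to files would emit four records for three features.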



##########
docs/tutorial/sql.md:
##########
@@ -1638,6 +1638,37 @@ Use SedonaSQL DataFrame-RDD Adapter to convert a DataFrame to an SpatialRDD. Ple
        spatialDf = StructuredAdapter.toDf(spatialRDD, sedona)
        ```
 
+### SpatialRDD to DataFrame with spatial partitioning
+
+By default, `StructuredAdapter.toDf()` does not preserve spatial partitions because doing so
+may introduce duplicate features for most types of spatial data. These duplicates
+are introduced on purpose to ensure correctness when performing a spatial join;
+however, when using Sedona to prepare a dataset for distribution this is not typically
+desired.

Review Comment:
   ```suggestion
   desired.
   
   You can use `StructuredAdapter` and the `spatialRDD.spatialPartitioningWithoutDuplicates` function to obtain a Sedona DataFrame that is spatially partitioned without duplicates. This is especially useful for writing balanced GeoParquet files that preserve spatial proximity within each file, which is crucial for effective filter pushdown when those files are queried.
   ```
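A toy model of why spatial proximity within files helps filter pushdown: if a reader prunes files using per-file bounding boxes (GeoParquet metadata can carry an optional `bbox` for this), files whose features are spatially clustered can be skipped far more often. The quadrant grouping below is a hypothetical stand-in for a KDB-tree partitioning and the diagonal grouping stands in for an arbitrary, locality-free assignment; neither is Sedona code. Both layouts are balanced (four points per file), so the only difference is locality.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bbox(points: List[Point]) -> BBox:
    """Bounding box of a non-empty list of points (the per-file statistic)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def group(points: List[Point], key: Callable[[Point], int], n: int) -> List[List[Point]]:
    """Split points into n 'files' using a file-assignment function."""
    files: List[List[Point]] = [[] for _ in range(n)]
    for p in points:
        files[key(p)].append(p)
    return files


def files_to_read(files: List[List[Point]], query: BBox) -> int:
    """Count files whose bbox overlaps the query; the rest can be skipped."""
    count = 0
    for pts in files:
        b = bbox(pts)
        if b[0] <= query[2] and query[0] <= b[2] and b[1] <= query[3] and query[1] <= b[3]:
            count += 1
    return count


# 16 points on a 4x4 grid, written as 4 files of 4 points each
points = [(float(x), float(y)) for x in range(4) for y in range(4)]

# Locality-free assignment: each file is a "diagonal" spanning the whole extent
diagonal = group(points, lambda p: int(p[0] + p[1]) % 4, 4)
# Spatially clustered assignment: one file per quadrant
quadrants = group(points, lambda p: int(p[0] // 2) * 2 + int(p[1] // 2), 4)

query = (0.5, 0.5, 1.5, 1.5)  # small window in the lower-left corner
print(files_to_read(diagonal, query))   # 4
print(files_to_read(quadrants, query))  # 1
```

With the diagonal layout every file's bounding box covers the whole extent, so nothing can be pruned; with the quadrant layout only one of four files must be read.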



##########
docs/tutorial/sql.md:
##########
@@ -1638,6 +1638,37 @@ Use SedonaSQL DataFrame-RDD Adapter to convert a DataFrame to an SpatialRDD. Ple
        spatialDf = StructuredAdapter.toDf(spatialRDD, sedona)
        ```
 
+### SpatialRDD to DataFrame with spatial partitioning
+
+By default, `StructuredAdapter.toDf()` does not preserve spatial partitions because doing so
+may introduce duplicate features for most types of spatial data. These duplicates
+are introduced on purpose to ensure correctness when performing a spatial join;
+however, when using Sedona to prepare a dataset for distribution this is not typically
+desired.
+
+=== "Scala"
+
+       ```scala
+       spatialRDD.spatialPartitioning(GridType.KDBTREE)
+       var spatialDf = StructuredAdapter.toSpatialPartitionedDf(spatialRDD, sedona)
+       ```
+
+=== "Java"
+
+       ```java
+       spatialRDD.spatialPartitioning(GridType.KDBTREE);

Review Comment:
   ```suggestion
        spatialRDD.spatialPartitioningWithoutDuplicates(GridType.KDBTREE);
        // Specify the desired number of partitions as 10, though the actual number may vary
        // spatialRDD.spatialPartitioningWithoutDuplicates(GridType.KDBTREE, 10);
   ```



##########
docs/tutorial/sql.md:
##########
@@ -1638,6 +1638,37 @@ Use SedonaSQL DataFrame-RDD Adapter to convert a DataFrame to an SpatialRDD. Ple
        spatialDf = StructuredAdapter.toDf(spatialRDD, sedona)
        ```
 
+### SpatialRDD to DataFrame with spatial partitioning
+
+By default, `StructuredAdapter.toDf()` does not preserve spatial partitions because doing so
+may introduce duplicate features for most types of spatial data. These duplicates
+are introduced on purpose to ensure correctness when performing a spatial join;
+however, when using Sedona to prepare a dataset for distribution this is not typically
+desired.
+
+=== "Scala"
+
+       ```scala
+       spatialRDD.spatialPartitioning(GridType.KDBTREE)

Review Comment:
   ```suggestion
        spatialRDD.spatialPartitioningWithoutDuplicates(GridType.KDBTREE)
        // Specify the desired number of partitions as 10, though the actual number may vary
        // spatialRDD.spatialPartitioningWithoutDuplicates(GridType.KDBTREE, 10)
   ```



