jiayuasu commented on code in PR #1778:
URL: https://github.com/apache/sedona/pull/1778#discussion_r1944047188
##########
docs/tutorial/sql.md:
##########
@@ -1638,6 +1638,37 @@ Use SedonaSQL DataFrame-RDD Adapter to convert a DataFrame to an SpatialRDD. Ple
spatialDf = StructuredAdapter.toDf(spatialRDD, sedona)
```
+### SpatialRDD to DataFrame with spatial partitioning
+
+By default, `StructuredAdapter.toDf()` does not preserve spatial partitions because doing so
+may introduce duplicate features for most types of spatial data. These duplicates
+are introduced on purpose to ensure correctness when performing a spatial join;
+however, when using Sedona to prepare a dataset for distribution this is not typically
+desired.
+
+=== "Scala"
+
+ ```scala
+ spatialRDD.spatialPartitioning(GridType.KDBTREE)
+ var spatialDf = StructuredAdapter.toSpatialPartitionedDf(spatialRDD, sedona)
+ ```
+
+=== "Java"
+
+ ```java
+ spatialRDD.spatialPartitioning(GridType.KDBTREE)
+ Dataset<Row> spatialDf = StructuredAdapter.toSpatialPartitionedDf(spatialRDD, sedona)
+ ```
+
+=== "Python"
+
+ ```python
+ from sedona.utils.structured_adapter import StructuredAdapter
+
+ spatialRDD.spatialPartitioning(GridType.KDBTREE)
Review Comment:
```suggestion
spatialRDD.spatialPartitioningWithoutDuplicates(GridType.KDBTREE)
# Specify the desired number of partitions as 10, though the actual number may vary
# spatialRDD.spatialPartitioningWithoutDuplicates(GridType.KDBTREE, 10)
```
##########
docs/tutorial/sql.md:
##########
@@ -1638,6 +1638,37 @@ Use SedonaSQL DataFrame-RDD Adapter to convert a DataFrame to an SpatialRDD. Ple
spatialDf = StructuredAdapter.toDf(spatialRDD, sedona)
```
+### SpatialRDD to DataFrame with spatial partitioning
+
+By default, `StructuredAdapter.toDf()` does not preserve spatial partitions because doing so
+may introduce duplicate features for most types of spatial data. These duplicates
+are introduced on purpose to ensure correctness when performing a spatial join;
+however, when using Sedona to prepare a dataset for distribution this is not typically
+desired.
Review Comment:
```suggestion
desired.
You can use `StructuredAdapter` together with
`spatialRDD.spatialPartitioningWithoutDuplicates` to obtain a Sedona DataFrame
that is spatially partitioned without duplicates. This is especially useful for
generating balanced GeoParquet files that preserve spatial proximity within
each file, which is crucial for filter pushdown performance.
```
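To illustrate why preserving spatial partitions can introduce duplicates, here is a toy sketch in plain Python. It is not Sedona's implementation: the two grid cells, the feature envelopes, and the helper names (`overlaps`, `partition_with_duplicates`, `partition_without_duplicates`) are all hypothetical, and the "first overlapping cell" rule is only an assumed stand-in for however Sedona picks a single partition.

```python
from typing import Dict, List, Tuple

# Axis-aligned envelope: (min_x, min_y, max_x, max_y)
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Return True when two envelopes share any area or boundary."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partition_with_duplicates(
    features: Dict[str, Box], cells: List[Box]
) -> Dict[int, List[str]]:
    """Assign each feature to every cell its envelope overlaps.

    This mirrors the join-oriented behaviour described above: a feature
    that straddles a cell boundary is copied into each overlapping cell.
    """
    return {
        cell_id: [name for name, box in features.items() if overlaps(box, cell)]
        for cell_id, cell in enumerate(cells)
    }


def partition_without_duplicates(
    features: Dict[str, Box], cells: List[Box]
) -> Dict[int, List[str]]:
    """Assign each feature to exactly one cell.

    Assumption: the first overlapping cell wins. The real selection rule
    used by Sedona is not stated in this thread.
    """
    result: Dict[int, List[str]] = {cell_id: [] for cell_id in range(len(cells))}
    for name, box in features.items():
        for cell_id, cell in enumerate(cells):
            if overlaps(box, cell):
                result[cell_id].append(name)
                break
    return result


# Hypothetical data: two cells split at x = 5, and three small features.
cells = [(0, 0, 5, 10), (5, 0, 10, 10)]
features = {
    "a": (1, 1, 2, 2),  # entirely inside cell 0
    "b": (4, 4, 6, 6),  # straddles the boundary between the cells
    "c": (7, 7, 8, 8),  # entirely inside cell 1
}

with_dups = partition_with_duplicates(features, cells)
without_dups = partition_without_duplicates(features, cells)
```

Here feature `b` straddles the boundary at x = 5, so the join-oriented scheme stores it in both cells, while the duplicate-free scheme keeps exactly one copy, which is what you want when the partitions are written out as files.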
##########
docs/tutorial/sql.md:
##########
@@ -1638,6 +1638,37 @@ Use SedonaSQL DataFrame-RDD Adapter to convert a DataFrame to an SpatialRDD. Ple
spatialDf = StructuredAdapter.toDf(spatialRDD, sedona)
```
+### SpatialRDD to DataFrame with spatial partitioning
+
+By default, `StructuredAdapter.toDf()` does not preserve spatial partitions because doing so
+may introduce duplicate features for most types of spatial data. These duplicates
+are introduced on purpose to ensure correctness when performing a spatial join;
+however, when using Sedona to prepare a dataset for distribution this is not typically
+desired.
+
+=== "Scala"
+
+ ```scala
+ spatialRDD.spatialPartitioning(GridType.KDBTREE)
+ var spatialDf = StructuredAdapter.toSpatialPartitionedDf(spatialRDD, sedona)
+ ```
+
+=== "Java"
+
+ ```java
+ spatialRDD.spatialPartitioning(GridType.KDBTREE)
Review Comment:
```suggestion
spatialRDD.spatialParitioningWithoutDuplicates(GridType.KDBTREE);
// Specify the desired number of partitions as 10, though the actual number may vary
// spatialRDD.spatialParitioningWithoutDuplicates(GridType.KDBTREE, 10);
```
##########
docs/tutorial/sql.md:
##########
@@ -1638,6 +1638,37 @@ Use SedonaSQL DataFrame-RDD Adapter to convert a DataFrame to an SpatialRDD. Ple
spatialDf = StructuredAdapter.toDf(spatialRDD, sedona)
```
+### SpatialRDD to DataFrame with spatial partitioning
+
+By default, `StructuredAdapter.toDf()` does not preserve spatial partitions because doing so
+may introduce duplicate features for most types of spatial data. These duplicates
+are introduced on purpose to ensure correctness when performing a spatial join;
+however, when using Sedona to prepare a dataset for distribution this is not typically
+desired.
+
+=== "Scala"
+
+ ```scala
+ spatialRDD.spatialPartitioning(GridType.KDBTREE)
Review Comment:
```suggestion
spatialRDD.spatialParitioningWithoutDuplicates(GridType.KDBTREE)
// Specify the desired number of partitions as 10, though the actual number may vary
// spatialRDD.spatialParitioningWithoutDuplicates(GridType.KDBTREE, 10)
```