paleolimbot commented on code in PR #1751:
URL: https://github.com/apache/sedona/pull/1751#discussion_r1931088637
##########
spark/common/src/main/scala/org/apache/sedona/sql/utils/Adapter.scala:
##########
@@ -235,6 +236,50 @@ object Adapter {
sparkSession.sqlContext.createDataFrame(rdd, schema)
}
+ /**
+ * Convert a spatial RDD to a DataFrame with a given schema, keeping spatial partitioning
+ *
+ * Note that spatial partitioning methods that introduce duplicates will result in an output
+ * data frame with duplicate features. This property is essential for implementing correct
+ * joins; however, it may introduce surprising results.
+ *
+ * @param spatialRDD
+ * Spatial RDD
+ * @param fieldNames
+ * Desired field names
+ * @param sparkSession
+ * Spark Session
+ * @tparam T
+ * Geometry
+ * @return
+ * DataFrame with the specified field names with spatial partitioning preserved
+ */
+ def toDfPartitioned[T <: Geometry](
+ spatialRDD: SpatialRDD[T],
+ fieldNames: Seq[String],
+ sparkSession: SparkSession): DataFrame = {
+ @transient lazy val log = LoggerFactory.getLogger(getClass.getName)
+ log.warn(
+ "toDfPartitioned() may introduce duplicates when used with non-specialized partitioning")
Review Comment:
Is this the correct way to go about this? Other classes use `with Logging` to get an instance-specific logger, but I wasn't sure how to rig that up here, since `Adapter` is an object rather than a class.
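For what it's worth, Scala objects can mix in traits exactly like classes can, so `object Adapter extends Logging` (or `... with Logging`) should work here too. A minimal self-contained sketch, using a hypothetical `Logging` trait standing in for the real one and `java.util.logging` to avoid external dependencies:

```scala
import java.util.logging.Logger

// Hypothetical stand-in for a project Logging trait. Because `getClass`
// resolves at runtime to the concrete singleton's class, each object that
// mixes this in gets its own logger, named after that object.
trait Logging {
  @transient lazy val log: Logger = Logger.getLogger(getClass.getName)
}

// Objects can extend traits just like classes, so the `with Logging`
// pattern used elsewhere applies to `Adapter` unchanged.
object Adapter extends Logging {
  def loggerName: String = log.getName
}
```

The logger name would then include the object's class name (e.g. `...Adapter$`), rather than whatever `getClass.getName` resolves to inside a locally constructed logger.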
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]