Re: [PR] [DOCS] add note about setting checkpoint dir for DBSCAN [sedona]

via GitHub Thu, 09 Jan 2025 01:30:39 -0800


jiayuasu commented on code in PR #1744:
URL: https://github.com/apache/sedona/pull/1744#discussion_r1908425801



##########
docs/tutorial/sql.md:
##########
@@ -858,6 +858,10 @@ The algorithm is available as a Scala and Python function called on a spatial da
 
 The first parameter is the dataframe, the next two are the epsilon and min_points parameters of the DBSCAN algorithm.
 
+!!!Note
+    The sparkContext's checkpoint directory must be set to use DBSCAN. Sedona's DBSCAN implementation uses Graphframes
+    which requires a checkpoint directory to be set. This can be done by calling `sparkContext.setCheckpointDir("path/to/checkpoint")`.

Review Comment:
   @james-willis then we need to explain in our docs what a checkpoint 
directory is. We should give examples of how to set this directory (locally, 
on S3, HDFS, ...). Distributed DBSCAN is highly anticipated by the community, 
so we should make it easy to get started.
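
   As a starting point for those examples, here is a minimal sketch of what 
the docs could show. The paths, the bucket name `my-bucket`, and the helper 
`set_checkpoint_dir` are all hypothetical and only for illustration; the one 
real API call is `SparkContext.setCheckpointDir`. It assumes `spark` is an 
existing SparkSession.

```python
# Hypothetical example locations; replace with paths valid in your environment.
CHECKPOINT_DIRS = {
    # Local filesystem: only suitable when Spark runs in local mode.
    "local": "file:///tmp/sedona-checkpoints",
    # HDFS: resolved against the cluster's default namenode.
    "hdfs": "hdfs:///tmp/sedona-checkpoints",
    # S3 via the s3a connector (assumes hadoop-aws is on the classpath).
    "s3": "s3a://my-bucket/sedona-checkpoints",
}


def set_checkpoint_dir(spark, kind):
    """Set the checkpoint directory on the session's SparkContext.

    `spark` is expected to be a SparkSession; `kind` is one of the keys of
    CHECKPOINT_DIRS. Returns the directory that was set.
    """
    location = CHECKPOINT_DIRS[kind]  # raises KeyError for an unknown kind
    spark.sparkContext.setCheckpointDir(location)
    return location
```

   With a session in hand, `set_checkpoint_dir(spark, "local")` would be called 
once before running DBSCAN. On a cluster the directory has to be on a 
filesystem that every executor can reach (HDFS or an object store), so the 
local variant is mainly useful for trying things out on a single machine.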



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
