This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/sedona.git


The following commit(s) were added to refs/heads/master by this push:
     new a50f35fbb [DOCS] AWS Glue Tutorial: replace s3 instructions with maven 
links (#1506)
a50f35fbb is described below

commit a50f35fbbc95ae71ce873ea90f8953425383ad49
Author: James Willis <[email protected]>
AuthorDate: Tue Jul 2 00:40:37 2024 -0700

    [DOCS] AWS Glue Tutorial: replace s3 instructions with maven links (#1506)
    
    * AWS Glue Tutorial: replace s3 instructions with maven links
    
    * capitalize python
    
    * use current_geotools
    
    ---------
    
    Co-authored-by: jameswillis <[email protected]>
---
 docs/setup/glue.md | 51 +++++++++++++--------------------------------------
 1 file changed, 13 insertions(+), 38 deletions(-)

diff --git a/docs/setup/glue.md b/docs/setup/glue.md
index e13e6b478..09f038439 100644
--- a/docs/setup/glue.md
+++ b/docs/setup/glue.md
@@ -1,35 +1,18 @@
 
 This tutorial will cover how to configure both a glue notebook and a glue ETL 
job. The tutorial is written assuming you
-have a working knowledge of AWS Glue jobs and S3.
+have a working knowledge of AWS Glue jobs.
 
 In the tutorial, we use
 Sedona {{ sedona.current_version }} and [Glue 
4.0](https://docs.aws.amazon.com/glue/latest/dg/release-notes.html) which runs 
on Spark 3.3.0, Java 8, Scala 2.12,
 and Python 3.10. We recommend Sedona-1.3.1-incubating and above for Glue.
 
-## Stage Sedona Jar in S3
+## Gather Maven Links
 
-In an AWS S3 bucket you will need the Sedona-spark-shaded and geotools-wrapper 
jars. There are two options to get these.
-Ensure the locations of the jars are accessible to your glue job.
+You will need to point your glue job to the Sedona and Geotools jars. We 
recommend using the jars available from maven. The links below are those 
intended for Glue 4.0
 
-!!!note
-    Ensure you pick a version for Scala 2.12 and Spark 3.0. The Spark 3.4 and 
Scala 2.13 jars are not compatible with
-    Glue 4.0.
-
-### Option 1: Use the Wherobots-hosted jars
-
-[Wherobots](https://wherobots.com/) provides a public S3 bucket with the 
necessary jars. You can point to these directly in your glue jobs'
-configurations. For {{ sedona.current_version }}, the Sedona and geotools jars 
are available at the following locations:
-
-* `s3://wherobots-sedona-jars/{{ sedona.current_version 
}}/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar`
-* `s3://wherobots-sedona-jars/{{ sedona.current_version }}/geotools-wrapper-{{ 
sedona.current_version }}-28.2.jar`
-
-### Option 2: Stage your own Jars
-
-In your S3 bucket, add the Sedona Jar from
-[Maven 
Central](https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{
 sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ 
sedona.current_version }}.jar).
+Sedona Jar: [Maven 
Central](https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{
 sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ 
sedona.current_version }}.jar)
 
-Similarly, add the Geotools Wrapper Jar
-from [Maven 
Central](https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ 
sedona.current_version }}-28.2/geotools-wrapper-{{ sedona.current_version 
}}-28.2.jar).
+Geotools Jar: [Maven 
Central](https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ 
sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar)
 
 !!!note
     If you use Sedona 1.3.1-incubating, please use 
`sedona-python-adpater-3.0_2.12` jar in the content above, instead
@@ -38,28 +21,20 @@ from [Maven 
Central](https://repo1.maven.org/maven2/org/datasyslab/geotools-wrap
 
 ## Configure Glue Job
 
-Once your jars are staged in S3, you can configure your Glue job to use them, 
as well as the apache-sedona python
+Once you have your jar links, you can configure your Glue job to use them, as 
well as the apache-sedona Python
 package. How you do this varies slightly between the notebook and the script 
job types.
 
 !!!note
-    Always ensure that the Sedona version of the jars and the python package 
match.
+    Always ensure that the Sedona version of the jars and the Python package 
match.
 
 ### Notebook Job
 
-Add the following cell magics before starting your sparkContext of 
glueContext. The first points to the jars put in s3,
-and the second installs the Sedona python package directly from pip.
-
-```python
-# Sedona Config
-%extra_jars s3://path/to/my-sedona.jar, s3://path/to/my-geotools.jar
-%additional_python_modules apache-sedona
-```
-
-If using the Wherobots-provided jars, the cell magics would look like this:
+Add the following cell magics before starting your sparkContext or 
glueContext. The first points to the jars,
+and the second installs the Sedona Python package directly from pip.
 
 ```python
 # Sedona Config
-%extra_jars s3://wherobots-sedona-jars/{{ sedona.current_version 
}}/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar, 
s3://wherobots-sedona-jars/{{ sedona.current_version }}/geotools-wrapper-{{ 
sedona.current_version }}-28.2.jar
+%extra_jars 
https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{
 sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ 
sedona.current_version }}.jar, 
https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ 
sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar
 %additional_python_modules apache-sedona=={{ sedona.current_version }}
 ```
 
@@ -72,7 +47,7 @@ If you are using the example notebook from glue, the first 
cell should now look
 %number_of_workers 5
 
 # Sedona Config
-%extra_jars s3://wherobots-sedona-jars/{{ sedona.current_version 
}}/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar, 
s3://wherobots-sedona-jars/{{ sedona.current_version }}/geotools-wrapper-{{ 
sedona.current_version }}-28.2.jar
+%extra_jars 
https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{
 sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ 
sedona.current_version }}.jar, 
https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ 
sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar
 %additional_python_modules apache-sedona=={{ sedona.current_version }}
 
 
@@ -101,9 +76,9 @@ sedona.sql("SELECT ST_POINT(1., 2.) as geom").show()
 ### ETL Job
 
 Glue also calls these Scripts. From your job's page, navigate to the "Job 
details" tab. At the bottom of the page expand
-the "Advanced properties" section. In the "Dependent JARs path" field, add the 
paths to the jars in S3, separated by a comma.
+the "Advanced properties" section. In the "Dependent JARs path" field, add the 
paths to the jars, separated by a comma.
 
-To add the Sedona python package, navigate to the "Job Parameters" section and 
add a new parameter with the key
+To add the Sedona Python package, navigate to the "Job Parameters" section and 
add a new parameter with the key
 `--additional-python-modules` and the value `apache-sedona=={{ 
sedona.current_version }}`.
 
 To confirm the installation add the following code to the script:
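
The jar links in the patch above follow the standard Maven Central layout (`group/artifact/version/artifact-version.jar`). As a rough sketch only, the comma-separated value passed to `%extra_jars` or the "Dependent JARs path" field could be assembled as below. The helper names `maven_jar_url` and `extra_jars_value` are hypothetical and not part of the commit or of Sedona, and the version strings in the usage note are illustrative stand-ins for `sedona.current_version` and `sedona.current_geotools`.

```python
# Hypothetical helpers; only the URL layout is taken from the links in the patch.
MAVEN_BASE = "https://repo1.maven.org/maven2"


def maven_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build a Maven Central jar URL from group, artifact and version."""
    # Dots in the group id become path separators in the repository layout.
    group_path = group_id.replace(".", "/")
    return (
        f"{MAVEN_BASE}/{group_path}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}.jar"
    )


def extra_jars_value(sedona_version: str, geotools_version: str) -> str:
    """Return the two jar URLs joined by a comma, as used in the tutorial."""
    sedona_jar = maven_jar_url(
        "org.apache.sedona", "sedona-spark-shaded-3.0_2.12", sedona_version
    )
    geotools_jar = maven_jar_url(
        "org.datasyslab", "geotools-wrapper", geotools_version
    )
    return f"{sedona_jar}, {geotools_jar}"
```

For example, `extra_jars_value("1.6.0", "1.6.0-28.2")` would produce one string containing both URLs. Substitute the Sedona and geotools-wrapper versions that match your installed `apache-sedona` Python package.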