This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch prepare-1.4.0-doc
in repository https://gitbox.apache.org/repos/asf/sedona.git


The following commit(s) were added to refs/heads/prepare-1.4.0-doc by this push:
     new 95d4f074 Add EMR tutorial
95d4f074 is described below

commit 95d4f07475ba9220b5b0fb75784a6e26096940ef
Author: Jia Yu <[email protected]>
AuthorDate: Wed Mar 15 16:09:43 2023 -0700

    Add EMR tutorial
---
 docs/setup/emr.md | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 mkdocs.yml        |  1 +
 2 files changed, 49 insertions(+)

diff --git a/docs/setup/emr.md b/docs/setup/emr.md
new file mode 100644
index 00000000..f9e24ae7
--- /dev/null
+++ b/docs/setup/emr.md
@@ -0,0 +1,48 @@
+We recommend Sedona 1.3.1-incubating and above for EMR. This tutorial uses 
AWS Elastic MapReduce (EMR) 6.9.0, which has the following applications 
installed: Hadoop 3.3.3, JupyterEnterpriseGateway 2.6.0, Livy 0.7.1, Spark 
3.3.0.
+
+This tutorial is tested on EMR on EC2 with EMR Studio (notebooks). EMR on EC2 
uses YARN to manage resources.
+
+## Prepare initialization script
+
+In your S3 bucket, add a script that has the following content:
+
+```bash
+#!/bin/bash
+
+# EMR clusters only have ephemeral local storage. It does not really matter 
where we store the jars.
+sudo mkdir -p /jars
+
+# Download Sedona jar
+sudo curl -o /jars/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version 
}}.jar 
"https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{
 sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ 
sedona.current_version }}.jar"
+
+# Download GeoTools jar
+sudo curl -o /jars/geotools-wrapper-{{ sedona.current_geotools }}.jar 
"https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ 
sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar"
+
+# Install necessary python libraries
+sudo python3 -m pip install pandas shapely==1.8.5
+sudo python3 -m pip install pandas geopandas==0.10.2
+sudo python3 -m pip install attrs matplotlib descartes apache-sedona==1.4.0
+```
+
+When you create an EMR cluster, specify the S3 location of this script as a 
`bootstrap action`.
+
+## Add software configuration
+
+When you create an EMR cluster, add the following content to the software 
configuration:
+
+```json
+[
+  {
+    "Classification":"spark-defaults", 
+    "Properties":{
+      "spark.yarn.dist.jars": "/jars/sedona-spark-shaded-3.0_2.12-{{ 
sedona.current_version }}.jar,/jars/geotools-wrapper-{{ sedona.current_geotools 
}}.jar",
+      "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
+      "spark.kryo.registrator": 
"org.apache.sedona.core.serde.SedonaKryoRegistrator",
+      "spark.sql.extensions": 
"org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"
+    }
+  }
+]
+```
+
+!!!note
+       If you use Sedona 1.3.1-incubating, use the 
`sedona-python-adapter-3.0_2.12` jar in the script and configuration above 
instead of `sedona-spark-shaded-3.0_2.12`.
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 2e41e4b2..b5981fcb 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -18,6 +18,7 @@ nav:
         - Install Sedona R: api/rdocs
         - Install Sedona-Zeppelin: setup/zeppelin.md
         - Install on Databricks: setup/databricks.md
+        - Install on AWS EMR: setup/emr.md
         - Set up Spark cluster: setup/cluster.md
       - Install with Apache Flink:
         - Install Sedona Scala/Java: setup/flink/install-scala.md
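
The software configuration in docs/setup/emr.md above uses `{{ sedona.current_version }}` and `{{ sedona.current_geotools }}` placeholders that are filled in when the docs are built. As a minimal sketch (not part of the commit), the same JSON could be rendered with concrete versions as shown below. The function name `build_sedona_emr_config` is hypothetical, and the version strings in the usage line are assumptions for illustration; check Maven Central for the versions you actually need.

```python
import json


def build_sedona_emr_config(sedona_version: str, geotools_version: str) -> list:
    """Return the EMR software configuration with concrete jar versions filled in.

    Hypothetical helper: the jar paths and Spark properties mirror the
    tutorial's spark-defaults block; only the versions are parameters.
    """
    jars = [
        f"/jars/sedona-spark-shaded-3.0_2.12-{sedona_version}.jar",
        f"/jars/geotools-wrapper-{geotools_version}.jar",
    ]
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {
                # Comma-separated list of jars placed by the bootstrap script.
                "spark.yarn.dist.jars": ",".join(jars),
                "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
                "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
                "spark.sql.extensions": (
                    "org.apache.sedona.viz.sql.SedonaVizExtensions,"
                    "org.apache.sedona.sql.SedonaSqlExtensions"
                ),
            },
        }
    ]


if __name__ == "__main__":
    # Assumed example versions; substitute the ones that match your setup.
    print(json.dumps(build_sedona_emr_config("1.4.0", "1.4.0-28.2"), indent=2))
```

The printed JSON can be pasted into the software configuration field when creating the cluster. For Sedona 1.3.1-incubating, the jar name would need to change as described in the note in emr.md.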
