golfalot commented on code in PR #1673:
URL: https://github.com/apache/sedona/pull/1673#discussion_r1827478749


##########
docs/setup/azure-synapse-analytics.md:
##########
@@ -0,0 +1,207 @@
+This tutorial walks you through installing Sedona on Azure Synapse Analytics when Data Exfiltration Protection (DEP) is enabled, or when the Spark pools have no internet access due to other networking constraints.
+
+## Strong recommendations
+1. Start with a clean Spark pool with no other packages installed, to avoid package conflicts.
+2. Apache Spark pool -> Apache Spark configuration: use the default configuration.
+
+## Sedona 1.6.1 on Spark 3.4 / Python 3.10
+
+### Step 1: Download packages (9)
+Caution: precise versions are critical; the latest version is not always the right one here.
+
+From Maven
+
+- [sedona-spark-shaded-3.4_2.12-1.6.1.jar](https://mvnrepository.com/artifact/org.apache.sedona/sedona-spark-shaded-3.4_2.12/1.6.1)
+
+- [geotools-wrapper-1.6.1-28.2.jar](https://mvnrepository.com/artifact/org.datasyslab/geotools-wrapper/1.6.1-28.2)
+
+From PyPI
+
+- [rasterio-1.4.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl](https://files.pythonhosted.org/packages/cd/ad/2d3a14e5a97ca827a38d4963b86071267a6cd09d45065cd753d7325699b6/rasterio-1.4.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl)
+
+- [shapely-2.0.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl](https://files.pythonhosted.org/packages/2b/a6/302e0d9c210ccf4d1ffadf7ab941797d3255dcd5f93daa73aaf116a4db39/shapely-2.0.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl)
+
+- [apache_sedona-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl](https://files.pythonhosted.org/packages/b6/71/09f7ca2b6697b2699c04d1649bb379182076d263a9849de81295d253220d/apache_sedona-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl)
+
+- [click_plugins-1.1.1-py2.py3-none-any.whl](https://files.pythonhosted.org/packages/e9/da/824b92d9942f4e472702488857914bdd50f73021efea15b4cad9aca8ecef/click_plugins-1.1.1-py2.py3-none-any.whl)
+
+- [cligj-0.7.2-py3-none-any.whl](https://files.pythonhosted.org/packages/73/86/43fa9f15c5b9fb6e82620428827cd3c284aa933431405d1bcf5231ae3d3e/cligj-0.7.2-py3-none-any.whl)
+
+- [affine-2.4.0-py3-none-any.whl](https://files.pythonhosted.org/packages/0b/f7/85273299ab57117850cc0a936c64151171fac4da49bc6fba0dad984a7c5f/affine-2.4.0-py3-none-any.whl)
+
+- [numpy-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl](https://files.pythonhosted.org/packages/fb/25/ba023652a39a2c127200e85aed975fc6119b421e2c348e5d0171e2046edb/numpy-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl)
+
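As a hedged convenience sketch (not part of the documented procedure): the jar URLs below are constructed from the Maven coordinates above using the standard Maven Central repository layout, the wheel URL is copied verbatim from the PyPI links above, and the `sedona-packages` destination directory is an arbitrary choice. Add the remaining wheels from the list the same way.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Direct download URLs. Jar URLs follow the standard Maven repository layout
# for the coordinates listed above; wheel URLs are copied from the PyPI links
# above (only one shown here -- add the rest the same way).
PACKAGE_URLS = [
    "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.6.1/sedona-spark-shaded-3.4_2.12-1.6.1.jar",
    "https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.6.1-28.2/geotools-wrapper-1.6.1-28.2.jar",
    "https://files.pythonhosted.org/packages/cd/ad/2d3a14e5a97ca827a38d4963b86071267a6cd09d45065cd753d7325699b6/rasterio-1.4.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
]


def filename(url: str) -> str:
    """Last path segment of the URL, i.e. the package file name."""
    return url.rsplit("/", 1)[-1]


def download_all(dest: str = "sedona-packages") -> list[Path]:
    """Download every package into dest (run from a machine WITH internet)."""
    out = Path(dest)
    out.mkdir(exist_ok=True)
    paths = []
    for url in PACKAGE_URLS:
        target = out / filename(url)
        if not target.exists():
            urlretrieve(url, target)  # trusted, version-pinned URLs
        paths.append(target)
    return paths


# Preview the file names without downloading anything.
print([filename(u) for u in PACKAGE_URLS])
```

Run `download_all()` from an internet-connected machine, then upload the resulting files in the next step.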
+
+### Step 2: Upload packages to Synapse Workspace 
+
+https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-workspace-packages
+
+### Step 3: Add packages to Spark Pool
+I used the second method on this page: **If you are updating from the Synapse Studio**
+
+https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-pool-packages#manage-packages-from-synapse-studio-or-azure-portal
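Because exact versions matter here, a quick notebook-cell sanity check after the pool update can save debugging time. This is a hedged sketch: the distribution names and versions are assumed to match the wheels listed in Step 1.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names and versions of the wheels installed above (assumed).
EXPECTED = {
    "apache-sedona": "1.6.1",
    "shapely": "2.0.6",
    "rasterio": "1.4.2",
    "numpy": "2.1.2",
}

for name, want in EXPECTED.items():
    try:
        got = version(name)
        status = "OK" if got == want else f"MISMATCH (want {want})"
        print(f"{name}: {got} {status}")
    except PackageNotFoundError:
        print(f"{name}: NOT INSTALLED (want {want})")
```

If any package reports MISMATCH or NOT INSTALLED, revisit Steps 2 and 3 before continuing.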
+
+
+### Step 4: Notebook
+Start your notebook with:
+```python
+from sedona.spark import SedonaContext
+
+config = SedonaContext.builder() \
+    .config('spark.jars.packages',
+            'org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.6.1,'
+            'org.datasyslab:geotools-wrapper:1.6.1-28.2') \
+    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
+    .config("spark.kryo.registrator", "org.apache.sedona.core.serde.SedonaKryoRegistrator") \
+    .config("spark.sql.extensions", "org.apache.sedona.viz.sql.SedonaVizExtensions") \

Review Comment:
   committed during linting edits



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
