This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/sedona.git
The following commit(s) were added to refs/heads/master by this push:
new 6adae4004 [DOCS] Update Microsoft Fabric tutorial with Spark
properties (#1388)
6adae4004 is described below
commit 6adae40049592a2492c73a4231e01c7896174dc9
Author: Jia Yu <[email protected]>
AuthorDate: Tue Apr 30 09:39:40 2024 -0700
[DOCS] Update Microsoft Fabric tutorial with Spark properties (#1388)
* Add the spark properties
* Refactor the doc
* Update docs/setup/fabric.md
Co-authored-by: John Bampton <[email protected]>
---------
Co-authored-by: John Bampton <[email protected]>
---
docs/image/fabric/{fabric-9.png => fabric-10.png} | Bin
docs/image/fabric/fabric-5.png | Bin 103507 -> 192084 bytes
docs/image/fabric/fabric-6.png | Bin 189504 -> 103507 bytes
docs/image/fabric/fabric-7.png | Bin 123955 -> 189504 bytes
docs/image/fabric/fabric-8.png | Bin 146759 -> 123955 bytes
docs/image/fabric/fabric-9.png | Bin 150114 -> 146759 bytes
docs/setup/fabric.md | 72 +++++++++++++++-------
7 files changed, 50 insertions(+), 22 deletions(-)
diff --git a/docs/image/fabric/fabric-9.png b/docs/image/fabric/fabric-10.png
similarity index 100%
copy from docs/image/fabric/fabric-9.png
copy to docs/image/fabric/fabric-10.png
diff --git a/docs/image/fabric/fabric-5.png b/docs/image/fabric/fabric-5.png
index f4f3b7bc0..0c1127a55 100644
Binary files a/docs/image/fabric/fabric-5.png and
b/docs/image/fabric/fabric-5.png differ
diff --git a/docs/image/fabric/fabric-6.png b/docs/image/fabric/fabric-6.png
index 00b250cf2..f4f3b7bc0 100644
Binary files a/docs/image/fabric/fabric-6.png and
b/docs/image/fabric/fabric-6.png differ
diff --git a/docs/image/fabric/fabric-7.png b/docs/image/fabric/fabric-7.png
index 2162e33b1..00b250cf2 100644
Binary files a/docs/image/fabric/fabric-7.png and
b/docs/image/fabric/fabric-7.png differ
diff --git a/docs/image/fabric/fabric-8.png b/docs/image/fabric/fabric-8.png
index eb0ac3c2c..2162e33b1 100644
Binary files a/docs/image/fabric/fabric-8.png and
b/docs/image/fabric/fabric-8.png differ
diff --git a/docs/image/fabric/fabric-9.png b/docs/image/fabric/fabric-9.png
index 45effa93a..eb0ac3c2c 100644
Binary files a/docs/image/fabric/fabric-9.png and
b/docs/image/fabric/fabric-9.png differ
diff --git a/docs/setup/fabric.md b/docs/setup/fabric.md
index 1db3bacc7..aa5ca6ee6 100644
--- a/docs/setup/fabric.md
+++ b/docs/setup/fabric.md
@@ -24,48 +24,47 @@ In the `Environment` page, click the `Public libraries` tab
and then type in `ap

-## Step 5: Save and publish the environment
+## Step 5: Set Spark properties
-Click the `Save` button and then click the `Publish` button to save and
publish the environment. This will create the environment with the Apache
Sedona Python package installed. The publishing process will take about 10
minutes.
+In the `Environment` page, click the `Spark properties` tab, then create the
following 3 properties:
+
+- `spark.sql.extensions`:
`org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions`
+- `spark.serializer`: `org.apache.spark.serializer.KryoSerializer`
+- `spark.kryo.registrator`:
`org.apache.sedona.core.serde.SedonaKryoRegistrator`

-## Step 6: Download Sedona jars
+## Step 6: Save and publish the environment
-1. Learn the Sedona jars you need from our [Sedona maven
coordinate](maven-coordinates.md)
-2. Download the `sedona-spark-shaded` jars from [Maven
Central](https://search.maven.org/search?q=g:org.apache.sedona). Please pay
attention to the Spark version and Scala version of the jars. If you select
Spark 3.4 in the Fabric environment, you should download the Sedona jars with
Spark 3.4 and Scala 2.12 and the jar name should be like
`sedona-spark-shaded-3.4_2.12-1.5.1.jar`.
-3. Download the `geotools-wrapper` jars from [Maven
Central](https://search.maven.org/search?q=g:org.datasyslab). Please pay
attention to the Sedona versions of the jar. If you select Sedona 1.5.1, you
should download the `geotools-wrapper` jar with version 1.5.1 and the jar name
should be like `geotools-wrapper-1.5.1-28.2.jar`.
+Click the `Save` button and then click the `Publish` button to save and
publish the environment. This will create the environment with the Apache
Sedona Python package installed. The publishing process will take about 10
minutes.
-## Step 7: Upload Sedona jars to the Fabric environment LakeHouse storage
+
-In the notebook page, choose the `Explorer` and click the `LakeHouses` option.
If you don't have a LakeHouse, you can create one. Then choose `Files` and
upload the 2 jars you downloaded in the previous step.
+## Step 7: Find the download links of Sedona jars
-After the upload, you should be able to see the 2 jars in the LakeHouse
storage. Then please copy the `ABFS` paths of the 2 jars. In this example, the
paths are
+1. Check which Sedona jars you need in our [Sedona Maven coordinates](maven-coordinates.md)
+2. Find the `sedona-spark-shaded` jar on [Maven Central](https://search.maven.org/search?q=g:org.apache.sedona). Pay attention to the Spark and Scala versions of the jar. If you select Spark 3.4 in the Fabric environment, use the Sedona jar built for Spark 3.4 and Scala 2.12; the jar name should look like `sedona-spark-shaded-3.4_2.12-1.5.1.jar`.
+3. Find the `geotools-wrapper` jar on [Maven Central](https://search.maven.org/search?q=g:org.datasyslab). Pay attention to the Sedona version of the jar. If you select Sedona 1.5.1, use the `geotools-wrapper` jar with version 1.5.1; the jar name should look like `geotools-wrapper-1.5.1-28.2.jar`.
-```angular2html
-abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar
+The download links are like:
-abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/geotools-wrapper-1.5.1-28.2.jar
```
-
-
-
-
+https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.5.1/sedona-spark-shaded-3.4_2.12-1.5.1.jar
+https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.1-28.2/geotools-wrapper-1.5.1-28.2.jar
+```
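
Both links follow the standard Maven Central repository layout (group path, artifact, version, file name), so they can be derived from the versions chosen above. The following is a minimal sketch, not part of Sedona; the helper name `maven_jar_url` is hypothetical and only illustrates the pattern.

```python
MAVEN_BASE = "https://repo1.maven.org/maven2"


def maven_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build a jar URL as <base>/<group path>/<artifact>/<version>/<artifact>-<version>.jar.

    Hypothetical helper for illustration; it does not check that the jar exists.
    """
    # Only the group id is split into path segments; dots in the artifact id are kept.
    group_path = group_id.replace(".", "/")
    return f"{MAVEN_BASE}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


# Versions taken from the example in this tutorial (Spark 3.4, Scala 2.12, Sedona 1.5.1).
sedona_url = maven_jar_url("org.apache.sedona", "sedona-spark-shaded-3.4_2.12", "1.5.1")
geotools_url = maven_jar_url("org.datasyslab", "geotools-wrapper", "1.5.1-28.2")

print(sedona_url)
print(geotools_url)
```

If you pick a different Spark, Scala, or Sedona version, change the artifact id and version arguments accordingly and confirm on Maven Central that the resulting file exists.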
## Step 8: Start the notebook with the Sedona environment and install the jars
In the notebook page, select the `ApacheSedona` environment you created before.
-
+
-In the notebook, you can install the jars by running the following code.
Please replace the `spark.jars` with the `ABFS` paths of the 2 jars you
uploaded in the previous step.
+In the notebook, you can install the jars by running the following code. Replace the values of `jars` with the download links of the 2 jars from the previous step.
```python
%%configure -f
{
- "conf": {
- "spark.jars":
"abfss://XXX/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar,abfss://XXX/Files/geotools-wrapper-1.5.1-28.2.jar",
- }
+ "jars": ["https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.1-28.2/geotools-wrapper-1.5.1-28.2.jar", "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.5.1/sedona-spark-shaded-3.4_2.12-1.5.1.jar"]
}
```
@@ -85,4 +84,33 @@ sedona.sql("SELECT ST_GeomFromEWKT('SRID=4269;POINT(40.7128
-74.0060)')").show()
If you see the output of the point, then the installation is successful.
-
+
+
+## Optional: manually upload Sedona jars to the Fabric environment LakeHouse
storage
+
+If your cluster has no internet access or you want to skip the slow on-the-fly
download, you can manually upload the Sedona jars to the Fabric environment
LakeHouse storage.
+
+In the notebook page, choose the `Explorer` and click the `LakeHouses` option. If you don't have a LakeHouse, you can create one. Then choose `Files` and upload the 2 jars, which you can download beforehand using the links from Step 7.
+
+After the upload, you should see the 2 jars in the LakeHouse storage. Then copy the `ABFS` paths of the 2 jars. In this example, the paths are:
+
+```angular2html
+abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar
+
+abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/geotools-wrapper-1.5.1-28.2.jar
+```
+
+
+
+
+
+If you use this option, the configuration cell in your notebook should be:
+
+```python
+%%configure -f
+{
+ "conf": {
+ "spark.jars": "abfss://XXX/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar,abfss://XXX/Files/geotools-wrapper-1.5.1-28.2.jar"
+ }
+}
+```
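
Because the body of a `%%configure` cell is JSON, and strict JSON does not allow trailing commas, it may be safer to generate the body than to edit it by hand. The sketch below is an illustration only: the function name `configure_body` is hypothetical, and the `abfss://XXX/...` paths are the placeholders used in the tutorial, to be replaced with your own `ABFS` paths.

```python
import json


def configure_body(jar_paths):
    """Return a JSON body for a `%%configure -f` cell that sets `spark.jars`.

    Hypothetical helper: it joins the paths with commas, as the tutorial's
    example does, and lets json.dumps produce strictly valid JSON.
    """
    if not jar_paths:
        raise ValueError("at least one jar path is required")
    return json.dumps({"conf": {"spark.jars": ",".join(jar_paths)}}, indent=4)


# Placeholder paths copied from the tutorial; substitute the ABFS paths of your jars.
body = configure_body([
    "abfss://XXX/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar",
    "abfss://XXX/Files/geotools-wrapper-1.5.1-28.2.jar",
])
print(body)
```

Paste the printed text below the `%%configure -f` line of the first notebook cell.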