This is an automated email from the ASF dual-hosted git repository. jiayu pushed a commit to branch fabric in repository https://gitbox.apache.org/repos/asf/sedona.git
commit a4c0edee9d040490c584e257d38e9c07072ce393
Author: Jia Yu <[email protected]>
AuthorDate: Mon Apr 22 00:32:56 2024 -0700

    Add Microsoft Fabric tutorial
---
 docs/image/fabric/fabric-1.png | Bin 0 -> 209175 bytes
 docs/image/fabric/fabric-2.png | Bin 0 -> 75166 bytes
 docs/image/fabric/fabric-3.png | Bin 0 -> 89032 bytes
 docs/image/fabric/fabric-4.png | Bin 0 -> 97093 bytes
 docs/image/fabric/fabric-5.png | Bin 0 -> 103507 bytes
 docs/image/fabric/fabric-6.png | Bin 0 -> 189504 bytes
 docs/image/fabric/fabric-7.png | Bin 0 -> 123955 bytes
 docs/image/fabric/fabric-8.png | Bin 0 -> 146759 bytes
 docs/image/fabric/fabric-9.png | Bin 0 -> 150114 bytes
 docs/setup/databricks.md       |  3 --
 docs/setup/emr.md              |  3 --
 docs/setup/fabric.md           | 89 +++++++++++++++++++++++++++++++++++++++++
 docs/setup/wherobots.md        |  6 +--
 mkdocs.yml                     |  6 ++-
 14 files changed, 96 insertions(+), 11 deletions(-)

diff --git a/docs/image/fabric/fabric-1.png b/docs/image/fabric/fabric-1.png
new file mode 100644
index 000000000..fa00a5839
Binary files /dev/null and b/docs/image/fabric/fabric-1.png differ
diff --git a/docs/image/fabric/fabric-2.png b/docs/image/fabric/fabric-2.png
new file mode 100644
index 000000000..64992734b
Binary files /dev/null and b/docs/image/fabric/fabric-2.png differ
diff --git a/docs/image/fabric/fabric-3.png b/docs/image/fabric/fabric-3.png
new file mode 100644
index 000000000..67f0dc8bb
Binary files /dev/null and b/docs/image/fabric/fabric-3.png differ
diff --git a/docs/image/fabric/fabric-4.png b/docs/image/fabric/fabric-4.png
new file mode 100644
index 000000000..6d8b705a2
Binary files /dev/null and b/docs/image/fabric/fabric-4.png differ
diff --git a/docs/image/fabric/fabric-5.png b/docs/image/fabric/fabric-5.png
new file mode 100644
index 000000000..f4f3b7bc0
Binary files /dev/null and b/docs/image/fabric/fabric-5.png differ
diff --git a/docs/image/fabric/fabric-6.png b/docs/image/fabric/fabric-6.png
new file mode 100644
index 000000000..00b250cf2
Binary files /dev/null and b/docs/image/fabric/fabric-6.png differ
diff --git a/docs/image/fabric/fabric-7.png b/docs/image/fabric/fabric-7.png
new file mode 100644
index 000000000..2162e33b1
Binary files /dev/null and b/docs/image/fabric/fabric-7.png differ
diff --git a/docs/image/fabric/fabric-8.png b/docs/image/fabric/fabric-8.png
new file mode 100644
index 000000000..eb0ac3c2c
Binary files /dev/null and b/docs/image/fabric/fabric-8.png differ
diff --git a/docs/image/fabric/fabric-9.png b/docs/image/fabric/fabric-9.png
new file mode 100644
index 000000000..45effa93a
Binary files /dev/null and b/docs/image/fabric/fabric-9.png differ
diff --git a/docs/setup/databricks.md b/docs/setup/databricks.md
index ea86d37f2..1e26805f6 100644
--- a/docs/setup/databricks.md
+++ b/docs/setup/databricks.md
@@ -6,9 +6,6 @@ You just need to install the Sedona jars and Sedona Python on Databricks using D
 
 We recommend Databricks 10.x+.
 
-!!!tip
-    Wherobots Cloud provides a free tool to deploy Apache Sedona to Databricks. Please sign up [here](https://www.wherobots.services/).
-
 * Sedona 1.0.1 & 1.1.0 is compiled against Spark 3.1 (~ Databricks DBR 9 LTS, DBR 7 is Spark 3.0)
 * Sedona 1.1.1, 1.2.0 are compiled against Spark 3.2 (~ DBR 10 & 11)
 * Sedona 1.2.1, 1.3.1, 1.4.0 are complied against Spark 3.3
diff --git a/docs/setup/emr.md b/docs/setup/emr.md
index 237dff7b4..6d687f35e 100644
--- a/docs/setup/emr.md
+++ b/docs/setup/emr.md
@@ -1,8 +1,5 @@
 We recommend Sedona-1.3.1-incubating and above for EMR. In the tutorial, we use AWS Elastic MapReduce (EMR) 6.9.0. It has the following applications installed: Hadoop 3.3.3, JupyterEnterpriseGateway 2.6.0, Livy 0.7.1, Spark 3.3.0.
 
-!!!tip
-    Wherobots Cloud provides a free tool to deploy Apache Sedona to AWS EMR. Please sign up [here](https://www.wherobots.services/).
-
 This tutorial is tested on EMR on EC2 with EMR Studio (notebooks). EMR on EC2 uses YARN to manage resources.
 
 !!!note
diff --git a/docs/setup/fabric.md b/docs/setup/fabric.md
new file mode 100644
index 000000000..0326c2aaf
--- /dev/null
+++ b/docs/setup/fabric.md
@@ -0,0 +1,89 @@
+This tutorial guides you through installing Apache Sedona in the Spark environment of Microsoft Fabric Synapse Data Engineering.
+
+## Step 1: Open Microsoft Fabric Synapse Data Engineering
+
+Go to the [Microsoft Fabric portal](https://app.fabric.microsoft.com/) and choose the `Data Engineering` option.
+
+
+
+## Step 2: Create a Microsoft Fabric Data Engineering environment
+
+On the left side, click `My Workspace`, then click `+ New` to create a new `Environment`. Name it `ApacheSedona`.
+
+
+
+## Step 3: Select the Apache Spark version
+
+On the `Environment` page, click the `Home` tab and select the Apache Spark version. Note this version: the Sedona jars you download in Step 6 must be built for it.
+
+
+
+## Step 4: Install the Sedona Python package
+
+On the `Environment` page, click the `Public libraries` tab and type in `apache-sedona` (the source is `PyPI`). Then select the Apache Sedona version you want to install.
+
+
+
+## Step 5: Save and publish the environment
+
+Click the `Save` button, then click the `Publish` button. Publishing creates the environment with the Apache Sedona Python package installed and can take about 10 minutes.
+
+
+
+## Step 6: Download Sedona jars
+
+1. Find the Sedona jars you need on the [Sedona Maven coordinates](maven-coordinates.md) page.
+2. Download the `sedona-spark-shaded` jar from [Maven Central](https://search.maven.org/search?q=g:org.apache.sedona). Make sure it matches the Spark and Scala versions of your environment. For example, if you selected Spark 3.4 in the Fabric environment, download the Sedona jar built for Spark 3.4 and Scala 2.12; its name looks like `sedona-spark-shaded-3.4_2.12-1.5.1.jar`.
+3. Download the `geotools-wrapper` jar from [Maven Central](https://search.maven.org/search?q=g:org.datasyslab). Make sure it matches your Sedona version. For example, for Sedona 1.5.1, download `geotools-wrapper` version 1.5.1; its name looks like `geotools-wrapper-1.5.1-28.2.jar`.
+
+## Step 7: Upload Sedona jars to the Fabric environment LakeHouse storage
+
+On the notebook page, open the `Explorer` and click `LakeHouses`. If you don't have a LakeHouse yet, create one. Then choose `Files` and upload the two jars you downloaded in the previous step.
+
+After the upload, the two jars appear in the LakeHouse storage. Copy their `ABFS` paths. In this example, the paths are:
+
+```text
+abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar
+
+abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e15f3695-af7e-47de-979e-473c3caa9f5b/Files/geotools-wrapper-1.5.1-28.2.jar
+```
+
+
+
+
+
+## Step 8: Start the notebook with the Sedona environment and install the jars
+
+On the notebook page, select the `ApacheSedona` environment you created earlier.
+
+
+
+In the notebook, install the jars by running the following cell. Replace the value of `spark.jars` with the `ABFS` paths of the two jars you uploaded in the previous step, separated by a comma.
+
+```python
+%%configure -f
+{
+    "conf": {
+        "spark.jars": "abfss://XXX/Files/sedona-spark-shaded-3.4_2.12-1.5.1.jar,abfss://XXX/Files/geotools-wrapper-1.5.1-28.2.jar"
+    }
+}
+```
+
+## Step 9: Verify the installation
+
+Verify the installation by running the following code in the notebook.
+
+```python
+from sedona.spark import *
+
+
+sedona = SedonaContext.create(spark)
+
+
+sedona.sql("SELECT ST_GeomFromEWKT('SRID=4269;POINT(-74.0060 40.7128)')").show()
+```
+
+If the query prints a row containing the point, the installation succeeded.
+
+
+
diff --git a/docs/setup/wherobots.md b/docs/setup/wherobots.md
index 78c38779f..1bff8d322 100644
--- a/docs/setup/wherobots.md
+++ b/docs/setup/wherobots.md
@@ -1,7 +1,7 @@
-## SedonaDB
+## WherobotsDB
 
-Wherobots Cloud offers fully-managed and fully provisioned cloud services for SedonaDB, a comprehensive spatial analytics database system. You can play with it using Wherobots Jupyter Scala and Python kernel. No installation is needed.
+Wherobots Cloud offers fully-managed and fully provisioned cloud services for WherobotsDB, a comprehensive spatial analytics database system. You can try it with the Wherobots Jupyter Scala and Python kernels. No installation is needed.
 
-SedonaDB is 100% compatible with Apache Sedona 1.5.0+ in terms of public APIs but provides more functionalities.
+WherobotsDB is 100% compatible with Apache Sedona in terms of public APIs but provides more functionality and better performance.
 
 It is easy to migrate your existing Sedona workflow to Wherobots Cloud. Please sign up at [Wherobots Cloud](https://www.wherobots.services/).
diff --git a/mkdocs.yml b/mkdocs.yml
index 14c64eaa8..2b9ea36bf 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -23,7 +23,8 @@ nav:
         - Install on Wherobots: setup/wherobots.md
         - Install on Databricks: setup/databricks.md
         - Install on AWS EMR: setup/emr.md
-        - Set up Spark cluster: setup/cluster.md
+        - Install on Microsoft Fabric: setup/fabric.md
+        - Set up Spark cluster manually: setup/cluster.md
       - Install with Apache Flink:
         - Install Sedona Scala/Java: setup/flink/install-scala.md
       - Install with Snowflake:
@@ -196,4 +197,5 @@ plugins:
   - macros
   - git-revision-date-localized:
       type: datetime
-  - mkdocs-jupyter
+  - mkdocs-jupyter:
+      include_source: True
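
The jar naming rule in Step 6 and the `%%configure` body in Step 8 of the Fabric tutorial can be generated with a few lines of standard-library Python, so the jar names stay consistent with the chosen versions and the JSON is always valid. This is a minimal sketch, not a Sedona or Fabric API: the helpers `sedona_jar_names` and `configure_body` are hypothetical, the name patterns are inferred only from the two example jars (`sedona-spark-shaded-3.4_2.12-1.5.1.jar` and `geotools-wrapper-1.5.1-28.2.jar`), the GeoTools suffix `28.2` is copied from that example and may differ for other releases, and `abfss://XXX/Files` is the same placeholder the tutorial uses.

```python
import json


# Hypothetical helper: the patterns are inferred from the example jar names
# in Step 6; always confirm the actual file names on Maven Central.
def sedona_jar_names(spark: str, scala: str, sedona: str, geotools: str) -> list[str]:
    """Return the shaded Sedona jar name and the matching geotools-wrapper jar name."""
    return [
        f"sedona-spark-shaded-{spark}_{scala}-{sedona}.jar",
        f"geotools-wrapper-{sedona}-{geotools}.jar",
    ]


# Hypothetical helper: builds the JSON that goes under `%%configure -f`.
def configure_body(base_path: str, jar_names: list[str]) -> str:
    """Join the jar paths with commas and wrap them in a spark.jars setting.

    json.dumps never emits trailing commas, so the output is always valid JSON.
    """
    base = base_path.rstrip("/")
    jars = ",".join(f"{base}/{name}" for name in jar_names)
    return json.dumps({"conf": {"spark.jars": jars}}, indent=4)


names = sedona_jar_names("3.4", "2.12", "1.5.1", "28.2")
print(names)
print(configure_body("abfss://XXX/Files", names))
```

Paste the printed JSON under `%%configure -f` in the first notebook cell, replacing the placeholder base path with your LakeHouse `Files` path. Generating the body with `json.dumps` instead of typing it by hand avoids mistakes such as a trailing comma, which strict JSON parsers reject.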
