This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/sedona-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 49e849c Fix some numbers in docs
49e849c is described below
commit 49e849cf658f5eb872347ee51eaac913db3fc42e
Author: Jia Yu <[email protected]>
AuthorDate: Thu Mar 30 11:08:09 2023 -0700
Fix some numbers in docs
---
1.4.0/download/index.html | 12 +-
1.4.0/setup/maven-coordinates/index.html | 4 +-
1.4.0/setup/release-notes/index.html | 8 +-
1.4.0/tutorial/demo/index.html | 2 +-
1.4.0/tutorial/flink/sql/index.html | 2 +-
1.4.0/tutorial/geopandas-shapely/index.html | 4 +-
1.4.0/tutorial/rdd/index.html | 2 +-
1.4.0/tutorial/sql/index.html | 199 +++++++++++++++++++--
latest-snapshot/download/index.html | 12 +-
latest-snapshot/setup/maven-coordinates/index.html | 4 +-
latest-snapshot/setup/release-notes/index.html | 8 +-
latest-snapshot/tutorial/demo/index.html | 2 +-
latest-snapshot/tutorial/flink/sql/index.html | 2 +-
.../tutorial/geopandas-shapely/index.html | 4 +-
latest-snapshot/tutorial/rdd/index.html | 2 +-
latest-snapshot/tutorial/sql/index.html | 199 +++++++++++++++++++--
16 files changed, 404 insertions(+), 62 deletions(-)
diff --git a/1.4.0/download/index.html b/1.4.0/download/index.html
index 38bd63a..c24a86b 100644
--- a/1.4.0/download/index.html
+++ b/1.4.0/download/index.html
@@ -859,8 +859,8 @@
<li class="md-nav__item">
- <a href="#verify-the-integ140rity" class="md-nav__link">
- Verify the integ1.4.0rity
+ <a href="#verify-the-integrity" class="md-nav__link">
+ Verify the integrity
</a>
@@ -2367,8 +2367,8 @@
<li class="md-nav__item">
- <a href="#verify-the-integ140rity" class="md-nav__link">
- Verify the integ1.4.0rity
+ <a href="#verify-the-integrity" class="md-nav__link">
+ Verify the integrity
</a>
@@ -2453,7 +2453,7 @@
<p>Latest source code: <a href="https://github.com/apache/sedona/">GitHub
repository</a></p>
<p>Old GeoSpark releases: <a
href="https://github.com/apache/sedona/releases">GitHub releases</a></p>
<p>Automatically generated binary JARs (per each Master branch commit): <a
href="https://github.com/apache/sedona/actions/workflows/java.yml">GitHub
Action</a></p>
-<h2 id="verify-the-integ140rity">Verify the integ1.4.0rity<a
class="headerlink" href="#verify-the-integ140rity" title="Permanent
link">¶</a></h2>
+<h2 id="verify-the-integrity">Verify the integrity<a class="headerlink"
href="#verify-the-integrity" title="Permanent link">¶</a></h2>
<p><a href="https://downloads.apache.org/sedona/KEYS">Public keys</a></p>
<p><a href="https://www.apache.org/info/verification.html">Instructions</a></p>
<h2 id="versions">Versions<a class="headerlink" href="#versions"
title="Permanent link">¶</a></h2>
@@ -2517,7 +2517,7 @@
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 01:53:14</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 21, 2023 03:12:18</span>
</small>
diff --git a/1.4.0/setup/maven-coordinates/index.html
b/1.4.0/setup/maven-coordinates/index.html
index d29eac2..4005191 100644
--- a/1.4.0/setup/maven-coordinates/index.html
+++ b/1.4.0/setup/maven-coordinates/index.html
@@ -2690,7 +2690,7 @@
<span class="nt"></dependency></span>
<span class="nt"><dependency></span>
<span class="nt"><groupId></span>org.apache.sedona<span
class="nt"></groupId></span>
- <span class="nt"><artifactId></span>sedona-flink-3.0_2.12<span
class="nt"></artifactId></span>
+ <span class="nt"><artifactId></span>sedona-flink_2.12<span
class="nt"></artifactId></span>
<span class="nt"><version></span>1.4.0<span
class="nt"></version></span>
<span class="nt"></dependency></span>
<span class="nt"><dependency></span>
@@ -2767,7 +2767,7 @@
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 16, 2023 00:00:53</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 21, 2023 03:12:18</span>
</small>
diff --git a/1.4.0/setup/release-notes/index.html
b/1.4.0/setup/release-notes/index.html
index b830419..1f5fefb 100644
--- a/1.4.0/setup/release-notes/index.html
+++ b/1.4.0/setup/release-notes/index.html
@@ -3255,7 +3255,7 @@
</div>
<div class="admonition danger">
<p class="admonition-title">Danger</p>
-<p>Sedona Python currently only works with Shapely 1.x. If you use GeoPandas,
please use <= GeoPandas <code>0.11.1</code>. GeoPandas > 0.11.1 will
automatically installe Shapely 2.0. If you use Shapely, please use <=
<code>1.8.4</code>.</p>
+<p>Sedona Python currently only works with Shapely 1.x. If you use GeoPandas,
please use <= GeoPandas <code>0.11.1</code>. GeoPandas > 0.11.1 will
automatically install Shapely 2.0. If you use Shapely, please use <=
<code>1.8.4</code>.</p>
</div>
<h2 id="sedona-140">Sedona 1.4.0<a class="headerlink" href="#sedona-140"
title="Permanent link">¶</a></h2>
<p>Sedona 1.4.0 is compiled against Spark 3.3 / Flink 1.12, Java 8.</p>
@@ -3270,7 +3270,7 @@
</ul>
<h3 id="api-change">API change<a class="headerlink" href="#api-change"
title="Permanent link">¶</a></h3>
<ul>
-<li><strong>Sedona Spark & Flink</strong> Packaging strategy changed. See
<a href="../maven-coordinates">Maven Coordinate</a>. Please change your Sedona
dependencies if needed. We recommend
<code>sedona-spark-shaded-3.0_2.12-1.4.0</code> and
<code>sedona-flink-shaded-3.0_2.12-1.4.0</code></li>
+<li><strong>Sedona Spark & Flink</strong> Packaging strategy changed. See
<a href="../maven-coordinates">Maven Coordinate</a>. Please change your Sedona
dependencies if needed. We recommend
<code>sedona-spark-shaded-3.0_2.12-1.4.0</code> and
<code>sedona-flink-shaded_2.12-1.4.0</code></li>
<li><strong>Sedona Spark & Flink</strong> GeoTools-wrapper version
upgraded. Please use <code>geotools-wrapper-1.4.0-28.2</code>.</li>
</ul>
<h3 id="behavior-change">Behavior change<a class="headerlink"
href="#behavior-change" title="Permanent link">¶</a></h3>
@@ -3283,7 +3283,7 @@
</ul>
</li>
</ul>
-<p>When <code>sedona.join.optimizationmode</code> is configured as
<code>nonequi</code>, it won't optimize join queries such as <code>SELECT *
FROM A, B WHERE A.x = B.x AND ST_Contains(A.geom, B.geom)</code>, since it is
an equi-join with equi-condition <code>A.x = B.x</code>. Sedona will optimize
for <code>SELECT * FROM A, B WHERE A.x = B.x AND ST_Contains(A.geom,
B.geom)</code></p>
+<p>When <code>sedona.join.optimizationmode</code> is configured as
<code>nonequi</code>, it won't optimize join queries such as <code>SELECT *
FROM A, B WHERE A.x = B.x AND ST_Contains(A.geom, B.geom)</code>, since it is
an equi-join with equi-condition <code>A.x = B.x</code>. Sedona will optimize
for <code>SELECT * FROM A, B WHERE ST_Contains(A.geom, B.geom)</code></p>
<h3 id="bug">Bug<a class="headerlink" href="#bug" title="Permanent
link">¶</a></h3>
<ul>
<li>[<a
href='https://issues.apache.org/jira/browse/SEDONA-218'>SEDONA-218</a>] -
Flaky test caused by improper handling of null struct values in Adapter.toDf
@@ -3780,7 +3780,7 @@
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 19, 2023 23:59:09</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 21, 2023 03:12:18</span>
</small>
diff --git a/1.4.0/tutorial/demo/index.html b/1.4.0/tutorial/demo/index.html
index 5ec6555..36014fa 100644
--- a/1.4.0/tutorial/demo/index.html
+++ b/1.4.0/tutorial/demo/index.html
@@ -2533,7 +2533,7 @@
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 01:53:14</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 03:44:37</span>
</small>
diff --git a/1.4.0/tutorial/flink/sql/index.html
b/1.4.0/tutorial/flink/sql/index.html
index 3e9c271..e6f2de9 100644
--- a/1.4.0/tutorial/flink/sql/index.html
+++ b/1.4.0/tutorial/flink/sql/index.html
@@ -3139,7 +3139,7 @@ FROM rights CROSS JOIN UNNEST(rights.idarray) AS
tmpTbl2(cellId)
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 02:07:22</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 03:44:37</span>
</small>
diff --git a/1.4.0/tutorial/geopandas-shapely/index.html
b/1.4.0/tutorial/geopandas-shapely/index.html
index f842d40..3d0c585 100644
--- a/1.4.0/tutorial/geopandas-shapely/index.html
+++ b/1.4.0/tutorial/geopandas-shapely/index.html
@@ -2544,7 +2544,7 @@
<h1 id="work-with-geopandas-and-shapely">Work with GeoPandas and Shapely<a
class="headerlink" href="#work-with-geopandas-and-shapely" title="Permanent
link">¶</a></h1>
<div class="admonition danger">
<p class="admonition-title">Danger</p>
-<p>Sedona Python currently only works with Shapely 1.x. If you use GeoPandas,
please use <= GeoPandas <code>0.11.1</code>. GeoPandas > 0.11.1 will
automatically installe Shapely 2.0. If you use Shapely, please use <=
<code>1.8.4</code>.</p>
+<p>Sedona Python currently only works with Shapely 1.x. If you use GeoPandas,
please use <= GeoPandas <code>0.11.1</code>. GeoPandas > 0.11.1 will
automatically install Shapely 2.0. If you use Shapely, please use <=
<code>1.8.4</code>.</p>
</div>
<h2 id="interoperate-with-geopandas">Interoperate with GeoPandas<a
class="headerlink" href="#interoperate-with-geopandas" title="Permanent
link">¶</a></h2>
<p>Sedona Python implements serializers and deserializers that convert
Sedona Geometry objects into Shapely BaseGeometry objects. This makes it
possible to load data from a file with GeoPandas (see the drivers supported by
Fiona) and create a Spark DataFrame from the resulting GeoDataFrame object.</p>
@@ -2845,7 +2845,7 @@
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 19, 2023 23:59:09</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 21, 2023 01:24:41</span>
</small>
diff --git a/1.4.0/tutorial/rdd/index.html b/1.4.0/tutorial/rdd/index.html
index 4f64dd0..126b7a3 100644
--- a/1.4.0/tutorial/rdd/index.html
+++ b/1.4.0/tutorial/rdd/index.html
@@ -3878,7 +3878,7 @@ Find the superheroes within 10 miles of each city</p>
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 19, 2023 23:59:09</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 03:44:37</span>
</small>
diff --git a/1.4.0/tutorial/sql/index.html b/1.4.0/tutorial/sql/index.html
index c58a3a6..7e295d9 100644
--- a/1.4.0/tutorial/sql/index.html
+++ b/1.4.0/tutorial/sql/index.html
@@ -966,8 +966,17 @@
<li class="md-nav__item">
- <a href="#load-shapefile-and-geojson" class="md-nav__link">
- Load Shapefile and GeoJSON
+ <a href="#load-geojson-using-spark-json-data-source" class="md-nav__link">
+ Load GeoJSON using Spark JSON Data Source
+ </a>
+
+
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#load-shapefile-and-geojson-using-spatialrdd" class="md-nav__link">
+ Load Shapefile and GeoJSON using SpatialRDD
</a>
@@ -980,6 +989,15 @@
</a>
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#load-data-from-jdbc-data-sources" class="md-nav__link">
+ Load data from JDBC data sources
+ </a>
+
+
</li>
<li class="md-nav__item">
@@ -1067,6 +1085,15 @@
</a>
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#save-to-postgis" class="md-nav__link">
+ Save to Postgis
+ </a>
+
+
</li>
<li class="md-nav__item">
@@ -2530,8 +2557,17 @@
<li class="md-nav__item">
- <a href="#load-shapefile-and-geojson" class="md-nav__link">
- Load Shapefile and GeoJSON
+ <a href="#load-geojson-using-spark-json-data-source" class="md-nav__link">
+ Load GeoJSON using Spark JSON Data Source
+ </a>
+
+
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#load-shapefile-and-geojson-using-spatialrdd" class="md-nav__link">
+ Load Shapefile and GeoJSON using SpatialRDD
</a>
@@ -2544,6 +2580,15 @@
</a>
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#load-data-from-jdbc-data-sources" class="md-nav__link">
+ Load data from JDBC data sources
+ </a>
+
+
</li>
<li class="md-nav__item">
@@ -2631,6 +2676,15 @@
</a>
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#save-to-postgis" class="md-nav__link">
+ Save to Postgis
+ </a>
+
+
</li>
<li class="md-nav__item">
@@ -2896,11 +2950,49 @@ The file may have many other columns.</p>
<p class="admonition-title">Note</p>
<p>SedonaSQL provides lots of functions to create a Geometry column, please
read <a href="../../api/sql/Constructor/">SedonaSQL constructor API</a>.</p>
</div>
-<h2 id="load-shapefile-and-geojson">Load Shapefile and GeoJSON<a
class="headerlink" href="#load-shapefile-and-geojson" title="Permanent
link">¶</a></h2>
-<p>Shapefile and GeoJSON must be loaded by SpatialRDD and converted to
DataFrame using Adapter. Please read <a
href="../rdd/#create-a-generic-spatialrdd">Load SpatialRDD</a> and <a
href="#convert-between-dataframe-and-spatialrdd">DataFrame <->
RDD</a>.</p>
+<h2 id="load-geojson-using-spark-json-data-source">Load GeoJSON using Spark
JSON Data Source<a class="headerlink"
href="#load-geojson-using-spark-json-data-source" title="Permanent
link">¶</a></h2>
+<p>Spark SQL's built-in JSON data source supports reading GeoJSON data.
+To ensure proper parsing of the geometry property, we can define a schema with
the geometry property set to type 'string'.
+This prevents Spark from interpreting the property and allows us to use the
ST_GeomFromGeoJSON function for accurate geometry parsing.</p>
+<div class="tabbed-set tabbed-alternate" data-tabs="6:3"><input
checked="checked" id="__tabbed_6_1" name="__tabbed_6" type="radio" /><input
id="__tabbed_6_2" name="__tabbed_6" type="radio" /><input id="__tabbed_6_3"
name="__tabbed_6" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_6_1">Scala</label><label for="__tabbed_6_2">Java</label><label
for="__tabbed_6_3">Python</label></div>
+<div class="tabbed-content">
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span
class="kd">val</span><span class="w"> </span><span class="n">schema</span><span
class="w"> </span><span class="o">=</span><span class="w"> </span><span
class="s">"type string, crs string, totalFeatures long, features
array<struct<type string, geometry string, properties map<string,
string>>>"</span><span class="w"></span>
+<span class="n">sparkSession</span><span class="p">.</span><span
class="n">read</span><span class="p">.</span><span class="n">schema</span><span
class="p">(</span><span class="n">schema</span><span class="p">).</span><span
class="n">json</span><span class="p">(</span><span
class="n">geojson_path</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="n">selectExpr</span><span class="p">(</span><span
class="s">"explode(features) as features"</span><span
class="p">)</span><span class="w"> </span><span class="c1">// Explode the
envelope to get one feature per row.</span>
+<span class="w"> </span><span class="p">.</span><span
class="n">select</span><span class="p">(</span><span
class="s">"features.*"</span><span class="p">)</span><span class="w">
</span><span class="c1">// Unpack the features struct.</span>
+<span class="w"> </span><span class="p">.</span><span
class="n">withColumn</span><span class="p">(</span><span
class="s">"geometry"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromGeoJSON(geometry)"</span><span
class="p">))</span><span class="w"> </span><span class="c1">// Convert the
geometry string.</span>
+<span class="w"> </span><span class="p">.</span><span
class="n">printSchema</span><span class="p">()</span><span class="w"></span>
+</code></pre></div>
+
+</div>
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span
class="n">String</span><span class="w"> </span><span
class="n">schema</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="s">"type string, crs string, totalFeatures
long, features array<struct<type string, geometry string, properties
map<string, string>>>"</span><span class="p">;</span><span
class="w"></span>
+<span class="n">sparkSession</span><span class="p">.</span><span
class="na">read</span><span class="p">().</span><span
class="na">schema</span><span class="p">(</span><span
class="n">schema</span><span class="p">).</span><span
class="na">json</span><span class="p">(</span><span
class="n">geojson_path</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">selectExpr</span><span class="p">(</span><span
class="s">"explode(features) as features"</span><span
class="p">)</span><span class="w"> </span><span class="c1">// Explode the
envelope to get one feature per row.</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">select</span><span class="p">(</span><span
class="s">"features.*"</span><span class="p">)</span><span class="w">
</span><span class="c1">// Unpack the features struct.</span><span
class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">withColumn</span><span class="p">(</span><span
class="s">"geometry"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromGeoJSON(geometry)"</span><span
class="p">))</span><span class="w"> </span><span class="c1">// Convert the
geometry string.</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">printSchema</span><span class="p">();</span><span class="w"></span>
+</code></pre></div>
+
+</div>
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span class="n">schema</span>
<span class="o">=</span> <span class="s2">"type string, crs string,
totalFeatures long, features array<struct<type string, geometry string,
properties map<string, string>>>"</span>
+<span class="p">(</span><span class="n">sparkSession</span><span
class="o">.</span><span class="n">read</span><span class="o">.</span><span
class="n">json</span><span class="p">(</span><span
class="n">geojson_path</span><span class="p">,</span> <span
class="n">schema</span><span class="o">=</span><span
class="n">schema</span><span class="p">)</span>
+ <span class="o">.</span><span class="n">selectExpr</span><span
class="p">(</span><span class="s2">"explode(features) as
features"</span><span class="p">)</span> <span class="c1"># Explode the
envelope to get one feature per row.</span>
+ <span class="o">.</span><span class="n">select</span><span
class="p">(</span><span class="s2">"features.*"</span><span
class="p">)</span> <span class="c1"># Unpack the features struct.</span>
+ <span class="o">.</span><span class="n">withColumn</span><span
class="p">(</span><span class="s2">"geometry"</span><span
class="p">,</span> <span class="n">f</span><span class="o">.</span><span
class="n">expr</span><span class="p">(</span><span
class="s2">"ST_GeomFromGeoJSON(geometry)"</span><span
class="p">))</span> <span class="c1"># Convert the geometry string.</span>
+ <span class="o">.</span><span class="n">printSchema</span><span
class="p">())</span>
+</code></pre></div>
+
+</div>
+</div>
+</div>
+<h2 id="load-shapefile-and-geojson-using-spatialrdd">Load Shapefile and
GeoJSON using SpatialRDD<a class="headerlink"
href="#load-shapefile-and-geojson-using-spatialrdd" title="Permanent
link">¶</a></h2>
+<p>Shapefile and GeoJSON can be loaded by SpatialRDD and converted to
DataFrame using Adapter. Please read <a
href="../rdd/#create-a-generic-spatialrdd">Load SpatialRDD</a> and <a
href="#convert-between-dataframe-and-spatialrdd">DataFrame <->
RDD</a>.</p>
<h2 id="load-geoparquet">Load GeoParquet<a class="headerlink"
href="#load-geoparquet" title="Permanent link">¶</a></h2>
<p>Since v<code>1.3.0</code>, Sedona natively supports loading GeoParquet
file. Sedona will infer geometry fields using the "geo" metadata in GeoParquet
files.</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="6:3"><input
checked="checked" id="__tabbed_6_1" name="__tabbed_6" type="radio" /><input
id="__tabbed_6_2" name="__tabbed_6" type="radio" /><input id="__tabbed_6_3"
name="__tabbed_6" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_6_1">Scala/Java</label><label
for="__tabbed_6_2">Java</label><label for="__tabbed_6_3">Python</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="7:3"><input
checked="checked" id="__tabbed_7_1" name="__tabbed_7" type="radio" /><input
id="__tabbed_7_2" name="__tabbed_7" type="radio" /><input id="__tabbed_7_3"
name="__tabbed_7" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_7_1">Scala/Java</label><label
for="__tabbed_7_2">Java</label><label for="__tabbed_7_3">Python</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">val</span><span class="w"> </span><span class="n">df</span><span
class="w"> </span><span class="o">=</span><span class="w"> </span><span
class="n">sparkSession</span><span class="p">.</span><span
class="n">read</span><span class="p">.</span><span class="n">format</span><span
class="p">(</span><span class="s">"geoparquet"</span><span
class="p">).</span><span class="n">load</span><span class="p">(</span><span
class=" [...]
@@ -2933,6 +3025,65 @@ The file may have many other columns.</p>
</code></pre></div>
<p>Sedona supports spatial predicate push-down for GeoParquet files, please
refer to the <a href="../../api/sql/Optimizer/">SedonaSQL query optimizer</a>
documentation for details.</p>
+<h2 id="load-data-from-jdbc-data-sources">Load data from JDBC data sources<a
class="headerlink" href="#load-data-from-jdbc-data-sources" title="Permanent
link">¶</a></h2>
+<p>The 'query' option in Spark SQL's JDBC data source can be used to convert
geometry columns to a format that Sedona can interpret.
+This should work for most spatial JDBC data sources.
+For Postgis there is no need to add a query to convert geometry types since
it's already using EWKB as its wire format.</p>
+<div class="tabbed-set tabbed-alternate" data-tabs="8:3"><input
checked="checked" id="__tabbed_8_1" name="__tabbed_8" type="radio" /><input
id="__tabbed_8_2" name="__tabbed_8" type="radio" /><input id="__tabbed_8_3"
name="__tabbed_8" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_8_1">Scala</label><label for="__tabbed_8_2">Java</label><label
for="__tabbed_8_3">Python</label></div>
+<div class="tabbed-content">
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span class="c1">// For any
JDBC data source, including Postgis.</span>
+<span class="kd">val</span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="n">sparkSession</span><span
class="p">.</span><span class="n">read</span><span class="p">.</span><span
class="n">format</span><span class="p">(</span><span
class="s">"jdbc"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="c1">// Other options.</span>
+<span class="w"> </span><span class="p">.</span><span
class="n">option</span><span class="p">(</span><span
class="s">"query"</span><span class="p">,</span><span class="w">
</span><span class="s">"SELECT id, ST_AsBinary(geom) as geom FROM
my_table"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="n">load</span><span class="p">()</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="n">withColumn</span><span class="p">(</span><span
class="s">"geom"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromWKB(geom)"</span><span class="p">))</span><span
class="w"></span>
+
+<span class="c1">// This is a simplified version that works for Postgis.</span>
+<span class="kd">val</span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="n">sparkSession</span><span
class="p">.</span><span class="n">read</span><span class="p">.</span><span
class="n">format</span><span class="p">(</span><span
class="s">"jdbc"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="c1">// Other options.</span>
+<span class="w"> </span><span class="p">.</span><span
class="n">option</span><span class="p">(</span><span
class="s">"dbtable"</span><span class="p">,</span><span class="w">
</span><span class="s">"my_table"</span><span class="p">)</span><span
class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="n">load</span><span class="p">()</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="n">withColumn</span><span class="p">(</span><span
class="s">"geom"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromWKB(geom)"</span><span class="p">))</span><span
class="w"></span>
+</code></pre></div>
+
+</div>
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span class="c1">// For any
JDBC data source, including Postgis.</span><span class="w"></span>
+<span class="n">Dataset</span><span class="o"><</span><span
class="n">Row</span><span class="o">></span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="n">sparkSession</span><span
class="p">.</span><span class="na">read</span><span class="p">().</span><span
class="na">format</span><span class="p">(</span><span
class="s">"jdbc"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="c1">// Other options.</span><span
class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">option</span><span class="p">(</span><span
class="s">"query"</span><span class="p">,</span><span class="w">
</span><span class="s">"SELECT id, ST_AsBinary(geom) as geom FROM
my_table"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">load</span><span class="p">()</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">withColumn</span><span class="p">(</span><span
class="s">"geom"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromWKB(geom)"</span><span class="p">))</span><span
class="w"></span>
+
+<span class="c1">// This is a simplified version that works for
Postgis.</span><span class="w"></span>
+<span class="n">Dataset</span><span class="o"><</span><span
class="n">Row</span><span class="o">></span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="n">sparkSession</span><span
class="p">.</span><span class="na">read</span><span class="p">().</span><span
class="na">format</span><span class="p">(</span><span
class="s">"jdbc"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="c1">// Other options.</span><span
class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">option</span><span class="p">(</span><span
class="s">"dbtable"</span><span class="p">,</span><span class="w">
</span><span class="s">"my_table"</span><span class="p">)</span><span
class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">load</span><span class="p">()</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">withColumn</span><span class="p">(</span><span
class="s">"geom"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromWKB(geom)"</span><span class="p">))</span><span
class="w"></span>
+</code></pre></div>
+
+</div>
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span class="c1"># For any JDBC
data source, including Postgis.</span>
+<span class="n">df</span> <span class="o">=</span> <span
class="p">(</span><span class="n">sparkSession</span><span
class="o">.</span><span class="n">read</span><span class="o">.</span><span
class="n">format</span><span class="p">(</span><span
class="s2">"jdbc"</span><span class="p">)</span>
+ <span class="c1"># Other options.</span>
+ <span class="o">.</span><span class="n">option</span><span
class="p">(</span><span class="s2">"query"</span><span
class="p">,</span> <span class="s2">"SELECT id, ST_AsBinary(geom) as geom
FROM my_table"</span><span class="p">)</span>
+ <span class="o">.</span><span class="n">load</span><span
class="p">()</span>
+ <span class="o">.</span><span class="n">withColumn</span><span
class="p">(</span><span class="s2">"geom"</span><span
class="p">,</span> <span class="n">f</span><span class="o">.</span><span
class="n">expr</span><span class="p">(</span><span
class="s2">"ST_GeomFromWKB(geom)"</span><span class="p">)))</span>
+
+<span class="c1"># This is a simplified version that works for Postgis.</span>
+<span class="n">df</span> <span class="o">=</span> <span
class="p">(</span><span class="n">sparkSession</span><span
class="o">.</span><span class="n">read</span><span class="o">.</span><span
class="n">format</span><span class="p">(</span><span
class="s2">"jdbc"</span><span class="p">)</span>
+ <span class="c1"># Other options.</span>
+ <span class="o">.</span><span class="n">option</span><span
class="p">(</span><span class="s2">"dbtable"</span><span
class="p">,</span> <span class="s2">"my_table"</span><span
class="p">)</span>
+ <span class="o">.</span><span class="n">load</span><span
class="p">()</span>
+ <span class="o">.</span><span class="n">withColumn</span><span
class="p">(</span><span class="s2">"geom"</span><span
class="p">,</span> <span class="n">f</span><span class="o">.</span><span
class="n">expr</span><span class="p">(</span><span
class="s2">"ST_GeomFromWKB(geom)"</span><span class="p">)))</span>
+</code></pre></div>
+
+</div>
+</div>
+</div>
<h2 id="transform-the-coordinate-reference-system">Transform the Coordinate
Reference System<a class="headerlink"
href="#transform-the-coordinate-reference-system" title="Permanent
link">¶</a></h2>
<p>Sedona doesn't control the coordinate unit (degree-based or meter-based) of
all geometries in a Geometry column. The unit of all related distances in
SedonaSQL is same as the unit of all geometries in a Geometry column.</p>
<p>To convert Coordinate Reference System of the Geometry column created
before, use the following code:</p>
@@ -3003,10 +3154,30 @@ FROM spatialDf
ORDER BY geohash
</code></pre></div>
+<h2 id="save-to-postgis">Save to Postgis<a class="headerlink"
href="#save-to-postgis" title="Permanent link">¶</a></h2>
+<p>Unfortunately, the Spark SQL JDBC data source doesn't support creating
geometry types in PostGIS using the 'createTableColumnTypes' option.
+Only the Spark built-in types are recognized.
+This means that you'll need to manage your PostGIS schema separately from
Spark.
+One way to do this is to create the table with the correct geometry column
before writing data to it with Spark.
+Alternatively, you can write your data to the table using Spark and then
manually alter the column to be a geometry type afterward.</p>
+<p>PostGIS uses EWKB to serialize geometries.
+If you convert your geometries to EWKB format in Sedona, you don't have to do
any additional conversion in PostGIS.</p>
+<div class="highlight"><pre><span></span><code>my_postgis_db# create table
my_table (id int8, geom geometry);
+
+df.withColumn("geom", expr("ST_AsEWKB(geom)"))
+ .write.format("jdbc")
+ .option("truncate","true") // Don't let Spark
recreate the table.
+ // Other options.
+ .save()
+
+// If you didn't create the table before writing you can change the type
afterward.
+my_postgis_db# alter table my_table alter column geom type geometry;
+</code></pre></div>
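<p>For illustration, the EWKB layout mentioned above can be sketched in plain Python. This is a minimal sketch, not Sedona or PostGIS code: the helper names <code>point_to_wkb</code> and <code>point_to_ewkb</code> are hypothetical, and only a 2D point in little-endian byte order is handled. It shows that EWKB is WKB with an SRID flag set in the geometry-type word, followed by a 4-byte SRID.</p>

```python
import struct

WKB_POINT = 1                 # geometry type code for a point
EWKB_SRID_FLAG = 0x20000000   # flag bit marking that an SRID follows the type word
LITTLE_ENDIAN = 1             # byte-order marker used as the first byte


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as plain little-endian WKB (hypothetical helper)."""
    return struct.pack("<BIdd", LITTLE_ENDIAN, WKB_POINT, x, y)


def point_to_ewkb(x: float, y: float, srid: int) -> bytes:
    """Encode a 2D point as little-endian EWKB with an embedded SRID (hypothetical helper)."""
    return struct.pack("<BIIdd", LITTLE_ENDIAN, WKB_POINT | EWKB_SRID_FLAG, srid, x, y)


wkb = point_to_wkb(1.0, 2.0)
ewkb = point_to_ewkb(1.0, 2.0, 4326)

print(len(wkb), len(ewkb))  # 21 25
print(ewkb.hex())           # 0101000020e6100000000000000000f03f0000000000000040
```

<p>In Sedona the <code>ST_AsEWKB</code> call shown above does the serialization; the sketch only shows what the resulting bytes look like for a single point.</p>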
+
<h2 id="convert-between-dataframe-and-spatialrdd">Convert between DataFrame
and SpatialRDD<a class="headerlink"
href="#convert-between-dataframe-and-spatialrdd" title="Permanent
link">¶</a></h2>
<h3 id="dataframe-to-spatialrdd">DataFrame to SpatialRDD<a class="headerlink"
href="#dataframe-to-spatialrdd" title="Permanent link">¶</a></h3>
<p>Use the SedonaSQL DataFrame-RDD Adapter to convert a DataFrame to a
SpatialRDD. Please read the <a
href="../../api/javadoc/sql/org/apache/sedona/sql/utils/index.html">Adapter
Scaladoc</a>.</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="7:3"><input
checked="checked" id="__tabbed_7_1" name="__tabbed_7" type="radio" /><input
id="__tabbed_7_2" name="__tabbed_7" type="radio" /><input id="__tabbed_7_3"
name="__tabbed_7" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_7_1">Scala</label><label for="__tabbed_7_2">Java</label><label
for="__tabbed_7_3">Python</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="9:3"><input
checked="checked" id="__tabbed_9_1" name="__tabbed_9" type="radio" /><input
id="__tabbed_9_2" name="__tabbed_9" type="radio" /><input id="__tabbed_9_3"
name="__tabbed_9" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_9_1">Scala</label><label for="__tabbed_9_2">Java</label><label
for="__tabbed_9_3">Python</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">var</span><span class="w"> </span><span
class="n">spatialRDD</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span class="nc">Adapter</span><span
class="p">.</span><span class="n">toSpatialRdd</span><span
class="p">(</span><span class="n">spatialDf</span><span class="p">,</span><span
class="w"> </span><span class="s">"usacounty"</span><span
class="p">)</span><span class="w"></span>
@@ -3034,7 +3205,7 @@ ORDER BY geohash
</div>
<h3 id="spatialrdd-to-dataframe">SpatialRDD to DataFrame<a class="headerlink"
href="#spatialrdd-to-dataframe" title="Permanent link">¶</a></h3>
<p>Use the SedonaSQL DataFrame-RDD Adapter to convert a SpatialRDD to a
DataFrame. Please read the <a
href="../../api/javadoc/sql/org/apache/sedona/sql/utils/index.html">Adapter
Scaladoc</a>.</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="8:3"><input
checked="checked" id="__tabbed_8_1" name="__tabbed_8" type="radio" /><input
id="__tabbed_8_2" name="__tabbed_8" type="radio" /><input id="__tabbed_8_3"
name="__tabbed_8" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_8_1">Scala</label><label for="__tabbed_8_2">Java</label><label
for="__tabbed_8_3">Python</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="10:3"><input
checked="checked" id="__tabbed_10_1" name="__tabbed_10" type="radio" /><input
id="__tabbed_10_2" name="__tabbed_10" type="radio" /><input id="__tabbed_10_3"
name="__tabbed_10" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_10_1">Scala</label><label for="__tabbed_10_2">Java</label><label
for="__tabbed_10_3">Python</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">var</span><span class="w"> </span><span
class="n">spatialDf</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="nc">Adapter</span><span class="p">.</span><span
class="n">toDf</span><span class="p">(</span><span
class="n">spatialRDD</span><span class="p">,</span><span class="w">
</span><span class="n">sparkSession</span><span class="p">)</span><span
class="w"></span>
@@ -3060,7 +3231,7 @@ ORDER BY geohash
types. Note that string schemas and not all data types are
supported—please check the
<a href="../../api/javadoc/sql/org/apache/sedona/sql/utils/index.html">Adapter
Scaladoc</a> to confirm what is supported for your use
case. At least one column for the user data must be provided.</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="9:1"><input
checked="checked" id="__tabbed_9_1" name="__tabbed_9" type="radio" /><div
class="tabbed-labels"><label for="__tabbed_9_1">Scala</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="11:1"><input
checked="checked" id="__tabbed_11_1" name="__tabbed_11" type="radio" /><div
class="tabbed-labels"><label for="__tabbed_11_1">Scala</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">val</span><span class="w"> </span><span class="n">schema</span><span
class="w"> </span><span class="o">=</span><span class="w"> </span><span
class="nc">StructType</span><span class="p">(</span><span
class="nc">Array</span><span class="p">(</span><span class="w"></span>
@@ -3077,7 +3248,7 @@ case. At least one column for the user data must be
provided.</p>
</div>
<h3 id="spatialpairrdd-to-dataframe">SpatialPairRDD to DataFrame<a
class="headerlink" href="#spatialpairrdd-to-dataframe" title="Permanent
link">¶</a></h3>
<p>PairRDD is the result of a spatial join query or distance join query.
The SedonaSQL DataFrame-RDD Adapter can convert the result to a DataFrame, but you
need to provide the names of the other attributes.</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="10:3"><input
checked="checked" id="__tabbed_10_1" name="__tabbed_10" type="radio" /><input
id="__tabbed_10_2" name="__tabbed_10" type="radio" /><input id="__tabbed_10_3"
name="__tabbed_10" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_10_1">Scala</label><label for="__tabbed_10_2">Java</label><label
for="__tabbed_10_3">Python</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="12:3"><input
checked="checked" id="__tabbed_12_1" name="__tabbed_12" type="radio" /><input
id="__tabbed_12_2" name="__tabbed_12" type="radio" /><input id="__tabbed_12_3"
name="__tabbed_12" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_12_1">Scala</label><label for="__tabbed_12_2">Java</label><label
for="__tabbed_12_3">Python</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">var</span><span class="w"> </span><span
class="n">joinResultDf</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span class="nc">Adapter</span><span
class="p">.</span><span class="n">toDf</span><span class="p">(</span><span
class="n">joinResultPairRDD</span><span class="p">,</span><span class="w">
</span><span class="nc">Seq</span><span class="p">(</span><span
class="s">"left_attribute1" [...]
@@ -3103,7 +3274,7 @@ case. At least one column for the user data must be
provided.</p>
</div>
</div>
<p>Alternatively, you can use the attribute names directly from the input RDD:</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="11:3"><input
checked="checked" id="__tabbed_11_1" name="__tabbed_11" type="radio" /><input
id="__tabbed_11_2" name="__tabbed_11" type="radio" /><input id="__tabbed_11_3"
name="__tabbed_11" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_11_1">Scala</label><label for="__tabbed_11_2">Java</label><label
for="__tabbed_11_3">Python</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="13:3"><input
checked="checked" id="__tabbed_13_1" name="__tabbed_13" type="radio" /><input
id="__tabbed_13_2" name="__tabbed_13" type="radio" /><input id="__tabbed_13_3"
name="__tabbed_13" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_13_1">Scala</label><label for="__tabbed_13_2">Java</label><label
for="__tabbed_13_3">Python</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="k">import</span><span class="w"> </span><span
class="nn">scala</span><span class="p">.</span><span
class="nn">collection</span><span class="p">.</span><span
class="nc">JavaConversions</span><span class="p">.</span><span
class="n">_</span>
@@ -3131,7 +3302,7 @@ case. At least one column for the user data must be
provided.</p>
types. Note that string schemas and not all data types are
supported—please check the
<a href="../../api/javadoc/sql/org/apache/sedona/sql/utils/index.html">Adapter
Scaladoc</a> to confirm what is supported for your use
case. Columns for the left and right user data must be provided.</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="12:1"><input
checked="checked" id="__tabbed_12_1" name="__tabbed_12" type="radio" /><div
class="tabbed-labels"><label for="__tabbed_12_1">Scala</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="14:1"><input
checked="checked" id="__tabbed_14_1" name="__tabbed_14" type="radio" /><div
class="tabbed-labels"><label for="__tabbed_14_1">Scala</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">val</span><span class="w"> </span><span class="n">schema</span><span
class="w"> </span><span class="o">=</span><span class="w"> </span><span
class="nc">StructType</span><span class="p">(</span><span
class="nc">Array</span><span class="p">(</span><span class="w"></span>
@@ -3154,7 +3325,7 @@ case. Columns for the left and right user data must be
provided.</p>
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 16, 2023 00:00:53</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 27, 2023 08:02:49</span>
</small>
diff --git a/latest-snapshot/download/index.html
b/latest-snapshot/download/index.html
index 38bd63a..c24a86b 100644
--- a/latest-snapshot/download/index.html
+++ b/latest-snapshot/download/index.html
@@ -859,8 +859,8 @@
<li class="md-nav__item">
- <a href="#verify-the-integ140rity" class="md-nav__link">
- Verify the integ1.4.0rity
+ <a href="#verify-the-integrity" class="md-nav__link">
+ Verify the integrity
</a>
@@ -2367,8 +2367,8 @@
<li class="md-nav__item">
- <a href="#verify-the-integ140rity" class="md-nav__link">
- Verify the integ1.4.0rity
+ <a href="#verify-the-integrity" class="md-nav__link">
+ Verify the integrity
</a>
@@ -2453,7 +2453,7 @@
<p>Latest source code: <a href="https://github.com/apache/sedona/">GitHub
repository</a></p>
<p>Old GeoSpark releases: <a
href="https://github.com/apache/sedona/releases">GitHub releases</a></p>
<p>Automatically generated binary JARs (per each Master branch commit): <a
href="https://github.com/apache/sedona/actions/workflows/java.yml">GitHub
Action</a></p>
-<h2 id="verify-the-integ140rity">Verify the integ1.4.0rity<a
class="headerlink" href="#verify-the-integ140rity" title="Permanent
link">¶</a></h2>
+<h2 id="verify-the-integrity">Verify the integrity<a class="headerlink"
href="#verify-the-integrity" title="Permanent link">¶</a></h2>
<p><a href="https://downloads.apache.org/sedona/KEYS">Public keys</a></p>
<p><a href="https://www.apache.org/info/verification.html">Instructions</a></p>
<h2 id="versions">Versions<a class="headerlink" href="#versions"
title="Permanent link">¶</a></h2>
@@ -2517,7 +2517,7 @@
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 01:53:14</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 21, 2023 03:12:18</span>
</small>
diff --git a/latest-snapshot/setup/maven-coordinates/index.html
b/latest-snapshot/setup/maven-coordinates/index.html
index d29eac2..4005191 100644
--- a/latest-snapshot/setup/maven-coordinates/index.html
+++ b/latest-snapshot/setup/maven-coordinates/index.html
@@ -2690,7 +2690,7 @@
<span class="nt"></dependency></span>
<span class="nt"><dependency></span>
<span class="nt"><groupId></span>org.apache.sedona<span
class="nt"></groupId></span>
- <span class="nt"><artifactId></span>sedona-flink-3.0_2.12<span
class="nt"></artifactId></span>
+ <span class="nt"><artifactId></span>sedona-flink_2.12<span
class="nt"></artifactId></span>
<span class="nt"><version></span>1.4.0<span
class="nt"></version></span>
<span class="nt"></dependency></span>
<span class="nt"><dependency></span>
@@ -2767,7 +2767,7 @@
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 16, 2023 00:00:53</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 21, 2023 03:12:18</span>
</small>
diff --git a/latest-snapshot/setup/release-notes/index.html
b/latest-snapshot/setup/release-notes/index.html
index b830419..1f5fefb 100644
--- a/latest-snapshot/setup/release-notes/index.html
+++ b/latest-snapshot/setup/release-notes/index.html
@@ -3255,7 +3255,7 @@
</div>
<div class="admonition danger">
<p class="admonition-title">Danger</p>
-<p>Sedona Python currently only works with Shapely 1.x. If you use GeoPandas,
please use <= GeoPandas <code>0.11.1</code>. GeoPandas > 0.11.1 will
automatically installe Shapely 2.0. If you use Shapely, please use <=
<code>1.8.4</code>.</p>
+<p>Sedona Python currently only works with Shapely 1.x. If you use GeoPandas,
please use <= GeoPandas <code>0.11.1</code>. GeoPandas > 0.11.1 will
automatically install Shapely 2.0. If you use Shapely, please use <=
<code>1.8.4</code>.</p>
</div>
<h2 id="sedona-140">Sedona 1.4.0<a class="headerlink" href="#sedona-140"
title="Permanent link">¶</a></h2>
<p>Sedona 1.4.0 is compiled against Spark 3.3 / Flink 1.12, Java 8.</p>
@@ -3270,7 +3270,7 @@
</ul>
<h3 id="api-change">API change<a class="headerlink" href="#api-change"
title="Permanent link">¶</a></h3>
<ul>
-<li><strong>Sedona Spark & Flink</strong> Packaging strategy changed. See
<a href="../maven-coordinates">Maven Coordinate</a>. Please change your Sedona
dependencies if needed. We recommend
<code>sedona-spark-shaded-3.0_2.12-1.4.0</code> and
<code>sedona-flink-shaded-3.0_2.12-1.4.0</code></li>
+<li><strong>Sedona Spark & Flink</strong> Packaging strategy changed. See
<a href="../maven-coordinates">Maven Coordinate</a>. Please change your Sedona
dependencies if needed. We recommend
<code>sedona-spark-shaded-3.0_2.12-1.4.0</code> and
<code>sedona-flink-shaded_2.12-1.4.0</code></li>
<li><strong>Sedona Spark & Flink</strong> GeoTools-wrapper version
upgraded. Please use <code>geotools-wrapper-1.4.0-28.2</code>.</li>
</ul>
<h3 id="behavior-change">Behavior change<a class="headerlink"
href="#behavior-change" title="Permanent link">¶</a></h3>
@@ -3283,7 +3283,7 @@
</ul>
</li>
</ul>
-<p>When <code>sedona.join.optimizationmode</code> is configured as
<code>nonequi</code>, it won't optimize join queries such as <code>SELECT *
FROM A, B WHERE A.x = B.x AND ST_Contains(A.geom, B.geom)</code>, since it is
an equi-join with equi-condition <code>A.x = B.x</code>. Sedona will optimize
for <code>SELECT * FROM A, B WHERE A.x = B.x AND ST_Contains(A.geom,
B.geom)</code></p>
+<p>When <code>sedona.join.optimizationmode</code> is configured as
<code>nonequi</code>, Sedona won't optimize join queries such as <code>SELECT *
FROM A, B WHERE A.x = B.x AND ST_Contains(A.geom, B.geom)</code>, since it is
an equi-join with the equi-condition <code>A.x = B.x</code>. Sedona will still optimize
queries such as <code>SELECT * FROM A, B WHERE ST_Contains(A.geom, B.geom)</code>.</p>
<h3 id="bug">Bug<a class="headerlink" href="#bug" title="Permanent
link">¶</a></h3>
<ul>
<li>[<a
href='https://issues.apache.org/jira/browse/SEDONA-218'>SEDONA-218</a>] -
Flaky test caused by improper handling of null struct values in Adapter.toDf
@@ -3780,7 +3780,7 @@
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 19, 2023 23:59:09</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 21, 2023 03:12:18</span>
</small>
diff --git a/latest-snapshot/tutorial/demo/index.html
b/latest-snapshot/tutorial/demo/index.html
index 5ec6555..36014fa 100644
--- a/latest-snapshot/tutorial/demo/index.html
+++ b/latest-snapshot/tutorial/demo/index.html
@@ -2533,7 +2533,7 @@
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 01:53:14</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 03:44:37</span>
</small>
diff --git a/latest-snapshot/tutorial/flink/sql/index.html
b/latest-snapshot/tutorial/flink/sql/index.html
index 3e9c271..e6f2de9 100644
--- a/latest-snapshot/tutorial/flink/sql/index.html
+++ b/latest-snapshot/tutorial/flink/sql/index.html
@@ -3139,7 +3139,7 @@ FROM rights CROSS JOIN UNNEST(rights.idarray) AS
tmpTbl2(cellId)
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 02:07:22</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 03:44:37</span>
</small>
diff --git a/latest-snapshot/tutorial/geopandas-shapely/index.html
b/latest-snapshot/tutorial/geopandas-shapely/index.html
index f842d40..3d0c585 100644
--- a/latest-snapshot/tutorial/geopandas-shapely/index.html
+++ b/latest-snapshot/tutorial/geopandas-shapely/index.html
@@ -2544,7 +2544,7 @@
<h1 id="work-with-geopandas-and-shapely">Work with GeoPandas and Shapely<a
class="headerlink" href="#work-with-geopandas-and-shapely" title="Permanent
link">¶</a></h1>
<div class="admonition danger">
<p class="admonition-title">Danger</p>
-<p>Sedona Python currently only works with Shapely 1.x. If you use GeoPandas,
please use <= GeoPandas <code>0.11.1</code>. GeoPandas > 0.11.1 will
automatically installe Shapely 2.0. If you use Shapely, please use <=
<code>1.8.4</code>.</p>
+<p>Sedona Python currently only works with Shapely 1.x. If you use GeoPandas,
please use <= GeoPandas <code>0.11.1</code>. GeoPandas > 0.11.1 will
automatically install Shapely 2.0. If you use Shapely, please use <=
<code>1.8.4</code>.</p>
</div>
<h2 id="interoperate-with-geopandas">Interoperate with GeoPandas<a
class="headerlink" href="#interoperate-with-geopandas" title="Permanent
link">¶</a></h2>
<p>Sedona Python implements serializers and deserializers that convert
Sedona Geometry objects into Shapely BaseGeometry objects. This makes it
possible to load data from a file with GeoPandas (see the drivers supported by
Fiona) and create a Spark DataFrame from the resulting GeoDataFrame object.</p>
@@ -2845,7 +2845,7 @@
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 19, 2023 23:59:09</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 21, 2023 01:24:41</span>
</small>
diff --git a/latest-snapshot/tutorial/rdd/index.html
b/latest-snapshot/tutorial/rdd/index.html
index 4f64dd0..126b7a3 100644
--- a/latest-snapshot/tutorial/rdd/index.html
+++ b/latest-snapshot/tutorial/rdd/index.html
@@ -3878,7 +3878,7 @@ Find the superheroes within 10 miles of each city</p>
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 19, 2023 23:59:09</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 20, 2023 03:44:37</span>
</small>
diff --git a/latest-snapshot/tutorial/sql/index.html
b/latest-snapshot/tutorial/sql/index.html
index c58a3a6..7e295d9 100644
--- a/latest-snapshot/tutorial/sql/index.html
+++ b/latest-snapshot/tutorial/sql/index.html
@@ -966,8 +966,17 @@
<li class="md-nav__item">
- <a href="#load-shapefile-and-geojson" class="md-nav__link">
- Load Shapefile and GeoJSON
+ <a href="#load-geojson-using-spark-json-data-source" class="md-nav__link">
+ Load GeoJSON using Spark JSON Data Source
+ </a>
+
+
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#load-shapefile-and-geojson-using-spatialrdd" class="md-nav__link">
+ Load Shapefile and GeoJSON using SpatialRDD
</a>
@@ -980,6 +989,15 @@
</a>
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#load-data-from-jdbc-data-sources" class="md-nav__link">
+ Load data from JDBC data sources
+ </a>
+
+
</li>
<li class="md-nav__item">
@@ -1067,6 +1085,15 @@
</a>
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#save-to-postgis" class="md-nav__link">
+ Save to Postgis
+ </a>
+
+
</li>
<li class="md-nav__item">
@@ -2530,8 +2557,17 @@
<li class="md-nav__item">
- <a href="#load-shapefile-and-geojson" class="md-nav__link">
- Load Shapefile and GeoJSON
+ <a href="#load-geojson-using-spark-json-data-source" class="md-nav__link">
+ Load GeoJSON using Spark JSON Data Source
+ </a>
+
+
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#load-shapefile-and-geojson-using-spatialrdd" class="md-nav__link">
+ Load Shapefile and GeoJSON using SpatialRDD
</a>
@@ -2544,6 +2580,15 @@
</a>
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#load-data-from-jdbc-data-sources" class="md-nav__link">
+ Load data from JDBC data sources
+ </a>
+
+
</li>
<li class="md-nav__item">
@@ -2631,6 +2676,15 @@
</a>
+</li>
+
+ <li class="md-nav__item">
+
+ <a href="#save-to-postgis" class="md-nav__link">
+ Save to Postgis
+ </a>
+
+
</li>
<li class="md-nav__item">
@@ -2896,11 +2950,49 @@ The file may have many other columns.</p>
<p class="admonition-title">Note</p>
<p>SedonaSQL provides many functions to create a Geometry column. Please
read the <a href="../../api/sql/Constructor/">SedonaSQL constructor API</a>.</p>
</div>
-<h2 id="load-shapefile-and-geojson">Load Shapefile and GeoJSON<a
class="headerlink" href="#load-shapefile-and-geojson" title="Permanent
link">¶</a></h2>
-<p>Shapefile and GeoJSON must be loaded by SpatialRDD and converted to
DataFrame using Adapter. Please read <a
href="../rdd/#create-a-generic-spatialrdd">Load SpatialRDD</a> and <a
href="#convert-between-dataframe-and-spatialrdd">DataFrame <->
RDD</a>.</p>
+<h2 id="load-geojson-using-spark-json-data-source">Load GeoJSON using Spark
JSON Data Source<a class="headerlink"
href="#load-geojson-using-spark-json-data-source" title="Permanent
link">¶</a></h2>
+<p>Spark SQL's built-in JSON data source supports reading GeoJSON data.
+To ensure proper parsing of the geometry property, we can define a schema with
the geometry property set to type 'string'.
+This prevents Spark from interpreting the property and allows us to use the
ST_GeomFromGeoJSON function for accurate geometry parsing.</p>
+<div class="tabbed-set tabbed-alternate" data-tabs="6:3"><input
checked="checked" id="__tabbed_6_1" name="__tabbed_6" type="radio" /><input
id="__tabbed_6_2" name="__tabbed_6" type="radio" /><input id="__tabbed_6_3"
name="__tabbed_6" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_6_1">Scala</label><label for="__tabbed_6_2">Java</label><label
for="__tabbed_6_3">Python</label></div>
+<div class="tabbed-content">
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span
class="kd">val</span><span class="w"> </span><span class="n">schema</span><span
class="w"> </span><span class="o">=</span><span class="w"> </span><span
class="s">"type string, crs string, totalFeatures long, features
array<struct<type string, geometry string, properties map<string,
string>>>"</span><span class="w"></span>
+<span class="n">sparkSession</span><span class="p">.</span><span
class="n">read</span><span class="p">.</span><span class="n">schema</span><span
class="p">(</span><span class="n">schema</span><span class="p">).</span><span
class="n">json</span><span class="p">(</span><span
class="n">geojson_path</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="n">selectExpr</span><span class="p">(</span><span
class="s">"explode(features) as features"</span><span
class="p">)</span><span class="w"> </span><span class="c1">// Explode the
envelope to get one feature per row.</span>
+<span class="w"> </span><span class="p">.</span><span
class="n">select</span><span class="p">(</span><span
class="s">"features.*"</span><span class="p">)</span><span class="w">
</span><span class="c1">// Unpack the features struct.</span>
+<span class="w"> </span><span class="p">.</span><span
class="n">withColumn</span><span class="p">(</span><span
class="s">"geometry"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromGeoJSON(geometry)"</span><span
class="p">))</span><span class="w"> </span><span class="c1">// Convert the
geometry string.</span>
+<span class="w"> </span><span class="p">.</span><span
class="n">printSchema</span><span class="p">()</span><span class="w"></span>
+</code></pre></div>
+
+</div>
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span
class="n">String</span><span class="w"> </span><span
class="n">schema</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="s">"type string, crs string, totalFeatures
long, features array<struct<type string, geometry string, properties
map<string, string>>>"</span><span class="p">;</span><span
class="w"></span>
+<span class="n">sparkSession</span><span class="p">.</span><span
class="na">read</span><span class="p">.</span><span
class="na">schema</span><span class="p">(</span><span
class="n">schema</span><span class="p">).</span><span
class="na">json</span><span class="p">(</span><span
class="n">geojson_path</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">selectExpr</span><span class="p">(</span><span
class="s">"explode(features) as features"</span><span
class="p">)</span><span class="w"> </span><span class="c1">// Explode the
envelope to get one feature per row.</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">select</span><span class="p">(</span><span
class="s">"features.*"</span><span class="p">)</span><span class="w">
</span><span class="c1">// Unpack the features struct.</span><span
class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">withColumn</span><span class="p">(</span><span
class="s">"geometry"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromGeoJSON(geometry)"</span><span
class="p">))</span><span class="w"> </span><span class="c1">// Convert the
geometry string.</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">printSchema</span><span class="p">();</span><span class="w"></span>
+</code></pre></div>
+
+</div>
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span class="n">schema</span>
<span class="o">=</span> <span class="s2">"type string, crs string,
totalFeatures long, features array<struct<type string, geometry string,
properties map<string, string>>>"</span>
+<span class="p">(</span><span class="n">sparkSession</span><span
class="o">.</span><span class="n">read</span><span class="o">.</span><span
class="n">json</span><span class="p">(</span><span
class="n">geojson_path</span><span class="p">,</span> <span
class="n">schema</span><span class="o">=</span><span
class="n">schema</span><span class="p">)</span>
+ <span class="o">.</span><span class="n">selectExpr</span><span
class="p">(</span><span class="s2">"explode(features) as
features"</span><span class="p">)</span> <span class="c1"># Explode the
envelope to get one feature per row.</span>
+ <span class="o">.</span><span class="n">select</span><span
class="p">(</span><span class="s2">"features.*"</span><span
class="p">)</span> <span class="c1"># Unpack the features struct.</span>
+ <span class="o">.</span><span class="n">withColumn</span><span
class="p">(</span><span class="s2">"geometry"</span><span
class="p">,</span> <span class="n">f</span><span class="o">.</span><span
class="n">expr</span><span class="p">(</span><span
class="s2">"ST_GeomFromGeoJSON(geometry)"</span><span
class="p">))</span> <span class="c1"># Convert the geometry string.</span>
+ <span class="o">.</span><span class="n">printSchema</span><span
class="p">())</span>
+</code></pre></div>
+
+</div>
+</div>
+</div>
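<p>To make the intermediate shapes concrete, the explode-and-unpack steps can be imitated in plain Python without Spark. This is only a rough analogue under stated assumptions: the sample document and the helper <code>explode_features</code> are hypothetical, and the standard <code>json</code> module stands in for Spark's JSON data source. It shows the geometry being carried as a JSON string, which is the form that <code>ST_GeomFromGeoJSON</code> is then given.</p>

```python
import json

# Hypothetical sample input: a small GeoJSON FeatureCollection.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
     "properties": {"name": "a", "rank": 1}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [3.0, 4.0]},
     "properties": {"name": "b", "rank": 2}}
  ]
}
"""


def explode_features(doc):
    """Return one flat row per feature, keeping the geometry as a JSON string."""
    rows = []
    for feature in doc["features"]:
        rows.append({
            "type": feature["type"],
            # Kept as text, mirroring the 'geometry string' field in the schema.
            "geometry": json.dumps(feature["geometry"]),
            # Values are stringified, mirroring 'properties map<string, string>'.
            "properties": {k: str(v) for k, v in feature["properties"].items()},
        })
    return rows


rows = explode_features(json.loads(GEOJSON_TEXT))
for row in rows:
    print(row["geometry"], row["properties"])
```

<p>Each printed row corresponds to one feature; in the Spark version the geometry string is then parsed into a Geometry column by the <code>withColumn</code> step.</p>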
+<h2 id="load-shapefile-and-geojson-using-spatialrdd">Load Shapefile and
GeoJSON using SpatialRDD<a class="headerlink"
href="#load-shapefile-and-geojson-using-spatialrdd" title="Permanent
link">¶</a></h2>
+<p>Shapefile and GeoJSON can be loaded by SpatialRDD and converted to
DataFrame using Adapter. Please read <a
href="../rdd/#create-a-generic-spatialrdd">Load SpatialRDD</a> and <a
href="#convert-between-dataframe-and-spatialrdd">DataFrame <->
RDD</a>.</p>
<h2 id="load-geoparquet">Load GeoParquet<a class="headerlink"
href="#load-geoparquet" title="Permanent link">¶</a></h2>
<p>Since v<code>1.3.0</code>, Sedona natively supports loading GeoParquet
files. Sedona will infer geometry fields using the "geo" metadata in GeoParquet
files.</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="6:3"><input
checked="checked" id="__tabbed_6_1" name="__tabbed_6" type="radio" /><input
id="__tabbed_6_2" name="__tabbed_6" type="radio" /><input id="__tabbed_6_3"
name="__tabbed_6" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_6_1">Scala/Java</label><label
for="__tabbed_6_2">Java</label><label for="__tabbed_6_3">Python</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="7:3"><input
checked="checked" id="__tabbed_7_1" name="__tabbed_7" type="radio" /><input
id="__tabbed_7_2" name="__tabbed_7" type="radio" /><input id="__tabbed_7_3"
name="__tabbed_7" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_7_1">Scala/Java</label><label
for="__tabbed_7_2">Java</label><label for="__tabbed_7_3">Python</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">val</span><span class="w"> </span><span class="n">df</span><span
class="w"> </span><span class="o">=</span><span class="w"> </span><span
class="n">sparkSession</span><span class="p">.</span><span
class="n">read</span><span class="p">.</span><span class="n">format</span><span
class="p">(</span><span class="s">"geoparquet"</span><span
class="p">).</span><span class="n">load</span><span class="p">(</span><span
class=" [...]
@@ -2933,6 +3025,65 @@ The file may have many other columns.</p>
</code></pre></div>
<p>Sedona supports spatial predicate push-down for GeoParquet files. Please
refer to the <a href="../../api/sql/Optimizer/">SedonaSQL query optimizer</a>
documentation for details.</p>
+<h2 id="load-data-from-jdbc-data-sources">Load data from JDBC data sources<a
class="headerlink" href="#load-data-from-jdbc-data-sources" title="Permanent
link">¶</a></h2>
+<p>The 'query' option in Spark SQL's JDBC data source can be used to convert
geometry columns to a format that Sedona can interpret.
+This should work for most spatial JDBC data sources.
+For PostGIS there is no need to add a query to convert geometry types since
it already uses EWKB as its wire format.</p>
+<div class="tabbed-set tabbed-alternate" data-tabs="8:3"><input
checked="checked" id="__tabbed_8_1" name="__tabbed_8" type="radio" /><input
id="__tabbed_8_2" name="__tabbed_8" type="radio" /><input id="__tabbed_8_3"
name="__tabbed_8" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_8_1">Scala</label><label for="__tabbed_8_2">Java</label><label
for="__tabbed_8_3">Python</label></div>
+<div class="tabbed-content">
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span class="c1">// For any
JDBC data source, including PostGIS.</span>
+<span class="kd">val</span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="n">sparkSession</span><span
class="p">.</span><span class="n">read</span><span class="p">.</span><span
class="n">format</span><span class="p">(</span><span
class="s">"jdbc"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="c1">// Other options.</span>
+<span class="w"> </span><span class="p">.</span><span
class="n">option</span><span class="p">(</span><span
class="s">"query"</span><span class="p">,</span><span class="w">
</span><span class="s">"SELECT id, ST_AsBinary(geom) as geom FROM
my_table"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="n">load</span><span class="p">()</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="n">withColumn</span><span class="p">(</span><span
class="s">"geom"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromWKB(geom)"</span><span class="p">))</span><span
class="w"></span>
+
+<span class="c1">// This is a simplified version that works for PostGIS.</span>
+<span class="kd">val</span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="n">sparkSession</span><span
class="p">.</span><span class="n">read</span><span class="p">.</span><span
class="n">format</span><span class="p">(</span><span
class="s">"jdbc"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="c1">// Other options.</span>
+<span class="w"> </span><span class="p">.</span><span
class="n">option</span><span class="p">(</span><span
class="s">"dbtable"</span><span class="p">,</span><span class="w">
</span><span class="s">"my_table"</span><span class="p">)</span><span
class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="n">load</span><span class="p">()</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="n">withColumn</span><span class="p">(</span><span
class="s">"geom"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromWKB(geom)"</span><span class="p">))</span><span
class="w"></span>
+</code></pre></div>
+
+</div>
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span class="c1">// For any
JDBC data source, including PostGIS.</span><span class="w"></span>
+<span class="n">Dataset</span><span class="o"><</span><span
class="n">Row</span><span class="o">></span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="n">sparkSession</span><span
class="p">.</span><span class="na">read</span><span class="p">().</span><span
class="na">format</span><span class="p">(</span><span
class="s">"jdbc"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="c1">// Other options.</span><span
class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">option</span><span class="p">(</span><span
class="s">"query"</span><span class="p">,</span><span class="w">
</span><span class="s">"SELECT id, ST_AsBinary(geom) as geom FROM
my_table"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">load</span><span class="p">()</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">withColumn</span><span class="p">(</span><span
class="s">"geom"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromWKB(geom)"</span><span class="p">))</span><span
class="w"></span>
+
+<span class="c1">// This is a simplified version that works for
PostGIS.</span><span class="w"></span>
+<span class="n">Dataset</span><span class="o"><</span><span
class="n">Row</span><span class="o">></span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="n">sparkSession</span><span
class="p">.</span><span class="na">read</span><span class="p">().</span><span
class="na">format</span><span class="p">(</span><span
class="s">"jdbc"</span><span class="p">)</span><span class="w"></span>
+<span class="w"> </span><span class="c1">// Other options.</span><span
class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">option</span><span class="p">(</span><span
class="s">"dbtable"</span><span class="p">,</span><span class="w">
</span><span class="s">"my_table"</span><span class="p">)</span><span
class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">load</span><span class="p">()</span><span class="w"></span>
+<span class="w"> </span><span class="p">.</span><span
class="na">withColumn</span><span class="p">(</span><span
class="s">"geom"</span><span class="p">,</span><span class="w">
</span><span class="n">expr</span><span class="p">(</span><span
class="s">"ST_GeomFromWKB(geom)"</span><span class="p">))</span><span
class="w"></span>
+</code></pre></div>
+
+</div>
+<div class="tabbed-block">
+<div class="highlight"><pre><span></span><code><span class="c1"># For any JDBC
data source, including PostGIS.</span>
+<span class="n">df</span> <span class="o">=</span> <span
class="p">(</span><span class="n">sparkSession</span><span
class="o">.</span><span class="n">read</span><span class="o">.</span><span
class="n">format</span><span class="p">(</span><span
class="s2">"jdbc"</span><span class="p">)</span>
+ <span class="c1"># Other options.</span>
+ <span class="o">.</span><span class="n">option</span><span
class="p">(</span><span class="s2">"query"</span><span
class="p">,</span> <span class="s2">"SELECT id, ST_AsBinary(geom) as geom
FROM my_table"</span><span class="p">)</span>
+ <span class="o">.</span><span class="n">load</span><span
class="p">()</span>
+ <span class="o">.</span><span class="n">withColumn</span><span
class="p">(</span><span class="s2">"geom"</span><span
class="p">,</span> <span class="n">f</span><span class="o">.</span><span
class="n">expr</span><span class="p">(</span><span
class="s2">"ST_GeomFromWKB(geom)"</span><span class="p">)))</span>
+
+<span class="c1"># This is a simplified version that works for PostGIS.</span>
+<span class="n">df</span> <span class="o">=</span> <span
class="p">(</span><span class="n">sparkSession</span><span
class="o">.</span><span class="n">read</span><span class="o">.</span><span
class="n">format</span><span class="p">(</span><span
class="s2">"jdbc"</span><span class="p">)</span>
+ <span class="c1"># Other options.</span>
+ <span class="o">.</span><span class="n">option</span><span
class="p">(</span><span class="s2">"dbtable"</span><span
class="p">,</span> <span class="s2">"my_table"</span><span
class="p">)</span>
+ <span class="o">.</span><span class="n">load</span><span
class="p">()</span>
+ <span class="o">.</span><span class="n">withColumn</span><span
class="p">(</span><span class="s2">"geom"</span><span
class="p">,</span> <span class="n">f</span><span class="o">.</span><span
class="n">expr</span><span class="p">(</span><span
class="s2">"ST_GeomFromWKB(geom)"</span><span class="p">)))</span>
+</code></pre></div>
+
+</div>
+</div>
+</div>
<h2 id="transform-the-coordinate-reference-system">Transform the Coordinate
Reference System<a class="headerlink"
href="#transform-the-coordinate-reference-system" title="Permanent
link">¶</a></h2>
<p>Sedona doesn't control the coordinate unit (degree-based or meter-based) of
all geometries in a Geometry column. The unit of all related distances in
SedonaSQL is the same as the unit of all geometries in a Geometry column.</p>
<p>To convert Coordinate Reference System of the Geometry column created
before, use the following code:</p>
@@ -3003,10 +3154,30 @@ FROM spatialDf
ORDER BY geohash
</code></pre></div>
+<h2 id="save-to-postgis">Save to PostGIS<a class="headerlink"
href="#save-to-postgis" title="Permanent link">¶</a></h2>
+<p>Unfortunately, the Spark SQL JDBC data source doesn't support creating
geometry types in PostGIS using the 'createTableColumnTypes' option.
+Only the Spark built-in types are recognized.
+This means that you'll need to manage your PostGIS schema separately from
Spark.
+One way to do this is to create the table with the correct geometry column
before writing data to it with Spark.
+Alternatively, you can write your data to the table using Spark and then
manually alter the column to be a geometry type afterward.</p>
+<p>PostGIS uses EWKB to serialize geometries.
+If you convert your geometries to EWKB format in Sedona, you don't have to do
any additional conversion in PostGIS.</p>
+<div class="highlight"><pre><span></span><code>my_postgis_db# create table
my_table (id int8, geom geometry);
+
+df.withColumn(&quot;geom&quot;, expr(&quot;ST_AsEWKB(geom)&quot;))
+ .write.format("jdbc")
+ .option("truncate","true") // Don't let Spark
recreate the table.
+ // Other options.
+ .save()
+
+// If you didn't create the table before writing you can change the type
afterward.
+my_postgis_db# alter table my_table alter column geom type geometry;
+</code></pre></div>
+
<h2 id="convert-between-dataframe-and-spatialrdd">Convert between DataFrame
and SpatialRDD<a class="headerlink"
href="#convert-between-dataframe-and-spatialrdd" title="Permanent
link">¶</a></h2>
<h3 id="dataframe-to-spatialrdd">DataFrame to SpatialRDD<a class="headerlink"
href="#dataframe-to-spatialrdd" title="Permanent link">¶</a></h3>
<p>Use SedonaSQL DataFrame-RDD Adapter to convert a DataFrame to a
SpatialRDD. Please read <a
href="../../api/javadoc/sql/org/apache/sedona/sql/utils/index.html">Adapter
Scaladoc</a></p>
-<div class="tabbed-set tabbed-alternate" data-tabs="7:3"><input
checked="checked" id="__tabbed_7_1" name="__tabbed_7" type="radio" /><input
id="__tabbed_7_2" name="__tabbed_7" type="radio" /><input id="__tabbed_7_3"
name="__tabbed_7" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_7_1">Scala</label><label for="__tabbed_7_2">Java</label><label
for="__tabbed_7_3">Python</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="9:3"><input
checked="checked" id="__tabbed_9_1" name="__tabbed_9" type="radio" /><input
id="__tabbed_9_2" name="__tabbed_9" type="radio" /><input id="__tabbed_9_3"
name="__tabbed_9" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_9_1">Scala</label><label for="__tabbed_9_2">Java</label><label
for="__tabbed_9_3">Python</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">var</span><span class="w"> </span><span
class="n">spatialRDD</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span class="nc">Adapter</span><span
class="p">.</span><span class="n">toSpatialRdd</span><span
class="p">(</span><span class="n">spatialDf</span><span class="p">,</span><span
class="w"> </span><span class="s">"usacounty"</span><span
class="p">)</span><span class="w"></span>
@@ -3034,7 +3205,7 @@ ORDER BY geohash
</div>
<h3 id="spatialrdd-to-dataframe">SpatialRDD to DataFrame<a class="headerlink"
href="#spatialrdd-to-dataframe" title="Permanent link">¶</a></h3>
<p>Use SedonaSQL DataFrame-RDD Adapter to convert a SpatialRDD to a
DataFrame. Please read <a
href="../../api/javadoc/sql/org/apache/sedona/sql/utils/index.html">Adapter
Scaladoc</a></p>
-<div class="tabbed-set tabbed-alternate" data-tabs="8:3"><input
checked="checked" id="__tabbed_8_1" name="__tabbed_8" type="radio" /><input
id="__tabbed_8_2" name="__tabbed_8" type="radio" /><input id="__tabbed_8_3"
name="__tabbed_8" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_8_1">Scala</label><label for="__tabbed_8_2">Java</label><label
for="__tabbed_8_3">Python</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="10:3"><input
checked="checked" id="__tabbed_10_1" name="__tabbed_10" type="radio" /><input
id="__tabbed_10_2" name="__tabbed_10" type="radio" /><input id="__tabbed_10_3"
name="__tabbed_10" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_10_1">Scala</label><label for="__tabbed_10_2">Java</label><label
for="__tabbed_10_3">Python</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">var</span><span class="w"> </span><span
class="n">spatialDf</span><span class="w"> </span><span class="o">=</span><span
class="w"> </span><span class="nc">Adapter</span><span class="p">.</span><span
class="n">toDf</span><span class="p">(</span><span
class="n">spatialRDD</span><span class="p">,</span><span class="w">
</span><span class="n">sparkSession</span><span class="p">)</span><span
class="w"></span>
@@ -3060,7 +3231,7 @@ ORDER BY geohash
types. Note that string schemas and not all data types are
supported—please check the
<a href="../../api/javadoc/sql/org/apache/sedona/sql/utils/index.html">Adapter
Scaladoc</a> to confirm what is supported for your use
case. At least one column for the user data must be provided.</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="9:1"><input
checked="checked" id="__tabbed_9_1" name="__tabbed_9" type="radio" /><div
class="tabbed-labels"><label for="__tabbed_9_1">Scala</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="11:1"><input
checked="checked" id="__tabbed_11_1" name="__tabbed_11" type="radio" /><div
class="tabbed-labels"><label for="__tabbed_11_1">Scala</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">val</span><span class="w"> </span><span class="n">schema</span><span
class="w"> </span><span class="o">=</span><span class="w"> </span><span
class="nc">StructType</span><span class="p">(</span><span
class="nc">Array</span><span class="p">(</span><span class="w"></span>
@@ -3077,7 +3248,7 @@ case. At least one column for the user data must be
provided.</p>
</div>
<h3 id="spatialpairrdd-to-dataframe">SpatialPairRDD to DataFrame<a
class="headerlink" href="#spatialpairrdd-to-dataframe" title="Permanent
link">¶</a></h3>
<p>PairRDD is the result of a spatial join query or distance join query.
SedonaSQL DataFrame-RDD Adapter can convert the result to a DataFrame. But you
need to provide the name of other attributes.</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="10:3"><input
checked="checked" id="__tabbed_10_1" name="__tabbed_10" type="radio" /><input
id="__tabbed_10_2" name="__tabbed_10" type="radio" /><input id="__tabbed_10_3"
name="__tabbed_10" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_10_1">Scala</label><label for="__tabbed_10_2">Java</label><label
for="__tabbed_10_3">Python</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="12:3"><input
checked="checked" id="__tabbed_12_1" name="__tabbed_12" type="radio" /><input
id="__tabbed_12_2" name="__tabbed_12" type="radio" /><input id="__tabbed_12_3"
name="__tabbed_12" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_12_1">Scala</label><label for="__tabbed_12_2">Java</label><label
for="__tabbed_12_3">Python</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">var</span><span class="w"> </span><span
class="n">joinResultDf</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span class="nc">Adapter</span><span
class="p">.</span><span class="n">toDf</span><span class="p">(</span><span
class="n">joinResultPairRDD</span><span class="p">,</span><span class="w">
</span><span class="nc">Seq</span><span class="p">(</span><span
class="s">"left_attribute1" [...]
@@ -3103,7 +3274,7 @@ case. At least one column for the user data must be
provided.</p>
</div>
</div>
<p>or you can use the attribute names directly from the input RDD</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="11:3"><input
checked="checked" id="__tabbed_11_1" name="__tabbed_11" type="radio" /><input
id="__tabbed_11_2" name="__tabbed_11" type="radio" /><input id="__tabbed_11_3"
name="__tabbed_11" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_11_1">Scala</label><label for="__tabbed_11_2">Java</label><label
for="__tabbed_11_3">Python</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="13:3"><input
checked="checked" id="__tabbed_13_1" name="__tabbed_13" type="radio" /><input
id="__tabbed_13_2" name="__tabbed_13" type="radio" /><input id="__tabbed_13_3"
name="__tabbed_13" type="radio" /><div class="tabbed-labels"><label
for="__tabbed_13_1">Scala</label><label for="__tabbed_13_2">Java</label><label
for="__tabbed_13_3">Python</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="k">import</span><span class="w"> </span><span
class="nn">scala</span><span class="p">.</span><span
class="nn">collection</span><span class="p">.</span><span
class="nc">JavaConversions</span><span class="p">.</span><span
class="n">_</span>
@@ -3131,7 +3302,7 @@ case. At least one column for the user data must be
provided.</p>
types. Note that string schemas and not all data types are
supported—please check the
<a href="../../api/javadoc/sql/org/apache/sedona/sql/utils/index.html">Adapter
Scaladoc</a> to confirm what is supported for your use
case. Columns for the left and right user data must be provided.</p>
-<div class="tabbed-set tabbed-alternate" data-tabs="12:1"><input
checked="checked" id="__tabbed_12_1" name="__tabbed_12" type="radio" /><div
class="tabbed-labels"><label for="__tabbed_12_1">Scala</label></div>
+<div class="tabbed-set tabbed-alternate" data-tabs="14:1"><input
checked="checked" id="__tabbed_14_1" name="__tabbed_14" type="radio" /><div
class="tabbed-labels"><label for="__tabbed_14_1">Scala</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<div class="highlight"><pre><span></span><code><span
class="kd">val</span><span class="w"> </span><span class="n">schema</span><span
class="w"> </span><span class="o">=</span><span class="w"> </span><span
class="nc">StructType</span><span class="p">(</span><span
class="nc">Array</span><span class="p">(</span><span class="w"></span>
@@ -3154,7 +3325,7 @@ case. Columns for the left and right user data must be
provided.</p>
<small>
Last update:
- <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 16, 2023 00:00:53</span>
+ <span class="git-revision-date-localized-plugin
git-revision-date-localized-plugin-datetime">March 27, 2023 08:02:49</span>
</small>