jbampton commented on code in PR #1639:
URL: https://github.com/apache/sedona/pull/1639#discussion_r1799804104
##########
python/sedona/sql/dataframe_api.py:
##########
@@ -24,8 +24,23 @@
from pyspark.sql import Column, SparkSession
from pyspark.sql import functions as f
-ColumnOrName = Union[Column, str]
-ColumnOrNameOrNumber = Union[Column, str, float, int]
+try:
+ from pyspark.sql.connect.column import Column as ConnectColumn
+ from pyspark.sql.utils import is_remote
+except ImportError:
+ # be backwards compatible with spark < 3.4
Review Comment:
```suggestion
# be backwards compatible with Spark < 3.4
```
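For context, the backwards-compatibility pattern under discussion can be sketched as below. This is a sketch, not the PR's exact code: the outer try/except around the pyspark import and the `is_remote` fallback body are assumptions added so the snippet runs with or without PySpark installed.

```python
from typing import Union

try:
    from pyspark.sql import Column
except ImportError:  # assumption: stand-in so this sketch runs without pyspark
    class Column: ...

try:
    # available from Spark 3.4 onward (Spark Connect)
    from pyspark.sql.connect.column import Column as ConnectColumn
    from pyspark.sql.utils import is_remote
except ImportError:
    # be backwards compatible with Spark < 3.4: alias the classic Column
    # so isinstance checks keep working, and report a non-remote session
    ConnectColumn = Column

    def is_remote() -> bool:
        return False

ColumnOrName = Union[Column, str]
ColumnOrNameOrNumber = Union[Column, str, float, int]
```

Aliasing `ConnectColumn = Column` in the fallback keeps later `isinstance(x, (Column, ConnectColumn))` checks valid on older Spark versions without scattering version checks through the call sites.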
##########
python/sedona/spark/SedonaContext.py:
##########
@@ -34,8 +41,11 @@ def create(cls, spark: SparkSession) -> SparkSession:
:return: SedonaContext which is an instance of SparkSession
"""
spark.sql("SELECT 1 as geom").count()
- PackageImporter.import_jvm_lib(spark._jvm)
- spark._jvm.SedonaContext.create(spark._jsparkSession, "python")
+
+ # with spark connect there is no local jvm
Review Comment:
```suggestion
# with Spark Connect there is no local JVM
```
##########
python/sedona/sql/dataframe_api.py:
##########
@@ -86,6 +103,10 @@ def _get_type_list(annotated_type: Type) -> Tuple[Type, ...]:
else:
valid_types = (annotated_type,)
+ # functions accepting a Column should also accept the spark connect sort of Column
Review Comment:
```suggestion
# functions accepting a Column should also accept the Spark Connect sort of Column
```
##########
.github/workflows/python.yml:
##########
@@ -153,3 +153,20 @@ jobs:
SPARK_VERSION: ${{ matrix.spark }}
HADOOP_VERSION: ${{ matrix.hadoop }}
run: (export SPARK_HOME=$PWD/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION};export PYTHONPATH=$SPARK_HOME/python;cd python;pipenv run pytest tests)
+ - env:
+ SPARK_VERSION: ${{ matrix.spark }}
+ HADOOP_VERSION: ${{ matrix.hadoop }}
+ run: |
if [ ! -f "spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}/sbin/start-connect-server.sh" ]
+ then
+ echo "Skipping connect tests for spark $SPARK_VERSION"
Review Comment:
```suggestion
echo "Skipping connect tests for Spark $SPARK_VERSION"
```
##########
python/sedona/sql/dataframe_api.py:
##########
@@ -49,13 +64,15 @@ def call_sedona_function(
)
# apparently a Column is an Iterable so we need to check for it explicitly
- if (
- (not isinstance(args, Iterable))
- or isinstance(args, str)
- or isinstance(args, Column)
+ if (not isinstance(args, Iterable)) or isinstance(
+ args, (str, Column, ConnectColumn)
):
args = [args]
+ # in spark-connect environments use connect api
Review Comment:
```suggestion
# in spark-connect environments use connect API
```
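The dispatch being reviewed here, using the connect API when attached to a Spark Connect server and the local JVM otherwise, can be sketched as below. The dispatcher and its return values are hypothetical, and `is_remote` is a rough stand-in for `pyspark.sql.utils.is_remote`, which keys off the `SPARK_REMOTE` environment variable.

```python
import os

def is_remote() -> bool:
    # rough stand-in for pyspark.sql.utils.is_remote: a Spark Connect
    # session is signalled by the SPARK_REMOTE environment variable
    return "SPARK_REMOTE" in os.environ

def call_sedona_function(function_name: str, *args):
    # hypothetical dispatcher: in spark-connect environments use the
    # connect API, since there is no local JVM to reach through py4j
    if is_remote():
        return ("connect", function_name, args)
    return ("jvm", function_name, args)
```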
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]