[
https://issues.apache.org/jira/browse/PHOENIX-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955102#comment-17955102
]
ASF GitHub Bot commented on PHOENIX-7407:
-----------------------------------------
stoty commented on code in PR #145:
URL:
https://github.com/apache/phoenix-connectors/pull/145#discussion_r2115135405
##########
phoenix5-spark/src/main/scala/org/apache/phoenix/spark/datasource/v2/PhoenixSparkSqlRelation.scala:
##########
@@ -1,3 +1,21 @@
+/*
Review Comment:
Thanks.
As you are the original author, we can handle this here instead of a separate ticket.
##########
phoenix5-spark/src/it/java/org/apache/phoenix/spark/DataSourceApiIT.java:
##########
@@ -73,10 +73,8 @@ public Configuration getConfiguration(Configuration confToClone) {
@Test
public void basicWriteAndReadBackTest() throws SQLException {
- SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("phoenix-test")
- .set("spark.hadoopRDD.ignoreEmptySplits", "false");
- JavaSparkContext jsc = new JavaSparkContext(sparkConf);
- SQLContext sqlContext = new SQLContext(jsc);
+
+ SparkSession spark = SparkUtil.getSparkSession();
Review Comment:
Looks like `spark.hadoopRDD.ignoreEmptySplits` has been enabled by default since 3.2.0, so removing it should be OK.
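For context, `SparkSession.builder` subsumes the old `SparkConf`/`JavaSparkContext`/`SQLContext` wiring. A minimal sketch of what a shared helper such as `SparkUtil.getSparkSession()` might look like (the helper body is not shown in this hunk, so the details are assumptions):
```java
import org.apache.spark.sql.SparkSession;

// Illustrative sketch only; the real SparkUtil in this PR may differ.
public final class SparkUtil {
    private SparkUtil() {
    }

    public static SparkSession getSparkSession() {
        // getOrCreate() reuses any existing session, so all ITs in the same
        // JVM share one SparkSession instead of each building its own
        // SparkConf/JavaSparkContext/SQLContext.
        return SparkSession.builder()
                .master("local")
                .appName("phoenix-test")
                .getOrCreate();
    }
}
```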
##########
phoenix5-spark3/src/main/java/org/apache/phoenix/spark/sql/connector/writer/PhoenixWriteBuilder.java:
##########
@@ -18,13 +18,28 @@
package org.apache.phoenix.spark.sql.connector.writer;
import org.apache.phoenix.thirdparty.com.google.common.annotations.VisibleForTesting;
-import org.apache.spark.sql.connector.write.BatchWrite;
-import org.apache.spark.sql.connector.write.LogicalWriteInfo;
import org.apache.spark.sql.connector.write.WriteBuilder;
+import org.apache.spark.sql.connector.write.SupportsOverwrite;
+import org.apache.spark.sql.connector.write.LogicalWriteInfo;
+import org.apache.spark.sql.connector.write.BatchWrite;
+import org.apache.spark.sql.sources.Filter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
import java.util.Map;
-public class PhoenixWriteBuilder implements WriteBuilder {
+/**
+ * The PhoenixWriteBuilder class is responsible for constructing and configuring a write operation
+ * for Phoenix when interfacing with Spark's data source API. This class implements the WriteBuilder
+ * interface for building write operations and SupportsOverwrite interface to handle overwrite behavior.
+ *
+ * The class facilitates the creation of a batch write operation that is configured with the provided
+ * logical write information and options specific to the Phoenix data source.
+ *
+ * Note: Overwrite mode does not do truncate table and behave the same as Append mode.
Review Comment:
grammar: "behaves"
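For illustration, a builder whose overwrite path degenerates to append might look like the sketch below; the class name is hypothetical and the `buildForBatch` body is stubbed, since the PR's actual `PhoenixBatchWrite` wiring is not shown here:
```java
import java.util.Map;

import org.apache.spark.sql.connector.write.BatchWrite;
import org.apache.spark.sql.connector.write.LogicalWriteInfo;
import org.apache.spark.sql.connector.write.SupportsOverwrite;
import org.apache.spark.sql.connector.write.WriteBuilder;
import org.apache.spark.sql.sources.Filter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of a WriteBuilder whose Overwrite path degenerates to Append.
public class OverwriteAsAppendWriteBuilder implements WriteBuilder, SupportsOverwrite {

    private static final Logger LOGGER =
            LoggerFactory.getLogger(OverwriteAsAppendWriteBuilder.class);

    private final LogicalWriteInfo writeInfo;
    private final Map<String, String> options;

    public OverwriteAsAppendWriteBuilder(LogicalWriteInfo writeInfo,
            Map<String, String> options) {
        this.writeInfo = writeInfo;
        this.options = options;
    }

    @Override
    public WriteBuilder overwrite(Filter[] filters) {
        // Overwrite does not truncate the table; the filters are ignored and
        // the resulting write behaves exactly like Append (plain upserts).
        LOGGER.warn("SaveMode.Overwrite is treated as SaveMode.Append; no rows are deleted");
        return this;
    }

    @Override
    public BatchWrite buildForBatch() {
        // The real connector would return its Phoenix-specific BatchWrite here,
        // e.g. new PhoenixBatchWrite(writeInfo, options) (assumed constructor).
        throw new UnsupportedOperationException("sketch only");
    }
}
```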
##########
phoenix5-spark/src/it/java/org/apache/phoenix/spark/DataSourceApiIT.java:
##########
@@ -73,10 +73,8 @@ public Configuration getConfiguration(Configuration confToClone) {
@Test
public void basicWriteAndReadBackTest() throws SQLException {
- SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("phoenix-test")
- .set("spark.hadoopRDD.ignoreEmptySplits", "false");
- JavaSparkContext jsc = new JavaSparkContext(sparkConf);
- SQLContext sqlContext = new SQLContext(jsc);
+
+ SparkSession spark = SparkUtil.getSparkSession();
Review Comment:
Since 3.2.0 / SPARK-34809, `spark.hadoopRDD.ignoreEmptySplits` is enabled by default.
However, this module uses Spark 2. Shouldn't we keep that property for Spark 2?
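If the Spark 2 module should keep the old behavior, one option is to pin the property on the shared session builder; a sketch, assuming the `SparkUtil`-style helper approach:
```java
import org.apache.spark.sql.SparkSession;

// Sketch: keep the pre-3.2.0 setting explicit for the Spark 2 module.
public final class Spark2TestSessions {
    private Spark2TestSessions() {
    }

    public static SparkSession getSparkSession() {
        return SparkSession.builder()
                .master("local")
                .appName("phoenix-test")
                // SPARK-34809 only changed the default in Spark 3.2.0, so the
                // Spark 2 connector can still set the property explicitly.
                .config("spark.hadoopRDD.ignoreEmptySplits", "false")
                .getOrCreate();
    }
}
```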
##########
phoenix5-spark3/README.md:
##########
@@ -165,7 +165,9 @@ The `save` is method on DataFrame allows passing in a data source type. You can
specify which table to persist the DataFrame to. The column names are derived from
the DataFrame's schema field names, and must match the Phoenix column names.
-The `save` method also takes a `SaveMode` option, for which only `SaveMode.Append` is supported.
+The `save` method also takes a `SaveMode` option, it is recommended to use `SaveMode.Append`.
+For maintaining compatibility with source type `"org.apache.phoenix.spark"`,
+`SaveMode.Overwrite` is accepted but it behave same way as `SaveMode.Append`.
Review Comment:
grammar: "behaves the same way"
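To make the recommendation concrete, a round-trip sketch using `SaveMode.Append`; the table names and JDBC URL are placeholders, and the `table`/`jdbcUrl` option names follow the examples elsewhere in this README:
```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SaveModeExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local").appName("phoenix-savemode").getOrCreate();

        Dataset<Row> df = spark.read().format("phoenix")
                .option("table", "INPUT_TABLE")
                .option("jdbcUrl", "jdbc:phoenix:zkHost:2181")
                .load();

        // Recommended: SaveMode.Append.
        // SaveMode.Overwrite is also accepted for compatibility with the old
        // "org.apache.phoenix.spark" source type, but it does NOT truncate the
        // table first; it behaves the same way as Append.
        df.write().format("phoenix")
                .mode(SaveMode.Append)
                .option("table", "OUTPUT_TABLE")
                .option("jdbcUrl", "jdbc:phoenix:zkHost:2181")
                .save();
    }
}
```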
##########
phoenix5-spark3/README.md:
##########
@@ -341,10 +343,8 @@ the deprected `zkUrl` parameter for backwards compatibility purposes. If neither
it falls back to using connection defined by hbase-site.xml.
- `"jdbcUrl"` expects a full Phoenix JDBC URL, i.e. "jdbc:phoenix" or "jdbc:phoenix:zkHost:zkport",
while `"zkUrl"` expects the ZK quorum only, i.e. "zkHost:zkPort"
> Remove deprecated datasource V1 code from spark2 and spark3 connector
> ---------------------------------------------------------------------
>
> Key: PHOENIX-7407
> URL: https://issues.apache.org/jira/browse/PHOENIX-7407
> Project: Phoenix
> Issue Type: Improvement
> Reporter: rejeb ben rejeb
> Assignee: rejeb ben rejeb
> Priority: Major
>
> The purpose of this jira is to remove deprecated datasource V1 code. It is
> safe to remove these classes since they are used internally by Spark and not
> referenced directly in application code.
> But in order to not impact old applications, all V1 interfaces (utility
> methods and the source type "org.apache.phoenix.spark") will be kept and the
> code will be modified to use the new connector version classes.
> As discussed on the dev mailing list, one acceptable side effect is that the
> spark3 SaveMode will accept both "Append" and "Overwrite" values. However,
> the behavior will be the same.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)