[
https://issues.apache.org/jira/browse/PHOENIX-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955102#comment-17955102
]
ASF GitHub Bot commented on PHOENIX-7407:
-----------------------------------------
stoty commented on code in PR #145:
URL:
https://github.com/apache/phoenix-connectors/pull/145#discussion_r2115135405
##########
phoenix5-spark/src/main/scala/org/apache/phoenix/spark/datasource/v2/PhoenixSparkSqlRelation.scala:
##########
@@ -1,3 +1,21 @@
+/*
Review Comment:
Thanks.
As you are the original author, we can handle this here instead of a separate ticket.
##########
phoenix5-spark/src/it/java/org/apache/phoenix/spark/DataSourceApiIT.java:
##########
@@ -73,10 +73,8 @@ public Configuration getConfiguration(Configuration confToClone) {
@Test
public void basicWriteAndReadBackTest() throws SQLException {
- SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("phoenix-test")
- .set("spark.hadoopRDD.ignoreEmptySplits", "false");
- JavaSparkContext jsc = new JavaSparkContext(sparkConf);
- SQLContext sqlContext = new SQLContext(jsc);
+
+ SparkSession spark = SparkUtil.getSparkSession();
Review Comment:
Looks like `spark.hadoopRDD.ignoreEmptySplits` has been enabled by default since 3.2.0, so removing it should be OK.
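For context, `SparkSession.builder` subsumes the old `SparkConf`/`JavaSparkContext`/`SQLContext` wiring. A minimal sketch of what a shared helper such as `SparkUtil.getSparkSession()` might look like (the helper body is not shown in this hunk, so the details are assumptions):
```java
import org.apache.spark.sql.SparkSession;

// Illustrative sketch only; the real SparkUtil in this PR may differ.
public final class SparkUtil {
    private SparkUtil() {
    }

    public static SparkSession getSparkSession() {
        // getOrCreate() reuses any existing session, so all ITs in the same
        // JVM share one SparkSession instead of each building its own
        // SparkConf/JavaSparkContext/SQLContext.
        return SparkSession.builder()
                .master("local")
                .appName("phoenix-test")
                .getOrCreate();
    }
}
```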
##########
phoenix5-spark3/src/main/java/org/apache/phoenix/spark/sql/connector/writer/PhoenixWriteBuilder.java:
##########
@@ -18,13 +18,28 @@
package org.apache.phoenix.spark.sql.connector.writer;
import org.apache.phoenix.thirdparty.com.google.common.annotations.VisibleForTesting;
-import org.apache.spark.sql.connector.write.BatchWrite;
-import org.apache.spark.sql.connector.write.LogicalWriteInfo;
import org.apache.spark.sql.connector.write.WriteBuilder;
+import org.apache.spark.sql.connector.write.SupportsOverwrite;
+import org.apache.spark.sql.connector.write.LogicalWriteInfo;
+import org.apache.spark.sql.connector.write.BatchWrite;
+import org.apache.spark.sql.sources.Filter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
import java.util.Map;
-public class PhoenixWriteBuilder implements WriteBuilder {
+/**
+ * The PhoenixWriteBuilder class is responsible for constructing and configuring a write operation
+ * for Phoenix when interfacing with Spark's data source API. This class implements the WriteBuilder
+ * interface for building write operations and SupportsOverwrite interface to handle overwrite behavior.
+ *
+ * The class facilitates the creation of a batch write operation that is configured with the provided
+ * logical write information and options specific to the Phoenix data source.
+ *
+ * Note: Overwrite mode does not do truncate table and behave the same as Append mode.
Review Comment:
grammar: "behaves"
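For illustration, a builder whose overwrite path degenerates to append might look like the sketch below; the class name is hypothetical and the `buildForBatch` body is stubbed, since the PR's actual `PhoenixBatchWrite` wiring is not shown here:
```java
import java.util.Map;

import org.apache.spark.sql.connector.write.BatchWrite;
import org.apache.spark.sql.connector.write.LogicalWriteInfo;
import org.apache.spark.sql.connector.write.SupportsOverwrite;
import org.apache.spark.sql.connector.write.WriteBuilder;
import org.apache.spark.sql.sources.Filter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of a WriteBuilder whose Overwrite path degenerates to Append.
public class OverwriteAsAppendWriteBuilder implements WriteBuilder, SupportsOverwrite {

    private static final Logger LOGGER =
            LoggerFactory.getLogger(OverwriteAsAppendWriteBuilder.class);

    private final LogicalWriteInfo writeInfo;
    private final Map<String, String> options;

    public OverwriteAsAppendWriteBuilder(LogicalWriteInfo writeInfo,
            Map<String, String> options) {
        this.writeInfo = writeInfo;
        this.options = options;
    }

    @Override
    public WriteBuilder overwrite(Filter[] filters) {
        // Overwrite does not truncate the table; the filters are ignored and
        // the resulting write behaves exactly like Append (plain upserts).
        LOGGER.warn("SaveMode.Overwrite is treated as SaveMode.Append; no rows are deleted");
        return this;
    }

    @Override
    public BatchWrite buildForBatch() {
        // The real connector would return its Phoenix-specific BatchWrite here,
        // e.g. new PhoenixBatchWrite(writeInfo, options) (assumed constructor).
        throw new UnsupportedOperationException("sketch only");
    }
}
```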
##########
phoenix5-spark/src/it/java/org/apache/phoenix/spark/DataSourceApiIT.java:
##########
@@ -73,10 +73,8 @@ public Configuration getConfiguration(Configuration confToClone) {
@Test
public void basicWriteAndReadBackTest() throws SQLException {
- SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("phoenix-test")
- .set("spark.hadoopRDD.ignoreEmptySplits", "false");
- JavaSparkContext jsc = new JavaSparkContext(sparkConf);
- SQLContext sqlContext = new SQLContext(jsc);
+
+ SparkSession spark = SparkUtil.getSparkSession();
Review Comment:
Since 3.2.0 / SPARK-34809, `spark.hadoopRDD.ignoreEmptySplits` is enabled by default.
However, this module uses Spark 2. Shouldn't we keep that property for Spark 2?
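If the Spark 2 module should keep the old behavior, one option is to pin the property on the shared session builder; a sketch, assuming the `SparkUtil`-style helper approach:
```java
import org.apache.spark.sql.SparkSession;

// Sketch: keep the pre-3.2.0 setting explicit for the Spark 2 module.
public final class Spark2TestSessions {
    private Spark2TestSessions() {
    }

    public static SparkSession getSparkSession() {
        return SparkSession.builder()
                .master("local")
                .appName("phoenix-test")
                // SPARK-34809 only changed the default in Spark 3.2.0, so the
                // Spark 2 connector can still set the property explicitly.
                .config("spark.hadoopRDD.ignoreEmptySplits", "false")
                .getOrCreate();
    }
}
```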
##########
phoenix5-spark3/README.md:
##########
@@ -165,7 +165,9 @@ The `save` is method on DataFrame allows passing in a data source type. You can
specify which table to persist the DataFrame to. The column names are derived from
the DataFrame's schema field names, and must match the Phoenix column names.
-The `save` method also takes a `SaveMode` option, for which only `SaveMode.Append` is supported.
+The `save` method also takes a `SaveMode` option, it is recommended to use `SaveMode.Append`.
+For maintaining compatibility with source type `"org.apache.phoenix.spark"`,
+`SaveMode.Overwrite` is accepted but it behave same way as `SaveMode.Append`.
Review Comment:
grammar: "behaves the same way"
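To make the recommendation concrete, a round-trip sketch using `SaveMode.Append`; the table names and JDBC URL are placeholders, and the `table`/`jdbcUrl` option names follow the examples elsewhere in this README:
```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SaveModeExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local").appName("phoenix-savemode").getOrCreate();

        Dataset<Row> df = spark.read().format("phoenix")
                .option("table", "INPUT_TABLE")
                .option("jdbcUrl", "jdbc:phoenix:zkHost:2181")
                .load();

        // Recommended: SaveMode.Append.
        // SaveMode.Overwrite is also accepted for compatibility with the old
        // "org.apache.phoenix.spark" source type, but it does NOT truncate the
        // table first; it behaves the same way as Append.
        df.write().format("phoenix")
                .mode(SaveMode.Append)
                .option("table", "OUTPUT_TABLE")
                .option("jdbcUrl", "jdbc:phoenix:zkHost:2181")
                .save();
    }
}
```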
##########
phoenix5-spark3/README.md:
##########
@@ -341,10 +343,8 @@ the deprected `zkUrl` parameter for backwards compatibility purposes. If neither
it falls back to using connection defined by hbase-site.xml.
- `"jdbcUrl"` expects a full Phoenix JDBC URL, i.e. "jdbc:phoenix" or "jdbc:phoenix:zkHost:zkport",
while `"zkUrl"` expects the ZK quorum only, i.e. "zkHost:zkPort"
> Remove deprecated datasource V1 code from spark2 and spark3 connector
> ---------------------------------------------------------------------
>
> Key: PHOENIX-7407
> URL: https://issues.apache.org/jira/browse/PHOENIX-7407
> Project: Phoenix
> Issue Type: Improvement
> Reporter: rejeb ben rejeb
> Assignee: rejeb ben rejeb
> Priority: Major
>
> The purpose of this jira is to remove deprecated datasource V1 code. It is
> safe to remove these classes since they are used internally by Spark and not
> referenced directly in application code.
> But in order to not impact old applications, all V1 interfaces (utility
> methods and the source type "org.apache.phoenix.spark") will be kept and the
> code will be modified to use the new connector version classes.
> As discussed on the dev mailing list, one acceptable side effect is that the
> spark3 SaveMode will accept both "Append" and "Overwrite" values. However,
> the behavior will be the same.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)