Hisoka-X commented on code in PR #9743:
URL: https://github.com/apache/seatunnel/pull/9743#discussion_r2312201348
##########
seatunnel-connectors-v2/connector-hive/pom.xml:
##########
@@ -82,32 +83,30 @@
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.exec.version}</version>
- <scope>provided</scope>
Review Comment:
Please keep `hive-exec` as `provided`, because users may be running a different Hive version.
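For reference, a minimal sketch of the declaration with the scope retained. It uses only the elements visible in the hunk above; the existing `<exclusions>` block is omitted here for brevity and is assumed to stay as it is:

```xml
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>${hive.exec.version}</version>
    <!-- Keep provided scope so the Hive jars supplied by the user's environment are used at runtime -->
    <scope>provided</scope>
</dependency>
```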
##########
seatunnel-connectors-v2/connector-hive/src/main/java/org/apache/seatunnel/connectors/seatunnel/hive/utils/HiveMetaStoreProxy.java:
##########
@@ -158,6 +159,36 @@ public Table getTable(@NonNull String dbName, @NonNull
String tableName) {
}
}
+ public void createDatabaseIfNotExists(String db) throws TException {
Review Comment:
How about letting `HiveMetaStoreProxy` implement the `Catalog` interface? Then we
would not need to override most of the methods of `HiveSaveModeHandler`.
##########
docs/en/connector-v2/sink/Hive.md:
##########
@@ -100,6 +100,22 @@ Support writing Parquet INT96 from a timestamp, only valid
for parquet files.
Flag to decide whether to use overwrite mode when inserting data into Hive. If
set to true, for non-partitioned tables, the existing data in the table will be
deleted before inserting new data. For partitioned tables, the data in the
relevant partition will be deleted before inserting new data.
+### schema_save_mode [enum]
+
+Before starting the synchronization task, different processing schemes are
selected for the existing table structure on the target side.
+
+Option values:
+- `RECREATE_SCHEMA`: Will create when the table does not exist, delete and
rebuild when the table exists
+- `CREATE_SCHEMA_WHEN_NOT_EXIST`: Will create when the table does not exist,
skip when the table exists
+- `ERROR_WHEN_SCHEMA_NOT_EXIST`: Error will be reported when the table does
not exist
+- `IGNORE`: Ignore the treatment of the table
+
+
+
+### save_mode_create_template [string]
Review Comment:
What's the default value of `save_mode_create_template`?
##########
seatunnel-connectors-v2/connector-hive/pom.xml:
##########
@@ -82,32 +83,30 @@
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.exec.version}</version>
- <scope>provided</scope>
<exclusions>
+ <!-- Exclude logging dependencies to avoid conflicts -->
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
- <artifactId>log4j-1.2-api</artifactId>
- </exclusion>
- <exclusion>
- <groupId>org.apache.logging.log4j</groupId>
- <artifactId>log4j-slf4j-impl</artifactId>
- </exclusion>
- <exclusion>
- <groupId>org.apache.logging.log4j</groupId>
- <artifactId>log4j-web</artifactId>
+ <artifactId>*</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
+ <!-- Exclude format dependencies to avoid version conflicts -->
<exclusion>
<groupId>org.apache.parquet</groupId>
- <artifactId>parquet-hadoop-bundle</artifactId>
+ <artifactId>*</artifactId>
</exclusion>
+ <exclusion>
+ <groupId>org.apache.avro</groupId>
+ <artifactId>avro</artifactId>
+ </exclusion>
+ <!-- Exclude unnecessary dependencies -->
Review Comment:
Where does the other version of this dependency come from?
##########
seatunnel-connectors-v2/connector-hive/pom.xml:
##########
@@ -116,12 +115,81 @@
<groupId>org.pentaho</groupId>
<artifactId>pentaho-aggdesigner-algorithm</artifactId>
</exclusion>
+ <!-- Exclude to include separately with proper exclusions -->
+ <exclusion>
+ <groupId>org.apache.hive</groupId>
+ <artifactId>hive-common</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.apache.hive</groupId>
+ <artifactId>hive-metastore</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <!-- Hive Common dependency - contains HiveConf class -->
+ <dependency>
+ <groupId>org.apache.hive</groupId>
+ <artifactId>hive-common</artifactId>
+ <version>${hive.exec.version}</version>
+ <exclusions>
+ <!-- Exclude logging dependencies to avoid conflicts -->
+ <exclusion>
+ <groupId>log4j</groupId>
+ <artifactId>log4j</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.apache.logging.log4j</groupId>
+ <artifactId>*</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.slf4j</groupId>
+ <artifactId>slf4j-log4j12</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+
+ <!-- Hive MetaStore dependency - contains HiveMetaStoreClient,
AlreadyExistsException and other metastore API classes -->
+ <dependency>
Review Comment:
ditto
##########
seatunnel-connectors-v2/connector-hive/pom.xml:
##########
@@ -116,12 +115,81 @@
<groupId>org.pentaho</groupId>
<artifactId>pentaho-aggdesigner-algorithm</artifactId>
</exclusion>
+ <!-- Exclude to include separately with proper exclusions -->
+ <exclusion>
+ <groupId>org.apache.hive</groupId>
+ <artifactId>hive-common</artifactId>
+ </exclusion>
+ <exclusion>
+ <groupId>org.apache.hive</groupId>
+ <artifactId>hive-metastore</artifactId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <!-- Hive Common dependency - contains HiveConf class -->
+ <dependency>
Review Comment:
ditto
##########
docs/en/connector-v2/sink/Hive.md:
##########
@@ -100,6 +100,22 @@ Support writing Parquet INT96 from a timestamp, only valid
for parquet files.
Flag to decide whether to use overwrite mode when inserting data into Hive. If
set to true, for non-partitioned tables, the existing data in the table will be
deleted before inserting new data. For partitioned tables, the data in the
relevant partition will be deleted before inserting new data.
+### schema_save_mode [enum]
Review Comment:
What's the default value of `schema_save_mode`?