umehrot2 commented on a change in pull request #2283:
URL: https://github.com/apache/hudi/pull/2283#discussion_r614431975
##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -388,7 +399,8 @@ private[hudi] object HoodieSparkSqlWriter {
}
}
-  private def syncHive(basePath: Path, fs: FileSystem, parameters: Map[String, String]): Boolean = {
+  private def syncHive(basePath: Path, fs: FileSystem, parameters: Map[String, String],
+                       hadoopConf: Configuration): Boolean = {
Review comment:
This modification seems unnecessary, as `hadoopConf` is not being used.
##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -306,7 +311,10 @@ private[hudi] object HoodieSparkSqlWriter {
} finally {
writeClient.close()
}
-    val metaSyncSuccess = metaSync(parameters, basePath, jsc.hadoopConfiguration)
+    val newParameters = addSqlTableProperties(sqlContext.sparkSession.sessionState.conf,
+      df.schema, parameters)
Review comment:
This could be moved into the `metaSync` or `syncHive` method.
##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##########
@@ -88,6 +88,12 @@
   @Parameter(names = {"--verify-metadata-file-listing"}, description = "Verify file listing from Hudi's metadata against file system")
   public Boolean verifyMetadataFileListing = HoodieMetadataConfig.DEFAULT_METADATA_VALIDATE;
+  @Parameter(names = {"--table-properties"}, description = "Table properties to hive table")
+  public String tableProperties;
+
+  @Parameter(names = {"--serde-properties"}, description = "Serde properties to hive table")
+  public String serdeProperties;
+
Review comment:
Can you update the `toString()` in this class?
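Something along these lines, purely as a sketch: a trimmed stand-in for the class showing how the two new fields could be surfaced. Only the field names come from the diff; the class name and the rest of `HiveSyncConfig`'s fields are elided here.

```java
import java.util.StringJoiner;

// Hypothetical sketch only: a stand-in for HiveSyncConfig (which has many more
// fields) showing the two new fields included in toString().
public class HiveSyncConfigSketch {
  public String tableProperties;
  public String serdeProperties;

  @Override
  public String toString() {
    return new StringJoiner(", ", "HiveSyncConfig{", "}")
        .add("tableProperties='" + tableProperties + "'")
        .add("serdeProperties='" + serdeProperties + "'")
        .toString();
  }

  public static void main(String[] args) {
    HiveSyncConfigSketch cfg = new HiveSyncConfigSketch();
    cfg.tableProperties = "k1=v1";
    cfg.serdeProperties = "path=/tmp/hudi";
    // prints: HiveSyncConfig{tableProperties='k1=v1', serdeProperties='path=/tmp/hudi'}
    System.out.println(cfg);
  }
}
```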
##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##########
@@ -138,6 +138,27 @@ public void updatePartitionsToTable(String tableName, List<String> changedPartit
     }
   }
+ /**
+ * Update the table properties to the table.
+ * @param tableProperties
+ */
+ @Override
+  public void updateTableProperties(String tableName, Map<String, String> tableProperties) {
+    if (tableProperties == null || tableProperties.size() == 0) {
Review comment:
nit: `tableProperties.isEmpty()`?
##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -164,7 +165,13 @@ private void syncHoodieTable(String tableName, boolean useRealtimeInputFormat) {
     LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
     // Sync the partitions if needed
     syncPartitions(tableName, writtenPartitionsSince);
-
+    // Sync the table properties if need
+    if (cfg.tableProperties != null) {
+      Map<String, String> tableProperties = ConfigUtils.toMap(cfg.tableProperties);
+      hoodieHiveClient.updateTableProperties(tableName, tableProperties);
+      LOG.info("Sync table properties for " + tableName + ", table properties is: "
+          + cfg.tableProperties);
+    }
Review comment:
Can't we sync this while creating the table itself, like you are doing for the serde properties?
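To illustrate the suggestion, a toy sketch that folds the table properties into the CREATE TABLE statement via Hive's `TBLPROPERTIES` clause instead of a separate post-create sync. The class, method name, and column list are made up for illustration and are not Hudi's actual `HoodieHiveClient` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: build the TBLPROPERTIES clause into the DDL up front
// rather than issuing a second "sync table properties" call after creation.
public class CreateTableSketch {
  static String createTableDdl(String tableName, Map<String, String> tableProperties) {
    StringBuilder ddl = new StringBuilder("CREATE EXTERNAL TABLE " + tableName + " ( ... )");
    if (tableProperties != null && !tableProperties.isEmpty()) {
      StringBuilder props = new StringBuilder();
      for (Map.Entry<String, String> e : tableProperties.entrySet()) {
        if (props.length() > 0) {
          props.append(", ");
        }
        props.append("'").append(e.getKey()).append("'='").append(e.getValue()).append("'");
      }
      ddl.append(" TBLPROPERTIES (").append(props).append(")");
    }
    return ddl.toString();
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("spark.sql.sources.provider", "hudi");
    // prints: CREATE EXTERNAL TABLE hudi_trips ( ... ) TBLPROPERTIES ('spark.sql.sources.provider'='hudi')
    System.out.println(createTableDdl("hudi_trips", props));
  }
}
```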
##########
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -353,6 +353,8 @@ object DataSourceWriteOptions {
   val HIVE_IGNORE_EXCEPTIONS_OPT_KEY = "hoodie.datasource.hive_sync.ignore_exceptions"
val HIVE_SKIP_RO_SUFFIX = "hoodie.datasource.hive_sync.skip_ro_suffix"
val HIVE_SUPPORT_TIMESTAMP = "hoodie.datasource.hive_sync.support_timestamp"
+ val HIVE_TABLE_PROPERTIES = "hoodie.datasource.hive_sync.table_properties"
Review comment:
Lets introduce another additional boolean property
`hoodie.datasource.hive_sync.sync_as_datasource` and put the feature behind it.
We can use `true` by default, but atleast there would be a way to turn it off.
This is going to change the way spark sql queries currently run with Hudi, and
is a huge change.
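A possible sketch of that kill switch, assuming the flag is read from the write-options map with a default of `true`. Only the key name comes from this comment; the lookup helper below is illustrative, not Hudi's actual config API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed opt-out flag. Only the key
// "hoodie.datasource.hive_sync.sync_as_datasource" is from the review comment;
// everything else here is illustrative.
public class SyncAsDataSourceFlag {
  static final String SYNC_AS_DATASOURCE = "hoodie.datasource.hive_sync.sync_as_datasource";

  // Defaults to true so the new behavior stays on unless explicitly disabled.
  static boolean syncAsDataSource(Map<String, String> options) {
    return Boolean.parseBoolean(options.getOrDefault(SYNC_AS_DATASOURCE, "true"));
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    System.out.println(syncAsDataSource(options)); // default: true
    options.put(SYNC_AS_DATASOURCE, "false");
    System.out.println(syncAsDataSource(options)); // explicitly off: false
  }
}
```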
##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##########
@@ -138,6 +138,27 @@ public void updatePartitionsToTable(String tableName, List<String> changedPartit
     }
   }
+ /**
+ * Update the table properties to the table.
+ * @param tableProperties
+ */
Review comment:
Can you improve the javadoc? It is missing parameters and descriptions.
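For example, something along these lines could work. The wording is only a suggestion; the stub returns a boolean just so the no-op path is visible in this snippet, while the actual method is `void`, and the guard uses the `isEmpty()` variant from the other nit.

```java
import java.util.Map;

// Suggested javadoc shape for the new method; the body is a stub so the
// snippet compiles on its own. Hypothetical, not the actual HoodieHiveClient.
public class UpdateTablePropertiesSketch {
  /**
   * Applies the given key/value pairs to an existing Hive table, equivalent to
   * {@code ALTER TABLE ... SET TBLPROPERTIES (...)}. Keys already present on the
   * table but absent from the map are left unchanged.
   *
   * @param tableName       name of the Hive table to update
   * @param tableProperties properties to set; a {@code null} or empty map is a no-op
   * @return {@code true} if an update was attempted, {@code false} for a no-op
   */
  public boolean updateTableProperties(String tableName, Map<String, String> tableProperties) {
    if (tableProperties == null || tableProperties.isEmpty()) {
      return false;
    }
    // A real implementation would issue the ALTER TABLE call here.
    return true;
  }

  public static void main(String[] args) {
    // prints: false (null map is a no-op)
    System.out.println(new UpdateTablePropertiesSketch().updateTableProperties("t1", null));
  }
}
```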
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]