the-other-tim-brown commented on code in PR #14344:
URL: https://github.com/apache/hudi/pull/14344#discussion_r2561124175
##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java:
##########
@@ -145,54 +141,46 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
}
/**
- * Returns equivalent Hive table schema read from a parquet file.
+ * Returns equivalent Hive table schema for the provided table schema.
*
- * @param messageType : Parquet Schema
- * @return : Hive Table schema read from parquet file MAP[String,String]
+ * @param schema Table Schema
+ * @return Hive Table schema MAP[String,String]
*/
- public static Map<String, String> convertParquetSchemaToHiveSchema(MessageType messageType, boolean supportTimestamp) throws IOException {
- return convertMapSchemaToHiveSchema(parquetSchemaToMapSchema(messageType, supportTimestamp, true));
+ public static Map<String, String> convertSchemaToHiveSchema(HoodieSchema schema, boolean supportTimestamp) throws IOException {
+ return convertMapSchemaToHiveSchema(hoodieSchemaToMapSchema(schema, supportTimestamp, true));
}
/**
- * Returns equivalent Hive table Field schema read from a parquet file.
+ * Returns equivalent Hive table Field schema for the provided table schema.
*
- * @param messageType : Parquet Schema
- * @return : Hive Table schema read from parquet file List[FieldSchema] without partitionField
+ * @param schema Table Schema
+ * @return Hive Table schema without partitionField
*/
- public static List<FieldSchema> convertParquetSchemaToHiveFieldSchema(MessageType messageType, HiveSyncConfig syncConfig) throws IOException {
- return convertMapSchemaToHiveFieldSchema(parquetSchemaToMapSchema(messageType, syncConfig.getBoolean(HIVE_SUPPORT_TIMESTAMP_TYPE), false), syncConfig);
+ public static List<FieldSchema> convertSchemaToHiveFieldSchema(HoodieSchema schema, HiveSyncConfig syncConfig) {
+ return convertMapSchemaToHiveFieldSchema(hoodieSchemaToMapSchema(schema, syncConfig.getBoolean(HIVE_SUPPORT_TIMESTAMP_TYPE), false), syncConfig);
}
/**
- * Returns schema in Map<String,String> form read from a parquet file.
+ * Returns schema in Map<String,String> form translated from the table's schema.
*
- * @param messageType : parquet Schema
+ * @param schema the current schema
* @param supportTimestamp
- * @param doFormat : This option controls whether schema will have spaces in the value part of the schema map. This is required because spaces in complex schema trips the HMS create table calls.
+ * @param doFormat This option controls whether schema will have spaces in the value part of the schema map. This is required because spaces in complex schema trips the HMS create table calls.
* This value will be false for HMS but true for QueryBasedDDLExecutors
- * @return : Intermediate schema in the form of Map<String, String>
+ * @return Intermediate schema in the form of Map<String, String>
*/
- public static LinkedHashMap<String, String> parquetSchemaToMapSchema(MessageType messageType, boolean supportTimestamp, boolean doFormat) throws IOException {
- LinkedHashMap<String, String> schema = new LinkedHashMap<>();
- List<Type> parquetFields = messageType.getFields();
- for (Type parquetType : parquetFields) {
- StringBuilder result = new StringBuilder();
- String key = parquetType.getName();
- if (parquetType.isRepetition(Type.Repetition.REPEATED)) {
Review Comment:
Yes, it is an array. It is handled in convertField.
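
For readers following along, here is a rough sketch of what handling an array inside a convertField-style method could look like. It is not the code from this PR: the `Kind` and `Field` types are hypothetical stand-ins rather than the HoodieSchema API, and the exact spacing produced when `doFormat` is true is an assumption based only on the Javadoc above (spaces when true, none when false).

```java
public class ArrayFieldSketch {

  // Hypothetical, simplified schema model used only for illustration.
  public enum Kind { INT, STRING, ARRAY }

  public static final class Field {
    final Kind kind;
    final Field element; // element type, set only when kind == ARRAY

    public Field(Kind kind, Field element) {
      this.kind = kind;
      this.element = element;
    }
  }

  // Recursively maps a field to a Hive type string. Arrays are handled
  // here, so the caller needs no separate "repeated" branch.
  public static String convertField(Field field, boolean doFormat) {
    switch (field.kind) {
      case INT:
        return "int";
      case STRING:
        return "string";
      case ARRAY:
        String inner = convertField(field.element, doFormat);
        // Assumed formatting: pad with spaces only when doFormat is true.
        return doFormat ? "array< " + inner + " >" : "array<" + inner + ">";
      default:
        throw new IllegalArgumentException("Unsupported kind: " + field.kind);
    }
  }

  public static void main(String[] args) {
    Field nested = new Field(Kind.ARRAY, new Field(Kind.ARRAY, new Field(Kind.STRING, null)));
    System.out.println(convertField(nested, false));
  }
}
```

Because the array case recurses on its element type, nested arrays need no extra handling in the loop that builds the map.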