homatthew commented on code in PR #3632:
URL: https://github.com/apache/gobblin/pull/3632#discussion_r1103108455
##########
gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreUtils.java:
##########
@@ -261,27 +259,69 @@ public static SerDeInfo getSerDeInfo(HiveRegistrationUnit unit) {
return si;
}
+ public static boolean containsNonOptionalUnionTypeColumn(Table t) {
+ return containsNonOptionalUnionTypeColumn(getHiveTable(t));
+ }
+
/**
- * Util for detecting if {@param hiveTable} has a non-optional union (aka complex unions) column types. A non optional
- * union is defined as a uniontype with multiple possible types and none of them are null
+ * Util for detecting if a hive table has a non-optional union (aka complex unions) column types. A non optional
+ * union is defined as a uniontype with n >= 2 non-null subtypes
*
- * @param hiveTable
- * @return if hive table contains complex uniontype columns
+ * @param hiveTable Hive table
+ * @return if hive table contains non-optional uniontype columns
*/
public static boolean containsNonOptionalUnionTypeColumn(HiveTable hiveTable) {
- if (!isAvroFormat(hiveTable)) {
- // All values in ORC are optional / nullable
- return false;
+ if (hiveTable.getProps().contains("avro.schema.literal")) {
+ Schema.Parser parser = new Schema.Parser();
+ Schema schema = parser.parse(hiveTable.getProps().getProp("avro.schema.literal"));
+ return isNonOptionalUnion(schema);
}
- return hiveTable.getColumns().stream()
- .map(HiveRegistrationUnit.Column::getType)
- .map(Object::toString)
- .anyMatch(columnType -> columnType.contains("uniontype") && !columnType.contains("void"));
+ if (isNonAvroFormat(hiveTable)) {
Review Comment:
This is a fallback for the case where the schema literal is not set. In that
case we can use the ORC type parser to determine whether a column is a
non-optional union. Note that this does not work if the underlying table is
not ORC-based.
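To make the fallback concrete, here is a minimal sketch of the check on a Hive type string, using the definition from the Javadoc in this diff (a uniontype with n >= 2 non-null subtypes). It is an illustration only: `UnionTypeCheck` and its methods are hypothetical names, and plain string scanning stands in for the ORC type parser that the PR actually relies on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Gobblin: plain string scanning is used here
// as a stand-in for the ORC type parser.
final class UnionTypeCheck {
  private static final String UNION_PREFIX = "uniontype<";

  /** Returns true if any uniontype in the type string has at least two non-void members. */
  static boolean containsNonOptionalUnion(String hiveType) {
    String type = hiveType.replaceAll("\\s", "").toLowerCase(Locale.ROOT);
    int from = 0;
    while (true) {
      int start = type.indexOf(UNION_PREFIX, from);
      if (start < 0) {
        return false;
      }
      int bodyStart = start + UNION_PREFIX.length();
      int bodyEnd = matchingClose(type, bodyStart);
      long nonVoid = splitTopLevel(type.substring(bodyStart, bodyEnd)).stream()
          .filter(member -> !member.equals("void"))
          .count();
      if (nonVoid >= 2) {
        return true;
      }
      // Keep scanning from inside this union so nested unions are checked too.
      from = bodyStart;
    }
  }

  /** Finds the index of the '>' that closes the '<' opened just before bodyStart. */
  private static int matchingClose(String s, int bodyStart) {
    int depth = 1;
    for (int i = bodyStart; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '<') {
        depth++;
      } else if (c == '>' && --depth == 0) {
        return i;
      }
    }
    throw new IllegalArgumentException("Unbalanced type string: " + s);
  }

  /** Splits on commas that are not nested inside <...> or (...), e.g. decimal(10,2). */
  private static List<String> splitTopLevel(String body) {
    List<String> members = new ArrayList<>();
    int depth = 0;
    int last = 0;
    for (int i = 0; i < body.length(); i++) {
      char c = body.charAt(i);
      if (c == '<' || c == '(') {
        depth++;
      } else if (c == '>' || c == ')') {
        depth--;
      } else if (c == ',' && depth == 0) {
        members.add(body.substring(last, i));
        last = i + 1;
      }
    }
    members.add(body.substring(last));
    return members;
  }
}
```

Unlike the removed `contains("uniontype") && !contains("void")` check, this sketch treats `uniontype<void,int,string>` as non-optional, since it still has two non-null subtypes.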