homatthew commented on code in PR #3632:
URL: https://github.com/apache/gobblin/pull/3632#discussion_r1102036987


##########
gobblin-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreUtils.java:
##########
@@ -256,6 +261,29 @@ public static SerDeInfo getSerDeInfo(HiveRegistrationUnit unit) {
     return si;
   }
 
+  /**
+   * Util for detecting if {@param hiveTable} has a non-optional union (aka complex unions) column types. A non optional
+   * union is defined as a uniontype with multiple possible types and none of them are null
+   *
+   * @param hiveTable
+   * @return if hive table contains complex uniontype columns
+   */
+  public static boolean containsNonOptionalUnionTypeColumn(HiveTable hiveTable) {
+    if (!isAvroFormat(hiveTable)) {
+      // All values in ORC are optional / nullable
+      return false;
+    }
+
+    return hiveTable.getColumns().stream()
+        .map(HiveRegistrationUnit.Column::getType)
+        .map(Object::toString)
+        .anyMatch(columnType -> columnType.contains("uniontype") && !columnType.contains("void"));

Review Comment:
   You are correct. After checking the definition in [this Iceberg doc](https://docs.google.com/document/d/1Go2NrOoeCKfrDJw8MAZnsMYbE2KZbso1zkSY8cpLzaQ/edit#heading=h.gd2qof50gbzs), it seems there was some confusion about what counts as a "non-optional" / complex union.
   - Source: https://github.com/apache/iceberg/issues/189
   
   The correct definition, as described in the document, is:
   > Non-optional union: A union type with more than one non-null option type,
   > e.g. ["int", "string"] or ["null", "int", "string"]

   Alternate phrasing:
   > A schema where there's a union type of >= 2 branches which are non-null.
   > - [int, null]: no
   > - [int, string]: yes
   > - [int, string, null]: yes
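
For illustration only, here is a minimal sketch of that definition applied to a list of union branch names. The class `UnionTypeCheck` and the method `isNonOptionalUnion` are hypothetical names, not part of Gobblin or this PR, and the sketch assumes the branches have already been extracted as plain type-name strings with the null branch spelled `"null"`. It does not parse Hive `uniontype<...>` column strings.

```java
import java.util.List;

public class UnionTypeCheck {

    /**
     * Hypothetical helper: returns true when the union has two or more
     * non-null branches, i.e. it is a "non-optional" (complex) union.
     * Assumes the null branch is spelled exactly "null".
     */
    static boolean isNonOptionalUnion(List<String> branches) {
        long nonNullBranches = branches.stream()
            .filter(branch -> !"null".equals(branch))
            .count();
        return nonNullBranches >= 2;
    }

    public static void main(String[] args) {
        // The three cases listed in the comment above.
        System.out.println(isNonOptionalUnion(List.of("int", "null")));           // false
        System.out.println(isNonOptionalUnion(List.of("int", "string")));         // true
        System.out.println(isNonOptionalUnion(List.of("int", "string", "null"))); // true
    }
}
```

The point of the sketch is that the check counts non-null branches. The mere presence of a null branch does not make a union optional.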


