pvary commented on code in PR #3361:
URL: https://github.com/apache/hive/pull/3361#discussion_r897640519


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -796,29 +796,51 @@ private String collectColumnAndReplaceDummyValues(ExprNodeDesc node, String foun
    *   <li>fileformat is set to avro</li>
    *   <li>querying metadata tables</li>
    *   <li>fileformat is set to ORC, and table schema has time type column</li>
+   *   <li>fileformat is set to PARQUET, and table schema has a list type column, that has a complex type element</li>
    * </ul>
    * @param tableProps table properties, must be not null
    */
   private void fallbackToNonVectorizedModeBasedOnProperties(Properties tableProps) {
+    Schema tableSchema = SchemaParser.fromJson(tableProps.getProperty(InputFormatConfig.TABLE_SCHEMA));
     if ("2".equals(tableProps.get(TableProperties.FORMAT_VERSION)) ||
         FileFormat.AVRO.name().equalsIgnoreCase(tableProps.getProperty(TableProperties.DEFAULT_FILE_FORMAT)) ||
         (tableProps.containsKey("metaTable") && isValidMetadataTable(tableProps.getProperty("metaTable"))) ||
-        hasOrcTimeInSchema(tableProps)) {
+        hasOrcTimeInSchema(tableProps, tableSchema) ||
+        !hasParquetListColumnSupport(tableProps, tableSchema)) {
       conf.setBoolean(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.varname, false);
     }
   }
 
   // Iceberg Time type columns are written as longs into ORC files. There is no Time type in Hive, so it is represented
   // as String instead. For ORC there's no automatic conversion from long to string during vectorized reading such as
   // for example in Parquet (in Parquet files Time type is an int64 with 'time' logical annotation).
-  private static boolean hasOrcTimeInSchema(Properties tableProps) {
+  private static boolean hasOrcTimeInSchema(Properties tableProps, Schema tableSchema) {
     if (!FileFormat.ORC.name().equalsIgnoreCase(tableProps.getProperty(TableProperties.DEFAULT_FILE_FORMAT))) {
       return false;
     }
-    Schema tableSchema = SchemaParser.fromJson(tableProps.getProperty(InputFormatConfig.TABLE_SCHEMA));
     return tableSchema.columns().stream().anyMatch(f -> Types.TimeType.get().typeId() == f.type().typeId());
   }
 
+  // Vectorized reads of parquet files from columns with list type is only supported if the element is a primitive type

Review Comment:
   nit: Could we convert this to javadoc instead? Same for the method above



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to