[jira] [Work logged] (HIVE-26298) Selecting complex types on migrated iceberg table does not work

ASF GitHub Bot (Jira) Wed, 15 Jun 2022 00:41:06 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-26298?focusedWorklogId=781520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781520
 ]


ASF GitHub Bot logged work on HIVE-26298:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Jun/22 07:40
            Start Date: 15/Jun/22 07:40
    Worklog Time Spent: 10m 
      Work Description: pvary commented on code in PR #3361:
URL: https://github.com/apache/hive/pull/3361#discussion_r897640519


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -796,29 +796,51 @@ private String collectColumnAndReplaceDummyValues(ExprNodeDesc node, String foun
    *   <li>fileformat is set to avro</li>
    *   <li>querying metadata tables</li>
    *   <li>fileformat is set to ORC, and table schema has time type column</li>
+   *   <li>fileformat is set to PARQUET, and table schema has a list type column, that has a complex type element</li>
    * </ul>
    * @param tableProps table properties, must be not null
    */
   private void fallbackToNonVectorizedModeBasedOnProperties(Properties tableProps) {
+    Schema tableSchema = SchemaParser.fromJson(tableProps.getProperty(InputFormatConfig.TABLE_SCHEMA));
     if ("2".equals(tableProps.get(TableProperties.FORMAT_VERSION)) ||
         FileFormat.AVRO.name().equalsIgnoreCase(tableProps.getProperty(TableProperties.DEFAULT_FILE_FORMAT)) ||
         (tableProps.containsKey("metaTable") && isValidMetadataTable(tableProps.getProperty("metaTable"))) ||
-        hasOrcTimeInSchema(tableProps)) {
+        hasOrcTimeInSchema(tableProps, tableSchema) ||
+        !hasParquetListColumnSupport(tableProps, tableSchema)) {
       conf.setBoolean(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.varname, false);
     }
   }
 
   // Iceberg Time type columns are written as longs into ORC files. There is no Time type in Hive, so it is represented
   // as String instead. For ORC there's no automatic conversion from long to string during vectorized reading such as
   // for example in Parquet (in Parquet files Time type is an int64 with 'time' logical annotation).
-  private static boolean hasOrcTimeInSchema(Properties tableProps) {
+  private static boolean hasOrcTimeInSchema(Properties tableProps, Schema tableSchema) {
     if (!FileFormat.ORC.name().equalsIgnoreCase(tableProps.getProperty(TableProperties.DEFAULT_FILE_FORMAT))) {
       return false;
     }
-    Schema tableSchema = SchemaParser.fromJson(tableProps.getProperty(InputFormatConfig.TABLE_SCHEMA));
     return tableSchema.columns().stream().anyMatch(f -> Types.TimeType.get().typeId() == f.type().typeId());
   }
 
+  // Vectorized reads of parquet files from columns with list type is only supported if the element is a primitive type

Review Comment:
   nit: Could we convert this comment to Javadoc instead? Same for the comment on the method above.
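
   To illustrate the suggestion, here is a minimal, self-contained sketch of the new comment rewritten as Javadoc. It does not use the Iceberg API: `ListSupportSketch`, `Kind` and `Column` are hypothetical stand-ins so that the example runs on its own, and the method body is only an assumption about the intent described by the comment, not the code from the PR.

```java
import java.util.Arrays;
import java.util.List;

public class ListSupportSketch {

  /** Hypothetical, simplified type kinds; the real code inspects Iceberg types. */
  public enum Kind { INT, STRING, TIME, LIST, MAP, STRUCT }

  /** Hypothetical column model: a kind and, for LIST columns, the element kind. */
  public static final class Column {
    final String name;
    final Kind kind;
    final Kind elementKind; // null unless kind == LIST

    public Column(String name, Kind kind, Kind elementKind) {
      this.name = name;
      this.kind = kind;
      this.elementKind = elementKind;
    }
  }

  private static boolean isPrimitive(Kind kind) {
    return kind != Kind.LIST && kind != Kind.MAP && kind != Kind.STRUCT;
  }

  /**
   * Vectorized reads of Parquet files are only supported for list type columns
   * whose element is a primitive type.
   *
   * @param fileFormat default file format of the table, for example "parquet"
   * @param columns top-level columns of the table schema
   * @return false if the format is Parquet and some list column has a complex element
   */
  public static boolean hasParquetListColumnSupport(String fileFormat, List<Column> columns) {
    if (!"PARQUET".equalsIgnoreCase(fileFormat)) {
      // The restriction in this sketch only applies to Parquet tables.
      return true;
    }
    return columns.stream()
        .filter(c -> c.kind == Kind.LIST)
        .allMatch(c -> isPrimitive(c.elementKind));
  }

  public static void main(String[] args) {
    List<Column> columns = Arrays.asList(
        new Column("int_primitive", Kind.INT, null),
        new Column("int_array", Kind.LIST, Kind.INT),
        new Column("int_array_array", Kind.LIST, Kind.LIST));
    System.out.println(hasParquetListColumnSupport("parquet", columns));
    System.out.println(hasParquetListColumnSupport("orc", columns));
  }
}
```

   The same `/** ... */` form would apply to the existing `//` comment above `hasOrcTimeInSchema`.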





Issue Time Tracking
-------------------

    Worklog Id:     (was: 781520)
    Time Spent: 40m  (was: 0.5h)

> Selecting complex types on migrated iceberg table does not work
> ---------------------------------------------------------------
>
>                 Key: HIVE-26298
>                 URL: https://issues.apache.org/jira/browse/HIVE-26298
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gergely Fürnstáhl
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 00001-a5d522f4-a065-44e6-983b-ba66596b4332.metadata.json
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> I am working on implementing NameMapping in Impala (mainly replicating Hive's 
> functionality) and ran into the following issue:
> {code:java}
> CREATE TABLE array_demo
> (
>   int_primitive INT,
>   int_array ARRAY<INT>,
>   int_array_array ARRAY<ARRAY<INT>>,
>   int_to_array_array_Map MAP<INT,ARRAY<ARRAY<INT>>>
> )
> STORED AS ORC;
> INSERT INTO array_demo values (0, array(1), array(array(2), array(3,4)), map(5,array(array(6),array(7,8))));
> select * from array_demo;
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | array_demo.int_primitive  | array_demo.int_array  | array_demo.int_array_array  | array_demo.int_to_array_array_map  |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | 0                         | [1]                   | [[2],[3,4]]                 | {5:[[6],[7,8]]}                    |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
>  {code}
> Converting to iceberg
>  
>  
> {code:java}
> ALTER TABLE array_demo SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')
> select * from array_demo;
> INFO  : Compiling command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe): select * from array_demo
> INFO  : No Stats for default@array_demo, Columns: int_primitive, int_array, int_to_array_array_map, int_array_array
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:array_demo.int_primitive, type:int, comment:null), FieldSchema(name:array_demo.int_array, type:array<int>, comment:null), FieldSchema(name:array_demo.int_array_array, type:array<array<int>>, comment:null), FieldSchema(name:array_demo.int_to_array_array_map, type:map<int,array<array<int>>>, comment:null)], properties:null)
> INFO  : Completed compiling command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe); Time taken: 0.036 seconds
> INFO  : Executing command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe): select * from array_demo
> INFO  : Completed executing command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe); Time taken: 0.0 seconds
> INFO  : OK
> Error: java.io.IOException: java.lang.IllegalArgumentException: Can not promote MAP type to INTEGER (state=,code=0)
> select int_primitive from array_demo;
> +----------------+
> | int_primitive  |
> +----------------+
> | 0              |
> +----------------+
> 1 row selected (0.088 seconds)
>  {code}
> Removing schema.name-mapping.default solves it
> {code:java}
> ALTER TABLE array_demo UNSET TBLPROPERTIES ('schema.name-mapping.default');
> select * from array_demo;
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | array_demo.int_primitive  | array_demo.int_array  | array_demo.int_array_array  | array_demo.int_to_array_array_map  |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | 0                         | [1]                   | [[2],[3,4]]                 | {5:[[6],[7,8]]}                    |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
>  {code}
> Possible cause:
>  
> The name mapping generated and pushed into schema.name-mapping.default 
> differs from the name mapping in the schema in the attached metadata.json.
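
A mismatch like the one suspected in the description can be checked mechanically. The following is a minimal sketch, assuming both mappings have already been flattened into column-name-to-field-ID maps. The class `NameMappingDiffSketch`, the helper `mismatches`, and all field IDs below are hypothetical; the IDs are not taken from the attached metadata.json and are chosen only to show what two swapped columns would look like.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class NameMappingDiffSketch {

  /**
   * Returns the column names whose field ID differs between the two maps,
   * with both IDs formatted as "schemaId != mappingId". A name that is
   * missing from the mapping is reported with "null" as its mapping ID.
   */
  public static Map<String, String> mismatches(Map<String, Integer> schemaIds,
                                               Map<String, Integer> mappingIds) {
    Map<String, String> result = new TreeMap<>();
    for (Map.Entry<String, Integer> entry : schemaIds.entrySet()) {
      Integer other = mappingIds.get(entry.getKey());
      if (!entry.getValue().equals(other)) {
        result.put(entry.getKey(), entry.getValue() + " != " + other);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical IDs as they might appear in the schema of metadata.json.
    Map<String, Integer> schemaIds = new LinkedHashMap<>();
    schemaIds.put("int_primitive", 1);
    schemaIds.put("int_array", 2);
    schemaIds.put("int_array_array", 3);
    schemaIds.put("int_to_array_array_map", 4);

    // Hypothetical IDs in schema.name-mapping.default, with two columns swapped.
    Map<String, Integer> mappingIds = new LinkedHashMap<>(schemaIds);
    mappingIds.put("int_array_array", 4);
    mappingIds.put("int_to_array_array_map", 3);

    System.out.println(mismatches(schemaIds, mappingIds));
  }
}
```

If a column name resolved to the field ID of a column with a different type, a reader could end up trying to read one type as another, which would be consistent with the "Can not promote MAP type to INTEGER" error above. Whether that is what happens here is not confirmed by this message.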



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
