[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format

ASF GitHub Bot (Jira) Thu, 30 Sep 2021 08:02:07 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=658413&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658413
 ]


ASF GitHub Bot logged work on HIVE-25553:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Sep/21 15:01
            Start Date: 30/Sep/21 15:01
    Worklog Time Spent: 10m 
      Work Description: sankarh commented on a change in pull request #2689:
URL: https://github.com/apache/hive/pull/2689#discussion_r719480605



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##########
@@ -122,6 +122,7 @@ private static Field toField(String name, TypeInfo 
typeInfo) {
           case SHORT:
             return Field.nullable(name, MinorType.SMALLINT.getType());
           case INT:
+            new Field(name, new FieldType(false, new ArrowType.Int(32, true), 
null), null);

Review comment:
       This statement is no-op. Should be removed.

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##########
@@ -160,7 +161,7 @@ private static Field toField(String name, TypeInfo 
typeInfo) {
         final ListTypeInfo listTypeInfo = (ListTypeInfo) typeInfo;
         final TypeInfo elementTypeInfo = listTypeInfo.getListElementTypeInfo();
         return new Field(name, FieldType.nullable(MinorType.LIST.getType()),
-            Lists.newArrayList(toField(DEFAULT_ARROW_FIELD_NAME, 
elementTypeInfo)));
+            Lists.newArrayList(toField(name, elementTypeInfo)));

Review comment:
       Why default name (DEFAULT_ARROW_FIELD_NAME) is changed in LIST but used 
in UNION?

##########
File path: 
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapVectorArrow.java
##########
@@ -64,8 +65,8 @@ public static void beforeTest() throws Exception {
     return new LlapArrowRowInputFormat(Long.MAX_VALUE);
   }
 
-  // Currently MAP type is not supported. Add it back when Arrow 1.0 is 
released.
-  // See: SPARK-21187
+  // Currently, loading from a text file gives errors with Map dataType.
+  // This needs to be fixed when adding support for non-ORC writes (text and 
parquet) for the llap-ext-client.

Review comment:
       Remove this statement and create a follow-up JIRA to fix this issue and 
link with this one.

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##########
@@ -185,7 +186,7 @@ private static Field toField(String name, TypeInfo 
typeInfo) {
         final TypeInfo keyTypeInfo = mapTypeInfo.getMapKeyTypeInfo();
         final TypeInfo valueTypeInfo = mapTypeInfo.getMapValueTypeInfo();
         final StructTypeInfo mapStructTypeInfo = new StructTypeInfo();
-        mapStructTypeInfo.setAllStructFieldNames(Lists.newArrayList("keys", 
"values"));
+        mapStructTypeInfo.setAllStructFieldNames(Lists.newArrayList("key", 
"value"));

Review comment:
       Why is this naming change significant?

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##########
@@ -170,7 +171,7 @@ private static Field toField(String name, TypeInfo 
typeInfo) {
         for (int i = 0; i < structSize; i++) {
           structFields.add(toField(fieldNames.get(i), fieldTypeInfos.get(i)));
         }
-        return new Field(name, FieldType.nullable(MinorType.STRUCT.getType()), 
structFields);
+        return new Field(name, new FieldType(false, new ArrowType.Struct(), 
null), structFields);

Review comment:
       Why struct value is not-nullable?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java
##########
@@ -226,7 +226,7 @@ public ArrowWrapperWritable 
serializeBatch(VectorizedRowBatch vectorizedRowBatch
   }
 
   private static FieldType toFieldType(TypeInfo typeInfo) {
-    return new FieldType(true, toArrowType(typeInfo), null);
+    return new FieldType(false, toArrowType(typeInfo), null);

Review comment:
       Why is it changed to nullable=false?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 658413)
    Time Spent: 20m  (was: 10m)

> Support Map data-type natively in Arrow format
> ----------------------------------------------
>
>                 Key: HIVE-25553
>                 URL: https://issues.apache.org/jira/browse/HIVE-25553
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap, Serializers/Deserializers
>            Reporter: Adesh Kumar Rao
>            Assignee: Adesh Kumar Rao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs 
> data-type (where stuct is containing the key-value pair of the map). This 
> causes issues when reading Map datatype using llap-ext-client as it reads a 
> list of structs instead. 
> HiveWarehouseConnector which uses the llap-ext-client throws exception when 
> the schema (containing Map data type) is different from actual data (list of 
> structs).
>  
> Fixing this issue requires upgrading arrow version (where map data-type is 
> supported), modifying ArrowColumnarBatchSerDe and corresponding 
> Serializer/Deserializer to not use list as a workaround for map and use the 
> arrow map data-type instead. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format

Reply via email to