[ 
https://issues.apache.org/jira/browse/DRILL-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276011#comment-14276011
 ] 

Steven Phillips commented on DRILL-1997:
----------------------------------------

I actually think this is correct. The schema of the file:

message hive_schema {
  optional int32 c1;
  optional boolean c2;
  optional double c3;
  optional binary c4;
  optional group c5 (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
  optional group c6 (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      optional binary value;
    }
  }
  optional group c7 (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required binary key;
      optional binary value;
    }
  }
  optional group c8 {
    optional binary r;
    optional int32 s;
    optional double t;
  }
  optional int32 c9;
  optional int32 c10;
  optional float c11;
  optional int64 c12;
  optional group c13 (LIST) {
    repeated group bag {
      optional group array_element (LIST) {
        repeated group bag {
          optional binary array_element;
        }
      }
    }
  }
  optional group c15 {
    optional int32 r;
    optional group s {
      optional int32 a;
      optional binary b;
    }
  }
  optional group c16 (LIST) {
    repeated group bag {
      optional group array_element {
        optional group m (MAP) {
          repeated group map (MAP_KEY_VALUE) {
            required binary key;
            optional binary value;
          }
        }
        optional int32 n;
      }
    }
  }
}

The string values in c6 are simply stored as binary, with no metadata indicating 
that they are UTF-8 encoded strings. I think this indicates that Hive currently 
does not support the UTF8 converted type. In sqlline, when displaying a 
complex object, we use JSON, and binary values are displayed as base64 in JSON. 
So "eA==" and "eQ==" in the output are just the base64 encodings of "x" and "y".
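
To check that the data itself is intact, the base64 values from the Drill 
output quoted below can be decoded by hand. This is only a minimal sketch in 
Python, not anything Drill does internally: the helper name decode_map is 
made up for illustration, and it assumes the underlying bytes are UTF-8, 
which the file metadata does not state.

```python
import base64
import json

# Third row of the Drill output, copied from the quoted report.
row = '{"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]}'


def decode_map(row_json):
    """Hypothetical helper: decode base64 map values, assuming UTF-8 bytes."""
    entries = json.loads(row_json)["map"]
    return {
        entry["key"]: base64.b64decode(entry["value"]).decode("utf-8")
        for entry in entries
    }


print(decode_map(row))  # {1: 'x', 2: 'y'}
```

The decoded result matches what Hive prints for the same row, {1:"x",2:"y"}.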

> Hive generated parquet files with maps containing strings return wrong value
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-1997
>                 URL: https://issues.apache.org/jira/browse/DRILL-1997
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Ramana Inukonda Nagaraj
>            Assignee: Parth Chandra
>            Priority: Critical
>         Attachments: hive_alltypes.parquet
>
>
> Created a parquet file in hive having the following DDL
> hive> desc alltypesparquet;          
> OK
> c1                    int                                         
> c2                    boolean                                     
> c3                    double                                      
> c4                    string                                      
> c5                    array<int>                                  
> c6                    map<int,string>                             
> c7                    map<string,string>                          
> c8                    struct<r:string,s:int,t:double>                     
> c9                    tinyint                                     
> c10                   smallint                                    
> c11                   float                                       
> c12                   bigint                                      
> c13                   array<array<string>>                        
> c15                   struct<r:int,s:struct<a:int,b:string>>
> c16                   array<struct<m:map<string,string>,n:int>>
> Time taken: 0.076 seconds, Fetched: 15 row(s)
> All the complex types with string in them are returning incorrect values in 
> drill. For example:
> hive> select c6 from alltypesparquet;
> NULL
> NULL
> {1:"x",2:"y"}
> 0: jdbc:drill:> select c6 from `/user/hive/warehouse/alltypesparquet`;
> +------------+
> |     c6     |
> +------------+
> | {"map":[]} |
> | {"map":[]} |
> | {"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]} |
> +------------+
> 3 rows selected (0.077 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
