I was trying to add support for the decimal, timestamp, date, array, and map types to the PyHive DBAPI. To parse a result set correctly, I have to know its schema for each SELECT. For simple types (integer, string, timestamp, decimal, …) this is not a problem: I can get all the information by calling HiveServer2.GetResultSetMetadata. But for complex types (array, map, struct), the nested type information is missing. I can't find a way to tell whether a column is an integer array or a string array.
According to TCLIService.thrift <https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L147-L188>, recursively defined types such as array<int> and map<int, string> should be described by TTypeEntry.arrayEntry or TTypeEntry.mapEntry, rather than TTypeEntry.primitiveEntry, in the first element of TTypeDesc.types. The nested types should reside in TTypeDesc.types as the following elements and be pointed to from the first element.

However, when I actually called GetResultSetMetadata for the query SELECT array(1, 2, 3), I got just a single TTypeEntry.primitiveEntry in TTypeDesc.types, with TPrimitiveTypeEntry.type = ARRAY_TYPE. This violates both of the following descriptions:

  “TTypeDesc employs a type list that maps integer "pointers" to TTypeEntry objects” <https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L147-L188>

  “The primitive type token. This must satisfy the condition that type is in the PRIMITIVE_TYPES set.” <https://github.com/apache/hive/blob/release-1.2.1/service/if/TCLIService.thrift#L210-L215>

I tried the following script:

    create temporary table dummy(a int);
    insert into table dummy values (1), (2), (3);
    create temporary table tt(a int, b string, c map<INT, ARRAY<string>>);
    insert into table tt select 1, 'a', map(3, array('a','b','c')) from dummy limit 1;
    select * from tt;

I then called GetResultSetMetadata right after executing the SELECT query.
The value of response.schema.columns was:

    [TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(types=[
         TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
       position=1, comment=None),
     TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
         TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
       position=2, comment=None),
     TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
         TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=11, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
       position=3, comment=None)]

However, according to the thrift file, it should be:

    [TColumnDesc(columnName='tt.a', typeDesc=TTypeDesc(types=[
         TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
       position=1, comment=None),
     TColumnDesc(columnName='tt.b', typeDesc=TTypeDesc(types=[
         TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
       position=2, comment=None),
     TColumnDesc(columnName='tt.c', typeDesc=TTypeDesc(types=[
         TTypeEntry(primitiveEntry=None, arrayEntry=None, mapEntry=TMapTypeEntry(keyTypePtr=1, valueTypePtr=2), structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
         TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
         TTypeEntry(primitiveEntry=None, arrayEntry=TArrayTypeEntry(objectTypePtr=3), mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None),
         TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]),
       position=3, comment=None)]

I found the related function in the Hive codebase:
https://github.com/apache/hive/blob/release-1.2.1/service/src/java/org/apache/hive/service/cli/TypeDescriptor.java#L66-L76

It seems that this function always puts a TPrimitiveTypeEntry into TTypeDesc.types, even for complex types like array and map, which is inconsistent with the thrift file.
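
To make the difference concrete, here is a minimal sketch of how a client such as PyHive could resolve the pointer-based type list if the server returned it the way the thrift file describes. It is only an illustration under assumptions: the namedtuples are hypothetical stand-ins for the Thrift-generated classes (only the fields used here are modeled), describe() and TYPE_NAMES are names I made up, and the numeric type ids are my reading of the TTypeId enum (3, 7 and 11 match the dumps in this message; 10 for ARRAY_TYPE is an assumption).

```python
from collections import namedtuple

# Hypothetical stand-ins for the Thrift-generated classes (subset of fields only).
TPrimitiveTypeEntry = namedtuple("TPrimitiveTypeEntry", ["type"])
TArrayTypeEntry = namedtuple("TArrayTypeEntry", ["objectTypePtr"])
TMapTypeEntry = namedtuple("TMapTypeEntry", ["keyTypePtr", "valueTypePtr"])
TTypeEntry = namedtuple(
    "TTypeEntry",
    ["primitiveEntry", "arrayEntry", "mapEntry"],
    defaults=(None, None, None),
)

# Assumed subset of the TTypeId enum; 10 (ARRAY_TYPE) is not confirmed by the dumps.
TYPE_NAMES = {3: "int", 7: "string", 10: "array", 11: "map"}


def describe(types, ptr=0):
    """Resolve the entry at index ptr of a TTypeDesc.types list into a type string."""
    entry = types[ptr]
    if entry.arrayEntry is not None:
        return "array<%s>" % describe(types, entry.arrayEntry.objectTypePtr)
    if entry.mapEntry is not None:
        key = describe(types, entry.mapEntry.keyTypePtr)
        value = describe(types, entry.mapEntry.valueTypePtr)
        return "map<%s,%s>" % (key, value)
    # Primitive entry: all we can report is the type id itself.
    return TYPE_NAMES[entry.primitiveEntry.type]


# Layout the thrift file describes for map<int, array<string>>.
expected = [
    TTypeEntry(mapEntry=TMapTypeEntry(keyTypePtr=1, valueTypePtr=2)),
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=3)),
    TTypeEntry(arrayEntry=TArrayTypeEntry(objectTypePtr=3)),
    TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7)),
]

# Layout HiveServer2 actually returned for column tt.c.
actual = [TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=11))]

print(describe(expected))  # map<int,array<string>>
print(describe(actual))    # map
```

With the layout I actually received, the best a client can recover is the bare "map"; the key and value types are simply not in the response.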
