ASF GitHub Bot commented on HIVE-17580:

GitHub user vihangk1 opened a pull request:


    HIVE-17580 : Remove dependency of get_fields_with_environment_context API 
to serde

    This version of the patch moves TypeInfo and its subclasses to
standalone-metastore. The motivation is that the metastore needs TypeInfo-like
classes to store metadata about types; in Hive this role is filled by
TypeInfos. The metastore needs this information because tables such as Avro
tables can define their schema externally, either through a URL to a file
containing the schema or through a string value of the schema added as a table
property. In such cases the metastore needs to parse this information and
convert it into FieldSchema objects. Before this patch, this String ->
FieldSchema conversion was done by the SerDes, using the ObjectInspectors and
the TypeInfos obtained from them. This patch bypasses much of that to remove
the dependency on the SerDes, so that the conversion becomes String ->
TypeInfo -> FieldSchema.
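    The String -> TypeInfo -> FieldSchema flow described above can be sketched
roughly as follows. This is a minimal, self-contained illustration and not the
code in the patch: SimpleTypeInfo, SimpleFieldSchema and TypeStringSketch are
hypothetical stand-ins for Hive's TypeInfo, FieldSchema and TypeInfoParser, and
only a top-level struct<...> type string is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed type; the real TypeInfo hierarchy is richer.
final class SimpleTypeInfo {
    final String typeName;
    SimpleTypeInfo(String typeName) { this.typeName = typeName; }
}

// Hypothetical stand-in for the metastore's FieldSchema (name plus type string).
final class SimpleFieldSchema {
    final String name;
    final String type;
    SimpleFieldSchema(String name, String type) { this.name = name; this.type = type; }
}

final class TypeStringSketch {
    // Splits "struct<a:int,b:map<string,int>>" into one field per top-level member.
    static List<SimpleFieldSchema> toFieldSchemas(String structType) {
        String prefix = "struct<";
        if (!structType.startsWith(prefix) || !structType.endsWith(">")) {
            throw new IllegalArgumentException("not a struct type: " + structType);
        }
        String body = structType.substring(prefix.length(), structType.length() - 1);
        List<SimpleFieldSchema> fields = new ArrayList<>();
        int depth = 0;  // nesting depth inside <...>
        int start = 0;  // start index of the current member
        for (int i = 0; i <= body.length(); i++) {
            // Treat end-of-input as a final separator so the last member is emitted.
            char c = i < body.length() ? body.charAt(i) : ',';
            if (c == '<') {
                depth++;
            } else if (c == '>') {
                depth--;
            } else if (c == ',' && depth == 0) {
                fields.add(toField(body.substring(start, i)));
                start = i + 1;
            }
        }
        return fields;
    }

    // String -> (stand-in) TypeInfo -> (stand-in) FieldSchema for one member.
    private static SimpleFieldSchema toField(String member) {
        int colon = member.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("bad field: " + member);
        }
        String name = member.substring(0, colon).trim();
        SimpleTypeInfo info = new SimpleTypeInfo(member.substring(colon + 1).trim());
        return new SimpleFieldSchema(name, info.typeName);
    }

    public static void main(String[] args) {
        for (SimpleFieldSchema f : toFieldSchemas("struct<id:int,tags:map<string,int>>")) {
            System.out.println(f.name + " : " + f.type);
        }
    }
}
```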
    To achieve this, and also to reduce duplicate code and get a cleaner
design, this patch moves TypeInfo and its subclasses (ListTypeInfo,
MapTypeInfo, StructTypeInfo, UnionTypeInfo) and TypeInfoParser to
standalone-metastore. In the case of PrimitiveTypeInfo, the Hive code has
added a lot more than just type metadata. Specifically, PrimitiveTypeEntry and
PrimitiveCategory are type implementation details which cannot be moved to
standalone-metastore, not to mention that bringing in PrimitiveTypeEntry
brings in a whole lot of dependent code with it. To work around this issue, a
new class called MetastorePrimitiveTypeInfo is introduced in
standalone-metastore. This class contains only the information which the
metastore needs from PrimitiveTypeInfo, and PrimitiveTypeInfo extends
MetastorePrimitiveTypeInfo. This way we reduce the scope of changes greatly.
PrimitiveTypeInfo now contains the implementation details of Hive's primitive
types. Moving TypeInfo to standalone-metastore also requires the Category
enum, which unfortunately was defined in ObjectInspector. There is no way
around this, so this patch had to move Category from ObjectInspector to
TypeInfo. Most of the file changes are due to this move.
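    The class split described above might look roughly like the sketch below.
The class names and the inheritance relationship come from the description;
the fields, constructors and method bodies are assumptions made for
illustration, and the nested PrimitiveCategory enum is only a placeholder for
the Hive-side implementation detail.

```java
// Sketch only: bodies are illustrative assumptions, not the patch's code.
abstract class TypeInfo {
    // Category now lives with TypeInfo rather than ObjectInspector.
    public enum Category { PRIMITIVE, LIST, MAP, STRUCT, UNION }

    public abstract Category getCategory();
    public abstract String getTypeName();
}

// Metastore side: carries only the type metadata the metastore needs.
class MetastorePrimitiveTypeInfo extends TypeInfo {
    private final String typeName;

    MetastorePrimitiveTypeInfo(String typeName) {
        this.typeName = typeName;
    }

    @Override
    public Category getCategory() {
        return Category.PRIMITIVE;
    }

    @Override
    public String getTypeName() {
        return typeName;
    }
}

// Hive side: extends the metastore class and adds implementation detail.
class PrimitiveTypeInfo extends MetastorePrimitiveTypeInfo {
    // Placeholder for Hive-only detail such as PrimitiveCategory.
    public enum PrimitiveCategory { INT, STRING }

    private final PrimitiveCategory primitiveCategory;

    PrimitiveTypeInfo(String typeName, PrimitiveCategory primitiveCategory) {
        super(typeName);
        this.primitiveCategory = primitiveCategory;
    }

    public PrimitiveCategory getPrimitiveCategory() {
        return primitiveCategory;
    }
}

final class TypeInfoSplitSketch {
    public static void main(String[] args) {
        // Metastore code can treat the Hive object as plain type metadata.
        TypeInfo info = new PrimitiveTypeInfo("int", PrimitiveTypeInfo.PrimitiveCategory.INT);
        System.out.println(info.getCategory() + " " + info.getTypeName());
    }
}
```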
    Moving TypeInfoFactory was also very disruptive, so an interface called
ITypeInfoFactory is created in the metastore, and both the metastore and Hive
implement it. The Avro storage schema reader can now use the TypeInfoToSchema
and SchemaToTypeInfo utility classes (also moved to the metastore) through the
ITypeInfoFactory interface.
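    A rough sketch of the factory-interface arrangement follows. Only the name
ITypeInfoFactory is taken from the description above; the method set, the
SketchType classes and both factory implementations are hypothetical and exist
only to show shared utility code written against the interface.

```java
import java.util.Objects;

// Minimal stand-in for a type description; the real classes are TypeInfo subclasses.
class SketchType {
    final String name;
    SketchType(String name) { this.name = Objects.requireNonNull(name); }
}

// Hypothetical richer Hive-side type.
class HiveSketchType extends SketchType {
    HiveSketchType(String name) { super(name); }
}

// Interface name is from the PR description; these two methods are assumptions.
interface ITypeInfoFactory {
    SketchType getPrimitiveTypeInfo(String typeName);
    SketchType getListTypeInfo(SketchType elementType);
}

// Metastore-side implementation (hypothetical): plain metadata objects.
class MetastoreSketchFactory implements ITypeInfoFactory {
    @Override
    public SketchType getPrimitiveTypeInfo(String typeName) {
        return new SketchType(typeName);
    }

    @Override
    public SketchType getListTypeInfo(SketchType elementType) {
        return new SketchType("array<" + elementType.name + ">");
    }
}

// Hive-side implementation (hypothetical): returns the richer subclass.
class HiveSketchFactory implements ITypeInfoFactory {
    @Override
    public SketchType getPrimitiveTypeInfo(String typeName) {
        return new HiveSketchType(typeName);
    }

    @Override
    public SketchType getListTypeInfo(SketchType elementType) {
        return new HiveSketchType("array<" + elementType.name + ">");
    }
}

// Shared utility that depends on the interface only, as a schema reader could.
final class SchemaToTypeSketch {
    static SketchType arrayOf(String elementTypeName, ITypeInfoFactory factory) {
        return factory.getListTypeInfo(factory.getPrimitiveTypeInfo(elementTypeName));
    }

    public static void main(String[] args) {
        System.out.println(arrayOf("string", new MetastoreSketchFactory()).name);
        System.out.println(arrayOf("string", new HiveSketchFactory()).getClass().getSimpleName());
    }
}
```

The point of the design is that the utility method never names a concrete
factory, so the same code can run inside the metastore or inside Hive.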

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vihangk1/hive vihangk1_HIVE-17580v4

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #310
commit 2bdf6e18132f99f8998eceed8af0b77865fd85d4
Author: Vihang Karajgaonkar <vihang@...>
Date:   2018-02-22T21:10:03Z

    Moved TypeInfo to standalone-metastore

commit 756a394280d0a940b7dbcca05805a62978c4d8b2
Author: Vihang Karajgaonkar <vihang@...>
Date:   2018-02-22T22:35:50Z

    Introduce Avro storage schema reader


> Remove dependency of get_fields_with_environment_context API to serde
> ---------------------------------------------------------------------
>                 Key: HIVE-17580
>                 URL: https://issues.apache.org/jira/browse/HIVE-17580
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Standalone Metastore
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-17580.003-standalone-metastore.patch, 
> HIVE-17580.04-standalone-metastore.patch, 
> HIVE-17580.05-standalone-metastore.patch
> The {{get_fields_with_environment_context}} metastore API uses the
> {{Deserializer}} class to access the fields metadata for the cases where it
> is stored along with the data files (Avro tables). The problem is that the
> Deserializer classes are defined in the hive-serde module, and in order to
> make the metastore independent of Hive we will have to remove this dependency
> (at least we should change it to a runtime dependency instead of a
> compile-time one).
> The other option is to investigate whether we can use SearchArgument to
> provide this functionality.
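
One common way to turn a compile-time dependency into a runtime one, as the
description above suggests, is to load the implementation reflectively by class
name. The sketch below shows that general technique only; RuntimeLoaderSketch
is a hypothetical helper, and java.util.ArrayList stands in for a class that
would be named in configuration.

```java
import java.util.List;

final class RuntimeLoaderSketch {
    // Loads a class by name and instantiates it through its no-arg constructor.
    // The caller compiles only against expectedType, not the implementation.
    static <T> T newInstance(String className, Class<T> expectedType)
            throws ReflectiveOperationException {
        Class<?> raw = Class.forName(className);
        // asSubclass throws ClassCastException if the class has the wrong type.
        return raw.asSubclass(expectedType).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The class name would normally come from configuration.
        List<?> list = newInstance("java.util.ArrayList", List.class);
        System.out.println(list.getClass().getName());
    }
}
```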

This message was sent by Atlassian JIRA
