[ 
https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314309#comment-16314309
 ] 

ASF GitHub Bot commented on HIVE-17580:
---------------------------------------

GitHub user vihangk1 opened a pull request:

    https://github.com/apache/hive/pull/287

    HIVE-17580 : Remove standalone-metastore's dependency with serdes

    Removing the dependency on serdes for the metastore requires a series of 
changes. I have created multiple commits which hopefully would be easier to 
review. Each major commit has a descriptive commit message to give a high level 
idea of what the change is doing. There are still some bits which need to be 
completed but it would be good to a review.
    
    Overview of all the changes done:
    1. Creates a new module called serde-api under storage-api like discussed. 
Although I think we can keep it separate as well.
    2. Moved List, Map, Struct, Constant, Primitive, Union ObjectInspectors to 
serde-api
    3. Moved PrimitiveTypeInfo, PrimitiveTypeEntry and TypeInfo to serde-api.
    4. Moved TypeInfoParser, TypeInfoFactory to serde-api
    5. Added a new class which reading avro storage schema by copying the code 
from AvroSerde and AvroSerdeUtils. The parsing is done such that String value 
is first converted into TypeInfos and then into FieldSchemas bypassing the need 
for ObjectInspectors. In theory we could get rid of TypeInfos as well but that 
path was getting too difficult with lot of duplicate code between Hive and 
metastore.
    6. Introduces a default storage schema reader. I noticed that most of the 
serdes use the same logic to parse the metadata information. This code should 
be refactored to a common place instead of having many copies (one in 
standalone hms and another set in multiple serdes)
    7. Moved HiveChar, HiveVarchar, HiveCharWritable, HiveVarcharWritable to 
storage-api. I noticed that HiveDecimal is already in storage-api. It probably 
makes sense to move the other primitive types (timestamp, interval etc)to 
storage-api as well but it requires storage-api to be upgraded to Java 8.
    8. Adds a basic test for the schema reader. I plan to add more tests as 
this code is reviewed.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vihangk1/hive vihangk1_HIVE-17580

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #287
    
----
commit bbfb7dc44904db74a840167c02b07f50a6010b69
Author: Vihang Karajgaonkar <vihang@...>
Date:   2017-11-09T00:52:39Z

    HIVE-17580 : Remove dependency of get_fields_with_environment_context API 
to serde

commit d54879845eff10c19bc17bda9e09dda16f6fa295
Author: Vihang Karajgaonkar <vihang@...>
Date:   2017-11-29T00:54:23Z

    Moved List, Map, Struct OI to storage-api

commit a12d6c7ba3de598c6b6f75da1bd4efcac43036b1
Author: Vihang Karajgaonkar <vihang@...>
Date:   2017-11-29T04:19:39Z

    Moved ConstantObjectInspector PrimitiveObjectInspector and 
UnionObjectInspector

commit 13fb832fc2d51958e75d5e609f6781f87449aed8
Author: Vihang Karajgaonkar <vihang@...>
Date:   2017-12-28T01:25:59Z

    Moved PrimitiveTypeInfo to serde-api
    
    In order to move PrimitiveTypeInfo we need to move the PrimitiveTypeEntry 
as well.
    PrimitiveTypeEntry depends on PrimitiveObjectInspectorUtils which cannot be 
pulled
    into serde-api. Hence the static final maps are moved to PrimitiveEntry and 
we provide
    static access methods to these maps along with the register method to add 
the key
    value pairs in the maps

commit 9cbc789fd3f4ced7ce66a7313c451b75a154976f
Author: Vihang Karajgaonkar <vihang@...>
Date:   2017-12-28T20:51:39Z

    Moved the other TypeInfos to serde-api
    
    In order to move the other TypeInfo classes to serde-api we need to move 
the serdeConstants.java
    as well. This is a thrift generated class. This commit copies the 
serde.thrift instead of moving.
    The only reason I did not move it is in case of backwards compatibility 
reasons (in case someone is using the thrift file location to do something).
    If it is okay to move serde.thrift from serde module to serde-api we can 
delete it from serde module in a separate
    change.
    
    The other concern is there are some TypeInfo classes which do some 
validation like VarCharTypeInfo, DecimalTypeInfo.
    The validating methods use the actual type implementation like HiveChar, 
HiveDecimal etc to ensure that the params
    are under the correct limits. This creates a problem since we cannot bring 
in the type implementations as well to
    serde-api. Currently, I have marked these as TODO and commented them out.

commit 23aa899a90d17648e560d39602a3bd29bf53661e
Author: Vihang Karajgaonkar <vihang@...>
Date:   2017-12-29T22:14:06Z

    Moved TypeInfoParser and TypeInfoFactory to serde-api
    
    In order to use TypeInfoParser in standalone metastore, we should move it 
to serde-api
    There is a problem in moving the TypeInfoParser to serde-api which is there 
are some validation util
    methods which validated the max lengths of varchar, char or decimal 
typeinfo's precision and scale.
    These values are originally defined in HiveChar, HiveVarChar and 
HiveDecimal which cannot be moved to
    serde-api.
    
    In order to fix this this change moved these constants to serde.thrift so 
that it can be accessed from
    serdeConstants instead of HiveChar or HiveVarChar (not sure if this is the 
right thing to do, but there
    doesn't seem any good option here)

commit 0b29dc43e78eaa33e0cb672fb9cdabfb5b02d534
Author: Vihang Karajgaonkar <vihang@...>
Date:   2018-01-03T19:45:32Z

    Add avro storeage schema reader
    
    This commit adds a AvroStorageSchemaReader which reads the Avro schema 
files both for external schema and regular avro tables.
    Most of the util methods are in AvroSchemaUtils class which has methods 
copied from AvroSerDeUtils. Some of the needed classes like
    SchemaResolutionProblem, InstanceCache, SchemaToTypeInfo, TypeInfoToSchema 
are also copied from Hive. The constants defined
    in AvroSerde are copied in AvroSerdeConstants. The class 
AvroFieldSchemaGenerator converts the AvroSchema into List of
    FieldSchema which is returned by the AvroStorageSchemaReader

commit 5f2b174cd4e038e9871348dd4cce5389f0f75776
Author: Vihang Karajgaonkar <vihang@...>
Date:   2018-01-04T01:02:40Z

    Introduce default storage schema reader
    
    This change introduces a default storage schema reader which copies the 
common code from serdes
    initialization method and uses it to parse the column name, type and 
comments from the table
    properties. For custom storage schema reades like Avro we will have to add 
more schema readers
    as and when required

commit 2ecd4ffd1b65c29371cb4b384c43445c7c661161
Author: Vihang Karajgaonkar <vihang@...>
Date:   2018-01-04T18:33:02Z

    Moved the schema reader classes to a separate package

commit 6f734eb0c5a9472b239bdc91da56ec0efe7ceef8
Author: Vihang Karajgaonkar <vihang@...>
Date:   2018-01-04T19:18:03Z

    Integrates the avro schema reader into the DefaultStorageaSchemaReader

commit d93ce33559036020facd12ec5ea258223eb05c4c
Author: Vihang Karajgaonkar <vihang@...>
Date:   2018-01-05T01:40:36Z

    Moved HiveChar, HiveVarChar, HiveCharWritable, HiveVarcharWritable to 
storage-api
    
    The PrimitiveEntry caches (static maps) which keep a mapping between the 
typename to the classes needs to be
    populated before using in standalone-metastore. This is done in 
StorageSchemaUtils static initialization block.
    However, this initialization needs access to HiveChar, HiveVarchar and 
Timestamp related classes from serde.
    
    I see that HiveDecimal class is in storage-api. HiveChar, HiveVarChar and 
their base classes are comparitively
    light weight and can be moved to storage-api. However, Timestamp related 
primitive types like TimestampWritable,
    TimestampLocalTZWritable, TimestampTZ, HiveIntervalYearMonth, 
HiveIntervalYearMonthWritable, HiveIntervalDayTimeWritable
    classes cannot be moved in a straight-forward way because they have 
dependency with Java 8 and some util methods and
    classes in LazyBinaryUtils. I have marked this as a TODO currently. As far 
as I can tell TypeInfoParser does not
    need these classes while parsing type string to deserialize the types.
    
    Note: HiveIntervalDayTime is already in storage-api. Not sure why the other 
time related types are not in storage-api
    in the first place.

commit 9bfdb9beb48088712db82b31d8bc24651dc0d474
Author: Vihang Karajgaonkar <vihang@...>
Date:   2018-01-05T02:38:28Z

    Added a test for getFields method in standalone-metastore
    
    Adds a test case. Also fixes a bug in AvroFieldSchemaGenerator

----


> Remove dependency of get_fields_with_environment_context API to serde
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17580
>                 URL: https://issues.apache.org/jira/browse/HIVE-17580
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Standalone Metastore
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>              Labels: pull-request-available
>
> {{get_fields_with_environment_context}} metastore API uses {{Deserializer}} 
> class to access the fields metadata for the cases where it is stored along 
> with the data files (avro tables). The problem is Deserializer classes is 
> defined in hive-serde module and in order to make metastore independent of 
> Hive we will have to remove this dependency (atleast we should change it to 
> runtime dependency instead of compile time).
> The other option is investigate if we can use SearchArgument to provide this 
> functionality.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to