[
https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314309#comment-16314309
]
ASF GitHub Bot commented on HIVE-17580:
---------------------------------------
GitHub user vihangk1 opened a pull request:
https://github.com/apache/hive/pull/287
HIVE-17580 : Remove standalone-metastore's dependency with serdes
Removing the dependency on serdes for the metastore requires a series of
changes. I have created multiple commits which hopefully would be easier to
review. Each major commit has a descriptive commit message to give a high level
idea of what the change is doing. There are still some bits which need to be
completed but it would be good to a review.
Overview of all the changes done:
1. Creates a new module called serde-api under storage-api like discussed.
Although I think we can keep it separate as well.
2. Moved List, Map, Struct, Constant, Primitive, Union ObjectInspectors to
serde-api
3. Moved PrimitiveTypeInfo, PrimitiveTypeEntry and TypeInfo to serde-api.
4. Moved TypeInfoParser, TypeInfoFactory to serde-api
5. Added a new class which reading avro storage schema by copying the code
from AvroSerde and AvroSerdeUtils. The parsing is done such that String value
is first converted into TypeInfos and then into FieldSchemas bypassing the need
for ObjectInspectors. In theory we could get rid of TypeInfos as well but that
path was getting too difficult with lot of duplicate code between Hive and
metastore.
6. Introduces a default storage schema reader. I noticed that most of the
serdes use the same logic to parse the metadata information. This code should
be refactored to a common place instead of having many copies (one in
standalone hms and another set in multiple serdes)
7. Moved HiveChar, HiveVarchar, HiveCharWritable, HiveVarcharWritable to
storage-api. I noticed that HiveDecimal is already in storage-api. It probably
makes sense to move the other primitive types (timestamp, interval etc)to
storage-api as well but it requires storage-api to be upgraded to Java 8.
8. Adds a basic test for the schema reader. I plan to add more tests as
this code is reviewed.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vihangk1/hive vihangk1_HIVE-17580
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/hive/pull/287.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #287
----
commit bbfb7dc44904db74a840167c02b07f50a6010b69
Author: Vihang Karajgaonkar <vihang@...>
Date: 2017-11-09T00:52:39Z
HIVE-17580 : Remove dependency of get_fields_with_environment_context API
to serde
commit d54879845eff10c19bc17bda9e09dda16f6fa295
Author: Vihang Karajgaonkar <vihang@...>
Date: 2017-11-29T00:54:23Z
Moved List, Map, Struct OI to storage-api
commit a12d6c7ba3de598c6b6f75da1bd4efcac43036b1
Author: Vihang Karajgaonkar <vihang@...>
Date: 2017-11-29T04:19:39Z
Moved ConstantObjectInspector PrimitiveObjectInspector and
UnionObjectInspector
commit 13fb832fc2d51958e75d5e609f6781f87449aed8
Author: Vihang Karajgaonkar <vihang@...>
Date: 2017-12-28T01:25:59Z
Moved PrimitiveTypeInfo to serde-api
In order to move PrimitiveTypeInfo we need to move the PrimitiveTypeEntry
as well.
PrimitiveTypeEntry depends on PrimitiveObjectInspectorUtils which cannot be
pulled
into serde-api. Hence the static final maps are moved to PrimitiveEntry and
we provide
static access methods to these maps along with the register method to add
the key
value pairs in the maps
commit 9cbc789fd3f4ced7ce66a7313c451b75a154976f
Author: Vihang Karajgaonkar <vihang@...>
Date: 2017-12-28T20:51:39Z
Moved the other TypeInfos to serde-api
In order to move the other TypeInfo classes to serde-api we need to move
the serdeConstants.java
as well. This is a thrift generated class. This commit copies the
serde.thrift instead of moving.
The only reason I did not move it is in case of backwards compatibility
reasons (in case someone is using the thrift file location to do something).
If it is okay to move serde.thrift from serde module to serde-api we can
delete it from serde module in a separate
change.
The other concern is there are some TypeInfo classes which do some
validation like VarCharTypeInfo, DecimalTypeInfo.
The validating methods use the actual type implementation like HiveChar,
HiveDecimal etc to ensure that the params
are under the correct limits. This creates a problem since we cannot bring
in the type implementations as well to
serde-api. Currently, I have marked these as TODO and commented them out.
commit 23aa899a90d17648e560d39602a3bd29bf53661e
Author: Vihang Karajgaonkar <vihang@...>
Date: 2017-12-29T22:14:06Z
Moved TypeInfoParser and TypeInfoFactory to serde-api
In order to use TypeInfoParser in standalone metastore, we should move it
to serde-api
There is a problem in moving the TypeInfoParser to serde-api which is there
are some validation util
methods which validated the max lengths of varchar, char or decimal
typeinfo's precision and scale.
These values are originally defined in HiveChar, HiveVarChar and
HiveDecimal which cannot be moved to
serde-api.
In order to fix this this change moved these constants to serde.thrift so
that it can be accessed from
serdeConstants instead of HiveChar or HiveVarChar (not sure if this is the
right thing to do, but there
doesn't seem any good option here)
commit 0b29dc43e78eaa33e0cb672fb9cdabfb5b02d534
Author: Vihang Karajgaonkar <vihang@...>
Date: 2018-01-03T19:45:32Z
Add avro storeage schema reader
This commit adds a AvroStorageSchemaReader which reads the Avro schema
files both for external schema and regular avro tables.
Most of the util methods are in AvroSchemaUtils class which has methods
copied from AvroSerDeUtils. Some of the needed classes like
SchemaResolutionProblem, InstanceCache, SchemaToTypeInfo, TypeInfoToSchema
are also copied from Hive. The constants defined
in AvroSerde are copied in AvroSerdeConstants. The class
AvroFieldSchemaGenerator converts the AvroSchema into List of
FieldSchema which is returned by the AvroStorageSchemaReader
commit 5f2b174cd4e038e9871348dd4cce5389f0f75776
Author: Vihang Karajgaonkar <vihang@...>
Date: 2018-01-04T01:02:40Z
Introduce default storage schema reader
This change introduces a default storage schema reader which copies the
common code from serdes
initialization method and uses it to parse the column name, type and
comments from the table
properties. For custom storage schema reades like Avro we will have to add
more schema readers
as and when required
commit 2ecd4ffd1b65c29371cb4b384c43445c7c661161
Author: Vihang Karajgaonkar <vihang@...>
Date: 2018-01-04T18:33:02Z
Moved the schema reader classes to a separate package
commit 6f734eb0c5a9472b239bdc91da56ec0efe7ceef8
Author: Vihang Karajgaonkar <vihang@...>
Date: 2018-01-04T19:18:03Z
Integrates the avro schema reader into the DefaultStorageaSchemaReader
commit d93ce33559036020facd12ec5ea258223eb05c4c
Author: Vihang Karajgaonkar <vihang@...>
Date: 2018-01-05T01:40:36Z
Moved HiveChar, HiveVarChar, HiveCharWritable, HiveVarcharWritable to
storage-api
The PrimitiveEntry caches (static maps) which keep a mapping between the
typename to the classes needs to be
populated before using in standalone-metastore. This is done in
StorageSchemaUtils static initialization block.
However, this initialization needs access to HiveChar, HiveVarchar and
Timestamp related classes from serde.
I see that HiveDecimal class is in storage-api. HiveChar, HiveVarChar and
their base classes are comparitively
light weight and can be moved to storage-api. However, Timestamp related
primitive types like TimestampWritable,
TimestampLocalTZWritable, TimestampTZ, HiveIntervalYearMonth,
HiveIntervalYearMonthWritable, HiveIntervalDayTimeWritable
classes cannot be moved in a straight-forward way because they have
dependency with Java 8 and some util methods and
classes in LazyBinaryUtils. I have marked this as a TODO currently. As far
as I can tell TypeInfoParser does not
need these classes while parsing type string to deserialize the types.
Note: HiveIntervalDayTime is already in storage-api. Not sure why the other
time related types are not in storage-api
in the first place.
commit 9bfdb9beb48088712db82b31d8bc24651dc0d474
Author: Vihang Karajgaonkar <vihang@...>
Date: 2018-01-05T02:38:28Z
Added a test for getFields method in standalone-metastore
Adds a test case. Also fixes a bug in AvroFieldSchemaGenerator
----
> Remove dependency of get_fields_with_environment_context API to serde
> ---------------------------------------------------------------------
>
> Key: HIVE-17580
> URL: https://issues.apache.org/jira/browse/HIVE-17580
> Project: Hive
> Issue Type: Sub-task
> Components: Standalone Metastore
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Labels: pull-request-available
>
> {{get_fields_with_environment_context}} metastore API uses {{Deserializer}}
> class to access the fields metadata for the cases where it is stored along
> with the data files (avro tables). The problem is Deserializer classes is
> defined in hive-serde module and in order to make metastore independent of
> Hive we will have to remove this dependency (atleast we should change it to
> runtime dependency instead of compile time).
> The other option is investigate if we can use SearchArgument to provide this
> functionality.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)