[ https://issues.apache.org/jira/browse/HIVE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249975#comment-16249975 ]
Alan Gates commented on HIVE-17714:
-----------------------------------
[~sershe] are you suggesting that all calls to the metastore should rely on
parsing the schema from the SerDe rather than looking up the column list in the
metadata? I would not be in favor of that. It would slow down metastore access
times and make the code much more complicated. If you are concerned about
correctness, it is better to call the SerDe at data write time and confirm that
the columns written match the columns specified in the metadata (idea credit to
[~owen.omalley]).
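A minimal sketch of the write-time check described above, assuming the writer can obtain both the column list the SerDe reports for the data being written and the column list stored in the metastore. `Column`, `findMismatches`, and the positional name/type comparison are hypothetical stand-ins chosen for illustration; they are not Hive's `FieldSchema` or any existing Hive API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Column and findMismatches are illustrative names,
// not Hive APIs.
public class WriteTimeSchemaCheck {

    /** Minimal stand-in for a metastore column: a name and a type string. */
    static final class Column {
        final String name;
        final String type;
        Column(String name, String type) { this.name = name; this.type = type; }
        @Override public String toString() { return name + ":" + type; }
    }

    /**
     * Compares the columns a SerDe reports for the data being written with the
     * columns recorded in the metastore. Returns readable mismatches; an empty
     * list means the two schemas agree.
     */
    static List<String> findMismatches(List<Column> fromSerDe, List<Column> fromMetastore) {
        List<String> problems = new ArrayList<>();
        if (fromSerDe.size() != fromMetastore.size()) {
            problems.add("column count differs: serde=" + fromSerDe.size()
                    + " metastore=" + fromMetastore.size());
        }
        // Compare position by position over the columns both lists have.
        int n = Math.min(fromSerDe.size(), fromMetastore.size());
        for (int i = 0; i < n; i++) {
            Column s = fromSerDe.get(i);
            Column m = fromMetastore.get(i);
            // Hive column names are case-insensitive, so compare them that way.
            if (!s.name.equalsIgnoreCase(m.name) || !s.type.equalsIgnoreCase(m.type)) {
                problems.add("position " + i + ": serde=" + s + " metastore=" + m);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Column> serde = List.of(new Column("id", "int"), new Column("name", "string"));
        List<Column> stored = List.of(new Column("id", "int"), new Column("name", "varchar(10)"));
        // A writer would fail (or warn) before committing data if this is non-empty.
        System.out.println(findMismatches(serde, stored));
        // prints [position 1: serde=name:string metastore=name:varchar(10)]
    }
}
```

Doing the check on the write path keeps metastore reads cheap: the cost of consulting the SerDe is paid once per write rather than on every metadata lookup.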
[~vihangk1] I propose a couple of modifications to your proposal:
For item 2, we move Serializer, Deserializer, and AbstractSerDe [and I suspect
TypeInfo and ObjectInspector will have to come too] to a *new* module in
storage-api. This way ORC and other storage formats do not have to pick it up.
I agree that SerDe implementations should not become part of the storage-api,
because they are still undergoing lots of development and that would make the
release cycle harder in Hive. The Serializer et al. APIs are not changing much,
so moving them to the storage-api will have a minimal cost for Hive.
I also propose we add a new item 5: inside Hive, we work to move all of the
SerDe implementations from the exec module to the serde module. We do not change
which packages the classes are in; we just move them into the existing serde
module. This will result in a single module that the metastore (and anyone else
who wants to use Hive SerDes) can use without having to pick up all of Hive. The
standalone metastore still shouldn't depend directly on this serde module (that
would make a mess of our release process), but users could easily pull it in at
runtime.
> move custom SerDe schema considerations into metastore from QL
> --------------------------------------------------------------
>
> Key: HIVE-17714
> URL: https://issues.apache.org/jira/browse/HIVE-17714
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Alan Gates
>
> Columns in the metastore for tables that use an external schema don't have
> the type information (since HIVE-11985) and may be entirely inconsistent (since
> forever, due to issues like HIVE-17713; or, for SerDes that allow a URL for
> the schema, due to a change in the underlying file).
> Currently, if you trace the usage of ConfVars.SERDESUSINGMETASTOREFORSCHEMA
> and of MetaStoreUtils.getFieldsFromDeserializer, you'll see that this is
> handled by code in QL. So, for the most part, the metastore just returns
> whatever is stored for the columns in the database.
> One exception appears to be get_fields_with_environment_context, which is
> interesting: getTable will (potentially) return incorrect columns, but
> get_fields/get_schema will, as far as I can tell, return the correct ones from
> the SerDe.
> As part of separating the metastore, we should make sure all the APIs return
> the correct schema for the columns; it's not a good idea to have everyone
> reimplement getFieldsFromDeserializer.
> Note: this should also remove a flag introduced in HIVE-17731.
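A minimal sketch of what a single, shared schema-resolution step inside the metastore could look like, so that getTable, get_fields, and get_schema all report the same columns instead of each caller reimplementing getFieldsFromDeserializer. It assumes the decision stays the one QL makes today: trust the stored columns for SerDes listed in the SERDESUSINGMETASTOREFORSCHEMA setting, otherwise ask the SerDe. `Column`, `SchemaResolver`, `resolveColumns`, the stub lookup, and the `example.*` class names are hypothetical and do not correspond to Hive or metastore APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical sketch: Column, SchemaResolver and resolveColumns are
// illustrative names only; they are not Hive or metastore APIs.
public class SchemaResolver {

    /** Minimal stand-in for a metastore column: a name and a type string. */
    static final class Column {
        final String name;
        final String type;
        Column(String name, String type) { this.name = name; this.type = type; }
        @Override public String toString() { return name + ":" + type; }
    }

    // SerDe class names whose stored metastore columns are trusted
    // (the role played by the SERDESUSINGMETASTOREFORSCHEMA setting).
    private final Set<String> serdesUsingMetastoreForSchema;

    // Asks a SerDe for its schema, given its class name and the table's
    // SerDe parameters (e.g. a schema literal or schema URL).
    private final BiFunction<String, Map<String, String>, List<Column>> deserializerLookup;

    SchemaResolver(Set<String> serdesUsingMetastoreForSchema,
                   BiFunction<String, Map<String, String>, List<Column>> deserializerLookup) {
        this.serdesUsingMetastoreForSchema = serdesUsingMetastoreForSchema;
        this.deserializerLookup = deserializerLookup;
    }

    /**
     * Returns the columns every metastore API should report for a table:
     * the stored columns when the SerDe keeps them accurate, otherwise the
     * columns the SerDe derives from its external schema.
     */
    List<Column> resolveColumns(String serdeLib, Map<String, String> serdeParams,
                                List<Column> storedColumns) {
        if (serdesUsingMetastoreForSchema.contains(serdeLib)) {
            return storedColumns;
        }
        return deserializerLookup.apply(serdeLib, serdeParams);
    }

    public static void main(String[] args) {
        SchemaResolver resolver = new SchemaResolver(
                Set.of("example.TrustedSerDe"),
                // Stub lookup: pretend every external-schema SerDe reports two columns.
                (lib, params) -> List.of(new Column("id", "int"), new Column("payload", "string")));
        List<Column> stored = List.of(new Column("col", "string"));
        System.out.println(resolver.resolveColumns("example.TrustedSerDe", Map.of(), stored));
        // prints [col:string]
        System.out.println(resolver.resolveColumns("example.ExternalSchemaSerDe", Map.of(), stored));
        // prints [id:int, payload:string]
    }
}
```

Injecting the SerDe lookup as a function is one way to keep the standalone metastore free of a compile-time dependency on the serde module, with the real lookup supplied at runtime.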
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)