[
https://issues.apache.org/jira/browse/HIVE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252765#comment-16252765
]
Vihang Karajgaonkar commented on HIVE-17714:
--------------------------------------------
bq. metastore would duplicate the schema from deserializer into the metastore
columns (2.1 in my previous comment). So, in most cases (unless either the user
or the serde messed with it), the schema returned would actually be the real
schema.
Sounds like before HIVE-11985 all the APIs would have returned consistent
schema. Why did we change that in HIVE-11985? (traced it to [this comment |
https://issues.apache.org/jira/browse/HIVE-11985?focusedCommentId=14949665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14949665]
by [~xuefuz] on that JIRA):
bq. "If we spend time on this, I'd rather solve the problem in the generic way,
regardless the serde type and db type. The obvious inconsistency I see here is
that we store for avro the schema if it's less than 2000 while storing a
constant string for anything over that. If we determine that it's not necessary
to store it for avro, don't store it at all. Or if we can solve the length
problem for all serdes, then that's probably the right way to go."
Based on what I understand so far (please forgive me if I am repeating the
obvious), the inconsistency in the returned schema occurs only for tables
whose schema should be derived from the deserializer, because in
HIVE-11985 we decided not to store such schemas in the metastore. And the
reason we don't store these schemas in the metastore in the first place is
the 4000 character limit. The patch for HIVE-12274 changed the column type
to CLOB. Shouldn't the original problem that is causing all this no longer
exist?
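To make the behavior under discussion concrete, here is a minimal sketch of
the decision being described: columns come from the metastore only when the
SerDe is one that keeps its schema there, and otherwise from the
deserializer. Every name in it (SchemaResolutionSketch, Column,
resolveColumns, the example SerDe class names) is hypothetical and chosen for
illustration; it is not Hive's actual code or API.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch only: none of these names are Hive classes or methods.
final class SchemaResolutionSketch {

    /** Minimal stand-in for a column: a name and a type string. */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public String toString() {
            return name + ":" + type;
        }
    }

    /**
     * Returns the columns a caller should see for a table.
     *
     * If the SerDe is listed as keeping its schema in the metastore, the
     * stored columns are returned. Otherwise the schema is asked of the
     * deserializer, because the stored columns may be missing type
     * information or be stale.
     */
    static List<Column> resolveColumns(String serdeClass,
                                       Set<String> serdesUsingMetastore,
                                       List<Column> storedColumns,
                                       Supplier<List<Column>> fromDeserializer) {
        if (serdesUsingMetastore.contains(serdeClass)) {
            return storedColumns;
        }
        // The supplier is only invoked when the stored columns cannot be trusted.
        return fromDeserializer.get();
    }
}
```

If every metastore API went through one such resolution step, getTable and
get_fields/get_schema would agree with each other, and clients would not need
to reimplement the lookup themselves.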
> move custom SerDe schema considerations into metastore from QL
> --------------------------------------------------------------
>
> Key: HIVE-17714
> URL: https://issues.apache.org/jira/browse/HIVE-17714
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Alan Gates
>
> Columns in metastore for tables that use external schema don't have the type
> information (since HIVE-11985) and may be entirely inconsistent (since
> forever, due to issues like HIVE-17713; or for SerDes that allow a URL for
> the schema, due to a change in the underlying file).
> Currently, if you trace the usage of ConfVars.SERDESUSINGMETASTOREFORSCHEMA,
> and to MetaStoreUtils.getFieldsFromDeserializer, you'd see that the code in
> QL handles this in Hive. So, for the most part metastore just returns
> whatever is stored for columns in the database.
> One exception appears to be get_fields_with_environment_context, which is
> interesting... so getTable will return incorrect columns (potentially), but
> get_fields/get_schema will return correct ones from SerDe as far as I can
> tell.
> As part of separating the metastore, we should make sure all the APIs return
> the correct schema for the columns; it's not a good idea to have everyone
> reimplement getFieldsFromDeserializer.
> Note: this should also remove a flag introduced in HIVE-17731