[
https://issues.apache.org/jira/browse/HIVE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252656#comment-16252656
]
Vihang Karajgaonkar commented on HIVE-17714:
--------------------------------------------
bq. However, that means that most metastore APIs return bogus fields for such
tables (only get_schema/get_fields return correct fields - by calling the
deserializer inside metastore).
Can you please give an example? If the other APIs should really have been using
the Deserializer all along, I am surprised that nobody has hit this issue until
now. If there is a way to reproduce this, it would make the problem easier to
understand.
bq. screwing everyone who wants to read Hive tables without intricate
understanding of SerDe/Hive internals. I know for sure that it will break
Presto, but I suspect it will actually break everyone trying to use metastore
at this time. And I'm not even sure how non-Java users can support this.
Again, if this were such a fundamental problem, I don't know why nobody has
seen it until now, since only two APIs currently get the schema from the SerDe
while the rest just query the database.
bq. forcing the table creation and updates to externally recreate the schema
for the benefit of the readers. This is not as bad as messing with readers,
cause those tables are mostly created by Hive, but still bad (if external users
do create the tables) and also doesn't solve the external schema case.
Just to clarify, are you saying that anybody who reads this table
(Hive/Impala/Presto etc.) will have to recreate the schema using the
Deserializer if the table schema is not stored in the metastore? Isn't that
happening right now anyway? I looked at the describe table implementation in
Hive and it gets the schema from the deserializer. By default, the {{getTable}}
API does not currently retrieve the storageDescriptor and columnDescriptor.
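To make the request for a reproduction concrete, here is a minimal sketch of
the kind of check that would show the discrepancy. It is a toy model and not
Hive code: {{Column}}, {{diff}} and the sample column lists are hypothetical
stand-ins for the columns stored in the metastore database and the fields
reported by the table's deserializer. The sample values are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Toy model (not Hive code): compares stored columns with deserializer-derived columns. */
class SchemaMismatchSketch {

    /** Hypothetical stand-in for a column: a name plus a type string. */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Column)) {
                return false;
            }
            Column c = (Column) o;
            return name.equals(c.name) && type.equals(c.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, type);
        }

        @Override
        public String toString() {
            return name + ":" + type;
        }
    }

    /**
     * Compares the two column lists position by position.
     * Returns one message per differing position; an empty list means both views agree.
     */
    static List<String> diff(List<Column> stored, List<Column> fromSerDe) {
        List<String> out = new ArrayList<>();
        int n = Math.max(stored.size(), fromSerDe.size());
        for (int i = 0; i < n; i++) {
            Column s = i < stored.size() ? stored.get(i) : null;
            Column d = i < fromSerDe.size() ? fromSerDe.get(i) : null;
            if (!Objects.equals(s, d)) {
                out.add("column " + i + ": stored=" + s + ", serde=" + d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up data: what the database might hold vs. what the SerDe might report.
        List<Column> stored = Arrays.asList(new Column("id", "string"));
        List<Column> fromSerDe = Arrays.asList(
                new Column("id", "bigint"),
                new Column("name", "string"));

        for (String line : diff(stored, fromSerDe)) {
            System.out.println(line);
        }
    }
}
```

If a real table produces a non-empty result from a comparison like this
between the columns returned with the table and the fields returned by
get_fields/get_schema, that would be the reproduction I am asking for.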
> move custom SerDe schema considerations into metastore from QL
> --------------------------------------------------------------
>
> Key: HIVE-17714
> URL: https://issues.apache.org/jira/browse/HIVE-17714
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Alan Gates
>
> Columns in metastore for tables that use external schema don't have the type
> information (since HIVE-11985) and may be entirely inconsistent (since
> forever, due to issues like HIVE-17713; or for SerDes that allow an URL for
> the schema, due to a change in the underlying file).
> Currently, if you trace the usage of ConfVars.SERDESUSINGMETASTOREFORSCHEMA,
> and to MetaStoreUtils.getFieldsFromDeserializer, you'd see that the code in
> QL handles this in Hive. So, for the most part metastore just returns
> whatever is stored for columns in the database.
> One exception appears to be get_fields_with_environment_context, which is
> interesting... so getTable will return incorrect columns (potentially), but
> get_fields/get_schema will return correct ones from SerDe as far as I can
> tell.
> As part of separating the metastore, we should make sure all the APIs return
> the correct schema for the columns; it's not a good idea to have everyone
> reimplement getFieldsFromDeserializer.
> Note: this should also remove a flag introduced in HIVE-17731