[ 
https://issues.apache.org/jira/browse/HIVE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252656#comment-16252656
 ] 

Vihang Karajgaonkar commented on HIVE-17714:
--------------------------------------------

bq. However, that means that most metastore APIs return bogus fields for such 
tables (only get_schema/get_fields return correct fields - by calling the 
deserializer inside metastore).

Can you please give an example? If the other APIs should have been using the 
Deserializer all along, I am surprised that nobody has hit this issue until 
now. If there is a way to reproduce it, that would make it easier to 
understand.

bq. screwing everyone who wants to read Hive tables without intricate 
understanding of SerDe/Hive internals. I know for sure that it will break 
Presto, but I suspect it will actually break everyone trying to use metastore 
at this time. And I'm not even sure how non-Java users can support this.

Again, if this were such a fundamental problem, I don't know why nobody has 
seen it until now, since only two APIs currently get the schema from the SerDe 
while the rest just query the database.

bq. forcing the table creation and updates to externally recreate the schema 
for the benefit of the readers. This is not as bad as messing with readers, 
cause those tables are mostly created by Hive, but still bad (if external users 
do create the tables) and also doesn't solve the external schema case.

Just to clarify, are you saying that anybody who reads this table 
(Hive, Impala, Presto, etc.) will have to recreate the schema using the 
Deserializer if the table schema is not stored in the metastore? Isn't that 
already happening right now? I looked at the describe table implementation in 
Hive and it gets the schema from the deserializer. The {{getTable}} API by 
default does not currently retrieve the storageDescriptor and columnDescriptor.
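
To make the distinction under discussion concrete, here is a minimal sketch of the two read paths: one that returns whatever columns are stored in the database, and one that asks the deserializer when the SerDe does not keep its schema in the metastore. Everything in it is an assumption for illustration only: {{StoredTable}}, {{fields_from_deserializer}}, the {{external.schema}} property and the contents of the allow-list are hypothetical stand-ins, not Hive's actual classes, properties or configuration values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, column type)


@dataclass
class StoredTable:
    """Hypothetical stand-in for a table as persisted in the metastore database."""
    name: str
    serde: str
    stored_columns: List[Column]
    properties: Dict[str, str] = field(default_factory=dict)


# Hypothetical allow-list playing the role of the "SerDes using metastore for
# schema" setting: for these SerDes the stored columns are authoritative.
SERDES_USING_METASTORE_FOR_SCHEMA = {"LazySimpleSerDe"}


def fields_from_deserializer(table: StoredTable) -> List[Column]:
    """Toy deserializer: parses 'name:type,name:type' from a made-up property."""
    spec = table.properties.get("external.schema", "")
    return [tuple(part.split(":", 1)) for part in spec.split(",") if part]


def get_table_columns(table: StoredTable) -> List[Column]:
    """Read path that returns whatever is stored in the database."""
    return list(table.stored_columns)


def get_fields(table: StoredTable) -> List[Column]:
    """Read path that consults the deserializer for externally defined schemas."""
    if table.serde in SERDES_USING_METASTORE_FOR_SCHEMA:
        return list(table.stored_columns)
    return fields_from_deserializer(table)


# A table whose real schema lives outside the metastore: the stored columns
# are stale, while the deserializer reports the current schema.
avro_table = StoredTable(
    name="events",
    serde="AvroSerDe",
    stored_columns=[("id", "string")],
    properties={"external.schema": "id:bigint,ts:timestamp"},
)

# A table whose SerDe keeps its schema in the metastore: both paths agree.
text_table = StoredTable(
    name="logs",
    serde="LazySimpleSerDe",
    stored_columns=[("line", "string")],
)
```

In this sketch the two paths disagree for {{avro_table}} and agree for {{text_table}}, which is the kind of reproduction that would show whether readers relying only on the stored columns are actually affected.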





> move custom SerDe schema considerations into metastore from QL
> --------------------------------------------------------------
>
>                 Key: HIVE-17714
>                 URL: https://issues.apache.org/jira/browse/HIVE-17714
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Alan Gates
>
> Columns in metastore for tables that use external schema don't have the type 
> information (since HIVE-11985) and may be entirely inconsistent (since 
> forever, due to issues like HIVE-17713; or for SerDes that allow a URL for 
> the schema, due to a change in the underlying file).
> Currently, if you trace the usage of ConfVars.SERDESUSINGMETASTOREFORSCHEMA 
> through to MetaStoreUtils.getFieldsFromDeserializer, you'll see that the code 
> in QL handles this in Hive. So, for the most part, the metastore just returns 
> whatever is stored for columns in the database.
> One exception appears to be get_fields_with_environment_context, which is 
> interesting... so getTable will return incorrect columns (potentially), but 
> get_fields/get_schema will return correct ones from SerDe as far as I can 
> tell.
> As part of separating the metastore, we should make sure all the APIs return 
> the correct schema for the columns; it's not a good idea to have everyone 
> reimplement getFieldsFromDeserializer.
> Note: this should also remove a flag introduced in HIVE-17731



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
