[ 
https://issues.apache.org/jira/browse/HIVE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251806#comment-16251806
 ] 

Owen O'Malley commented on HIVE-17714:
--------------------------------------

I'm also -1 on the metastore using the SerDes to recreate the table schema. The
Avro SerDe is particularly bad in this regard because it can store the schema in
an external file. Thus, the schema of the table can change without the
metastore being notified. That is pretty broken. Does anyone know what the
original goal of that capability was?
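To make the drift concrete, here is a minimal sketch, assuming both the metastore's stored columns and the external schema file (for Avro, the file referenced by the table's `avro.schema.url` property) can be reduced to ordered (name, type) pairs. `SchemaDriftCheck`, `Column` and `diff` are hypothetical names, not Hive APIs, and actually reading and parsing the external file is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compare the columns the metastore has stored with the
// columns currently described by an external schema file. Illustrative names only.
final class SchemaDriftCheck {

    /** A column as a (name, type) pair, e.g. ("id", "bigint"). */
    record Column(String name, String type) {}

    /** Returns human-readable differences; the list is empty if both schemas agree. */
    static List<String> diff(List<Column> stored, List<Column> external) {
        List<String> problems = new ArrayList<>();
        int common = Math.min(stored.size(), external.size());
        for (int i = 0; i < common; i++) {
            Column s = stored.get(i);
            Column e = external.get(i);
            if (!s.equals(e)) {
                problems.add("position " + i + ": metastore has " + s
                        + ", external schema has " + e);
            }
        }
        // Columns present on only one side.
        for (int i = common; i < stored.size(); i++) {
            problems.add("position " + i + ": only in metastore: " + stored.get(i));
        }
        for (int i = common; i < external.size(); i++) {
            problems.add("position " + i + ": only in external schema: " + external.get(i));
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Column> stored = List.of(new Column("id", "bigint"), new Column("name", "string"));
        // Someone edited the external schema file and added a column;
        // nothing told the metastore.
        List<Column> external = List.of(new Column("id", "bigint"),
                new Column("name", "string"), new Column("email", "string"));
        diff(stored, external).forEach(System.out::println);
    }
}
```

The point of the sketch is that such a check can only ever be done after the fact; nothing in the metastore is notified when the file changes.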

I think the long-term goal should be for "load data" to determine whether the
type is self-describing and, if so, invoke an interface to determine the types
of the loaded data.
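One way the "load data" idea could look, as a sketch only: formats whose files carry their own schema implement a hypothetical `SelfDescribing` interface, and the load path asks it for the types of the incoming file. Every name below is an assumption, not an existing Hive interface, and so is the policy of rejecting a load whose types disagree with the metastore; updating the metastore's types instead would be an equally plausible choice.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a "load data" hook for self-describing formats.
// None of these types exist in Hive; they only illustrate the idea.
final class LoadDataSketch {

    record Column(String name, String type) {}

    /** Any storage format; e.g. a plain text format would implement only this. */
    interface StorageFormat {}

    /** Implemented only by formats whose files carry their own schema. */
    interface SelfDescribing {
        List<Column> readSchema(String path);
    }

    /** Stand-in for a self-describing format that always reports a fixed schema. */
    record FakeSelfDescribingFormat(List<Column> schema) implements StorageFormat, SelfDescribing {
        public List<Column> readSchema(String path) {
            return schema;
        }
    }

    /**
     * Returns the schema discovered from the loaded file, or empty when the
     * format cannot describe itself and the metastore's columns must be trusted.
     */
    static Optional<List<Column>> inferLoadedSchema(StorageFormat format, String path) {
        if (format instanceof SelfDescribing sd) {
            return Optional.of(sd.readSchema(path));
        }
        return Optional.empty();
    }

    /** Rejects the load if the file's types disagree with what the metastore records. */
    static void checkLoad(StorageFormat format, String path, List<Column> metastoreColumns) {
        inferLoadedSchema(format, path).ifPresent(fileColumns -> {
            if (!fileColumns.equals(metastoreColumns)) {
                throw new IllegalStateException("schema of " + path + " is " + fileColumns
                        + " but the metastore has " + metastoreColumns);
            }
        });
    }
}
```

Formats that are not self-describing simply fall through, so the metastore remains the source of truth for them.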

For managed tables, the metastore needs to know the column types. The goal
should be to remove the functions that allow users to update the data directly
without going through Hive. The metastore needs to know the types and have
relevant statistics; that is the only way the optimizer has a chance of
figuring out the proper plan.

> move custom SerDe schema considerations into metastore from QL
> --------------------------------------------------------------
>
>                 Key: HIVE-17714
>                 URL: https://issues.apache.org/jira/browse/HIVE-17714
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Alan Gates
>
> Columns in metastore for tables that use external schema don't have the type 
> information (since HIVE-11985) and may be entirely inconsistent (since 
> forever, due to issues like HIVE-17713; or, for SerDes that allow a URL for 
> the schema, due to a change in the underlying file).
> Currently, if you trace the usages of ConfVars.SERDESUSINGMETASTOREFORSCHEMA 
> and MetaStoreUtils.getFieldsFromDeserializer, you'll see that this is 
> handled by the code in QL, on the Hive side. So, for the most part, the 
> metastore just returns whatever is stored for the columns in the database.
> One exception appears to be get_fields_with_environment_context, which is 
> interesting... so getTable will return incorrect columns (potentially), but 
> get_fields/get_schema will return correct ones from SerDe as far as I can 
> tell.
> As part of separating the metastore, we should make sure all the APIs return 
> the correct schema for the columns; it's not a good idea to have everyone 
> reimplement getFieldsFromDeserializer.
> Note: this should also remove a flag introduced in HIVE-17731.
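A minimal sketch of what "all the APIs return the correct schema" could mean, assuming a single metastore-side resolver that every column-returning call (getTable, get_fields, get_schema) goes through. The set of trusted SerDe class names plays the role of ConfVars.SERDESUSINGMETASTOREFORSCHEMA: for those SerDes the stored columns are authoritative, and for any other SerDe the columns come from the deserializer. `FieldResolver`, `TableInfo` and `Column` are hypothetical; the deserializer lookup is passed in as a function standing in for roughly what MetaStoreUtils.getFieldsFromDeserializer does.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: resolve a table's columns in one place on the metastore
// side so that every API agrees. Names are illustrative, not metastore classes.
final class FieldResolver {

    record Column(String name, String type) {}

    /** Minimal view of a table: its SerDe class name and the columns stored for it. */
    record TableInfo(String serdeLib, List<Column> storedColumns) {}

    // Plays the role of ConfVars.SERDESUSINGMETASTOREFORSCHEMA.
    private final Set<String> serdesUsingMetastoreForSchema;
    // Stand-in for asking the SerDe itself for its fields.
    private final Function<TableInfo, List<Column>> deserializerFields;

    FieldResolver(Set<String> serdesUsingMetastoreForSchema,
                  Function<TableInfo, List<Column>> deserializerFields) {
        this.serdesUsingMetastoreForSchema = serdesUsingMetastoreForSchema;
        this.deserializerFields = deserializerFields;
    }

    /** The single answer every metastore API would return for the table's columns. */
    List<Column> resolve(TableInfo table) {
        if (serdesUsingMetastoreForSchema.contains(table.serdeLib())) {
            return table.storedColumns();
        }
        return deserializerFields.apply(table);
    }
}
```

Passing the deserializer lookup in as a function keeps the sketch free of SerDe dependencies; a real metastore implementation would have to load the SerDe class, which is exactly the dependency question this issue raises.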



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
