[ 
https://issues.apache.org/jira/browse/HIVE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249975#comment-16249975
 ] 

Alan Gates commented on HIVE-17714:
-----------------------------------

[~sershe] are you suggesting that all calls to the metastore should rely on 
parsing the schema from the SerDe rather than looking up the column list in the 
metadata?  I would not be in favor of that.  That's going to slow down the 
metastore access times and make the code much more complicated.  If you are 
concerned about correctness, it is better to call the SerDe during data write 
time and confirm that the columns written match with the columns specified in 
the metadata (idea credit to [~owen.omalley]).
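
A minimal sketch of what such a write-time check could look like is below.  It 
deliberately uses no real Hive classes: Column, WriteTimeSchemaCheck and 
findMismatches are hypothetical names invented for illustration, and the 
comparison rules (positional match, case-insensitive names and type strings) 
are assumptions rather than anything decided on this ticket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: compare the columns stored in the metastore with the
// columns the SerDe reports at write time. None of these names exist in Hive.
class WriteTimeSchemaCheck {

    /** Stand-in for a column definition: a name plus a type string. */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = Objects.requireNonNull(name);
            this.type = Objects.requireNonNull(type);
        }
    }

    /**
     * Returns a human-readable list of differences between the two column
     * lists. An empty list means the schemas agree. Columns are compared by
     * position; names and types are compared case-insensitively (assumption).
     */
    static List<String> findMismatches(List<Column> metastoreCols,
                                       List<Column> serdeCols) {
        List<String> problems = new ArrayList<>();
        if (metastoreCols.size() != serdeCols.size()) {
            problems.add("column count differs: metastore="
                + metastoreCols.size() + ", serde=" + serdeCols.size());
        }
        // Compare only the positions present in both lists.
        int common = Math.min(metastoreCols.size(), serdeCols.size());
        for (int i = 0; i < common; i++) {
            Column m = metastoreCols.get(i);
            Column s = serdeCols.get(i);
            if (!m.name.equalsIgnoreCase(s.name)) {
                problems.add("column " + i + " name differs: metastore="
                    + m.name + ", serde=" + s.name);
            } else if (!m.type.equalsIgnoreCase(s.type)) {
                problems.add("column " + m.name + " type differs: metastore="
                    + m.type + ", serde=" + s.type);
            }
        }
        return problems;
    }
}
```

A writer could run a check like this once per table or partition before 
writing and fail (or warn) when the returned list is non-empty, which keeps 
the cost off the metastore read path.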

[~vihangk1]  I propose a couple of modifications to your proposal:

Item 2, we move Serializer, Deserializer, AbstractSerDe [and I suspect TypeInfo 
and ObjectInspector will have to come too] to a *new* module in storage-api.  
This avoids the need for ORC and any other storage format to pick it up.  I 
agree that serde implementations should not become part of the storage-api 
because they are still undergoing lots of development, and that will make the 
release cycle harder in Hive.  The Serializer et al. APIs are not changing 
much, so moving them to the storage-api will have a minimal cost for Hive.

I also propose we add a new item 5:  Inside Hive, we work to move all of the 
SerDe implementations from the exec module to the serde module.  We do not 
change what packages the classes are in, just move them into the existing 
serde module.  
This will result in a single module that the metastore (and anyone else who 
wants to use Hive serdes) can use without having to pick up all of Hive.  The 
standalone metastore still shouldn't directly depend on this serde module (that 
would make a mess of our release process) but users could easily pull it in at 
runtime.  

> move custom SerDe schema considerations into metastore from QL
> --------------------------------------------------------------
>
>                 Key: HIVE-17714
>                 URL: https://issues.apache.org/jira/browse/HIVE-17714
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Alan Gates
>
> Columns in metastore for tables that use external schema don't have the type 
> information (since HIVE-11985) and may be entirely inconsistent (since 
> forever, due to issues like HIVE-17713; or for SerDes that allow a URL for 
> the schema, due to a change in the underlying file).
> Currently, if you trace the usage of ConfVars.SERDESUSINGMETASTOREFORSCHEMA, 
> and to MetaStoreUtils.getFieldsFromDeserializer, you'd see that the code in 
> QL handles this in Hive. So, for the most part metastore just returns 
> whatever is stored for columns in the database.
> One exception appears to be get_fields_with_environment_context, which is 
> interesting... so getTable will return incorrect columns (potentially), but 
> get_fields/get_schema will return correct ones from SerDe as far as I can 
> tell.
> As part of separating the metastore, we should make sure all the APIs return 
> the correct schema for the columns; it's not a good idea to have everyone 
> reimplement getFieldsFromDeserializer.
> Note: this should also remove a flag introduced in HIVE-17731



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
