[ 
https://issues.apache.org/jira/browse/HCATALOG-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Travis Crawford updated HCATALOG-443:
-------------------------------------

    Attachment: HCATALOG-443_api_to_metadata_deserializer.2.patch

This patch version adds a few more changes necessary to get our data working 
with HCat trunk.

Summarizing the patch as a whole:

* Switch to "org.apache.hadoop.hive.ql.metadata" classes except when 
serializing. These classes layer additional logic on top of the metastore API 
classes. I specifically care about dynamically reported schemas, but there is 
other logic too.

* Add support for a deserializer-only read path. For example, 
ThriftDeserializer does not work with the current HCatRecordReader because it 
does not implement Serializer.

* Bugfix: actually use the SerDeInfo properties when initializing the 
deserializer.

* Allow users to get string values of enum fields. By default you still get 
struct<value:int>, but if you specify "string" as the field type, enums are 
returned as strings. This is necessary for integration in our environment 
because Elephant-Bird behaves this way.

* Small updates to how binary fields are handled based on Hive changes.

* Add TestHCatHiveThriftCompatibility test to ensure HCat works with 
serde-reported schemas.

* Update some tests to use HCatBaseTest (and JUnit 4 style) so they run in my 
IDE, which was needed to debug them.
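
To illustrate the enum bullet above, here is a minimal, self-contained sketch 
of the idea. It is not code from the patch: EnumFieldSketch, Color and 
convertEnum are hypothetical names, the Map stands in for struct<value:int>, 
and ordinal() stands in for the numeric value a Thrift enum would carry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EnumFieldSketch {
    // Hypothetical enum used only for illustration.
    enum Color { RED, GREEN, BLUE }

    /**
     * Converts an enum according to the declared field type.
     * "string" yields the enum name; any other declared type yields a
     * single-entry map that stands in for struct<value:int>.
     */
    static Object convertEnum(Enum<?> e, String declaredType) {
        if ("string".equalsIgnoreCase(declaredType)) {
            return e.name();
        }
        Map<String, Integer> struct = new LinkedHashMap<>();
        // Assumption: ordinal() approximates the enum's integer value.
        struct.put("value", e.ordinal());
        return struct;
    }

    public static void main(String[] args) {
        // Default behaviour: the struct form.
        System.out.println(convertEnum(Color.GREEN, "struct<value:int>")); // {value=1}
        // Opt-in behaviour: the field type is declared as "string".
        System.out.println(convertEnum(Color.GREEN, "string")); // GREEN
    }
}
```

The point of keying the conversion on the declared field type is that existing 
tables keep the struct form, while a table that declares the field as "string" 
gets the string form.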
                
> Use "metadata" Table/Partition classes, and Deserializer when reading
> ---------------------------------------------------------------------
>
>                 Key: HCATALOG-443
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-443
>             Project: HCatalog
>          Issue Type: Bug
>            Reporter: Travis Crawford
>            Assignee: Travis Crawford
>         Attachments: HCATALOG-443_api_to_metadata_deserializer.1.patch, 
> HCATALOG-443_api_to_metadata_deserializer.2.patch
>
>
> This issue is related to HIVE-2950.
> When HCatalog queries the HiveMetaStore it gets back classes in the 
> "org.apache.hadoop.hive.metastore.api" package. These represent exactly what 
> is stored in the metastore database.
> Hive has companion classes in "org.apache.hadoop.hive.ql.metadata" that 
> provide some logic on top of what's stored in the actual database. For 
> example:
> * org.apache.hadoop.hive.metastore.api.Table.getCols shows columns explicitly 
> stored in the database
> * org.apache.hadoop.hive.ql.metadata.Table.getCols shows columns reported by 
> the serde if there are any.
> Except when serializing into the job configuration, HCatalog should use 
> the "metadata" versions of these classes so that the additional logic is 
> applied.
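
The difference between the two getCols variants described above can be 
sketched with a simplified, self-contained model. The classes below 
(ApiTable, SchemaSource, MetadataTable) are hypothetical stand-ins, not the 
real Hive classes, and the rule "use serde-reported columns when there are 
any, otherwise fall back to the stored ones" is an assumption drawn from the 
description; the real logic in Hive may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MetadataTableSketch {
    /** Stand-in for the raw metastore record: only what the database stores. */
    static class ApiTable {
        private final List<String> storedCols;
        ApiTable(List<String> storedCols) { this.storedCols = storedCols; }
        List<String> getCols() { return storedCols; }
    }

    /** Stand-in for a serde that may report its own schema. */
    interface SchemaSource {
        List<String> reportedCols(); // empty when the serde reports nothing
    }

    /** Stand-in for the "metadata" wrapper that adds logic on top of the raw record. */
    static class MetadataTable {
        private final ApiTable raw;
        private final SchemaSource serde;
        MetadataTable(ApiTable raw, SchemaSource serde) {
            this.raw = raw;
            this.serde = serde;
        }
        List<String> getCols() {
            List<String> reported = serde.reportedCols();
            // Prefer the serde-reported schema; fall back to stored columns.
            return reported.isEmpty() ? raw.getCols() : reported;
        }
    }

    public static void main(String[] args) {
        // A table whose schema lives in the serde, not in the database.
        ApiTable raw = new ApiTable(Collections.<String>emptyList());
        MetadataTable wrapped = new MetadataTable(raw, () -> Arrays.asList("id", "name"));
        System.out.println(raw.getCols());     // []
        System.out.println(wrapped.getCols()); // [id, name]
    }
}
```

A reader that only looks at the raw record sees no columns for such a table, 
which is why the read path needs the wrapper.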


        
