[ 
https://issues.apache.org/jira/browse/HCATALOG-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HCATALOG-49:
------------------------------

    Attachment: HCATALOG-49.patch

Here is an initial attempt to support Avro in HCatalog.

Some notes:

* For output, an Avro schema is computed for the HCatalog schema by the Avro 
output storage driver. The current patch does not allow you to specify a custom 
Avro schema - this would be a natural extension.
* Avro map keys must be strings, whereas they can be any type in HCatalog. The 
current implementation assumes that HCatalog maps have string keys, and fails 
if this is not true. It might be possible to relax this restriction in the 
future by doing type conversion.
* In HCatalog, values can be null, whereas this is not true for simple schemas 
in Avro. It would be possible to generate null unions in Avro, but this isn't 
done here. This could be a future enhancement.
* For the Avro input storage driver, the Avro schema in the Avro Data File is 
checked for compatibility with the HCatalog schema, and an exception is thrown 
if there's a mismatch.
* Byte arrays cannot be represented in HCatalog, so there is no way to read 
byte arrays from Avro files. (Pig has the same limitation.)
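
To make the output-side notes concrete, here is a rough sketch of how an Avro 
schema could be derived from an HCatalog-style schema. It is not the code in 
the patch: the tuple-based type descriptions, the PRIMITIVES table, and the 
names to_avro and UnsupportedTypeError are all invented for illustration. The 
nullable flag shows the "null union" idea that the patch does not implement.

```python
import json

# Hypothetical, simplified stand-ins for HCatalog types (not the real API):
#   primitives are strings such as "int" or "string";
#   ("array", elem), ("map", key, value), ("struct", [(name, type), ...]).
# The primitive mapping below is an illustrative assumption.
PRIMITIVES = {
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
    "string": "string",
}


class UnsupportedTypeError(ValueError):
    """Raised when a type has no Avro equivalent in this sketch."""


def _maybe_null(schema, nullable):
    # A union with "null" lets the value be absent in Avro.
    return ["null", schema] if nullable else schema


def to_avro(hcat_type, name="record0", nullable=False):
    """Return an Avro schema, as JSON-compatible data, for hcat_type."""
    if isinstance(hcat_type, str):
        if hcat_type not in PRIMITIVES:
            raise UnsupportedTypeError("no Avro mapping for %r" % (hcat_type,))
        return PRIMITIVES[hcat_type]

    kind = hcat_type[0]
    if kind == "array":
        items = to_avro(hcat_type[1], name + "_item", nullable)
        return {"type": "array", "items": _maybe_null(items, nullable)}
    if kind == "map":
        _, key, value = hcat_type
        # Avro maps only allow string keys, so anything else is rejected.
        if key != "string":
            raise UnsupportedTypeError(
                "Avro map keys must be strings, got %r" % (key,))
        values = to_avro(value, name + "_value", nullable)
        return {"type": "map", "values": _maybe_null(values, nullable)}
    if kind == "struct":
        fields = []
        for fname, ftype in hcat_type[1]:
            # Nested records get a name derived from the path to stay unique.
            ftype_schema = to_avro(ftype, name + "_" + fname, nullable)
            fields.append({"name": fname,
                           "type": _maybe_null(ftype_schema, nullable)})
        return {"type": "record", "name": name, "fields": fields}
    raise UnsupportedTypeError("unknown type constructor %r" % (kind,))


if __name__ == "__main__":
    table = ("struct", [("id", "bigint"),
                        ("tags", ("map", "string", "int"))])
    print(json.dumps(to_avro(table, name="row"), indent=2))
```

Calling to_avro with a map whose key type is not "string" raises 
UnsupportedTypeError, which mirrors the "fails if this is not true" behaviour 
described above. Passing nullable=True wraps field, item, and value types in a 
["null", ...] union, which is one possible shape for the future enhancement.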



> Support Avro Data File Format in HCatalog
> -----------------------------------------
>
>                 Key: HCATALOG-49
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-49
>             Project: HCatalog
>          Issue Type: New Feature
>            Reporter: Tom White
>         Attachments: HCATALOG-49.patch
>
>
> Add input and output drivers for Avro.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
