[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-5298:
------------------------------

    Attachment: HIVE-5298.patch

Initial patch. Running tests. Will submit patch if tests pass.
                
> AvroSerde performance problem caused by HIVE-3833
> -------------------------------------------------
>
>                 Key: HIVE-5298
>                 URL: https://issues.apache.org/jira/browse/HIVE-5298
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5298.patch
>
>
> HIVE-3833 fixed the problem it targeted by making Hive use partition-level
> metadata to initialize the object inspector. In doing so, however, it goes
> through every file under the table to access the partition metadata, which is
> very inefficient, especially when a partition contains multiple files. This is
> an even bigger problem for AvroSerde, because AvroSerde initialization reads
> the schema, which is located on the file system. As a result, before Hive can
> process any data, it needs to access every file for a table, which can take
> long enough to cause the job to fail for lack of progress.
> An improvement can be made so that partition metadata is accessed only once
> per partition.
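
For illustration only, the "once per partition" idea in the description can be sketched as a small cache keyed by partition directory. This is not the attached patch and does not use Hive's actual classes: PartitionMetadataCache, schemaForFile, partitionOf and the loader function are hypothetical names, and the sketch assumes that a file's partition is simply its parent directory.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from the Hive code base.
public class PartitionMetadataCache {
    // One cached entry per partition directory, shared by all files in it.
    private final Map<String, String> schemaByPartition = new HashMap<>();
    // Stand-in for the expensive step (e.g. reading a schema from the file system).
    private final Function<String, String> loader;
    private int loadCount = 0;

    public PartitionMetadataCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // Assumption: the partition of a file is its parent directory.
    public static String partitionOf(String filePath) {
        int slash = filePath.lastIndexOf('/');
        return slash < 0 ? "" : filePath.substring(0, slash);
    }

    // Returns the partition's metadata, invoking the loader only on first use.
    public String schemaForFile(String filePath) {
        return schemaByPartition.computeIfAbsent(partitionOf(filePath), p -> {
            loadCount++;
            return loader.apply(p);
        });
    }

    public int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        PartitionMetadataCache cache =
            new PartitionMetadataCache(p -> "schema-for:" + p);
        List<String> files = List.of(
            "/warehouse/t/ds=1/part-0.avro",
            "/warehouse/t/ds=1/part-1.avro",
            "/warehouse/t/ds=1/part-2.avro",
            "/warehouse/t/ds=2/part-0.avro");
        for (String f : files) {
            cache.schemaForFile(f);
        }
        // Four files in two partitions: prints "files=4 loads=2".
        System.out.println("files=" + files.size() + " loads=" + cache.getLoadCount());
    }
}
```

With a per-file approach the loader would run four times in this example; with the cache it runs once for each of the two partitions, which is the behaviour the description asks for.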

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
