[jira] (HAWQ-1303) Load each partition as separate table for heterogenous tables in HCatalog

Oleksandr Diachenko (JIRA) Tue, 31 Jan 2017 12:32:36 -0800

     [ https://issues.apache.org/jira/browse/HAWQ-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Oleksandr Diachenko updated HAWQ-1303:
--------------------------------------
    Description: 
Changes introduced in HAWQ-1228 made HAWQ use the optimal profile/format for Hive
tables. However, there is a limitation: when HAWQ loads a Hive table into memory,
it loads it as one table even if the table has multiple partitions with different
output formats (GPDBWritable, TEXT). In that case it currently falls back to the
GPDBWritable format. The idea is to load each set of partitions that share one
output format as a separate table, so that not only the optimal profile but also
the optimal output format can be used.

Example: 
We have a Hive table with four partitions in the following formats: Text, RC, ORC,
and Sequence file.
Currently, HAWQ loads it into memory with the GPDBWritable format.
GPDBWritable is optimal for the HiveORC and Hive profiles, but not for the
HiveText and HiveRC profiles.
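A minimal sketch of the current behavior as described above, assuming the only rule is "if all partitions share one output format use it, otherwise fall back to GPDBWritable". The profile-to-format pairs come from this issue; the names `PROFILE_OUTPUT_FORMAT` and `current_table_format` are hypothetical and are not HAWQ or PXF APIs.

```python
# Hypothetical illustration, not HAWQ/PXF code.
# Profile -> optimal output format, as paired in HAWQ-1303.
PROFILE_OUTPUT_FORMAT = {
    "HiveText": "TEXT",
    "HiveRC": "TEXT",
    "HiveORC": "GPDBWritable",
    "Hive": "GPDBWritable",
}


def current_table_format(partition_profiles):
    """Return the single output format used for the whole table (current behavior).

    If every partition maps to the same output format, that format is used;
    if the partitions are heterogeneous, the table falls back to GPDBWritable.
    """
    formats = {PROFILE_OUTPUT_FORMAT[p] for p in partition_profiles}
    if len(formats) == 1:
        return formats.pop()
    return "GPDBWritable"
```

For the four-partition example, `current_table_format(["HiveText", "HiveRC", "HiveORC", "Hive"])` yields `"GPDBWritable"`, so the HiveText and HiveRC partitions are read with a non-optimal format.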

With the proposed changes, HAWQ should load two tables, one with TEXT and one with
GPDBWritable format, and use the following profile/format pairs to read the
partitions: HiveText/TEXT, HiveRC/TEXT, HiveORC/GPDBWritable, Hive/GPDBWritable.
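The proposed split can be sketched as grouping partitions by the output format of their profile, producing one logical table per format. This is an illustration under stated assumptions: the file-format-to-profile mapping follows the order of the example (Text to HiveText, RC to HiveRC, ORC to HiveORC, Sequence file to Hive), and all names (`FILE_FORMAT_PROFILE`, `split_by_output_format`, partition names `p1`..`p4`) are hypothetical, not part of HAWQ or PXF.

```python
from collections import defaultdict

# Hypothetical illustration, not HAWQ/PXF code.
# Profile -> optimal output format, as paired in HAWQ-1303.
PROFILE_OUTPUT_FORMAT = {
    "HiveText": "TEXT",
    "HiveRC": "TEXT",
    "HiveORC": "GPDBWritable",
    "Hive": "GPDBWritable",
}

# Assumed partition file format -> profile, following the example's order.
FILE_FORMAT_PROFILE = {
    "Text": "HiveText",
    "RC": "HiveRC",
    "ORC": "HiveORC",
    "SequenceFile": "Hive",
}


def split_by_output_format(partitions):
    """Group (partition_name, file_format) pairs into one table per output format.

    Returns a dict mapping output format -> list of (partition_name, profile).
    """
    tables = defaultdict(list)
    for name, file_format in partitions:
        profile = FILE_FORMAT_PROFILE[file_format]
        tables[PROFILE_OUTPUT_FORMAT[profile]].append((name, profile))
    return dict(tables)


if __name__ == "__main__":
    partitions = [("p1", "Text"), ("p2", "RC"), ("p3", "ORC"), ("p4", "SequenceFile")]
    for fmt, parts in split_by_output_format(partitions).items():
        print(fmt, parts)
    # TEXT [('p1', 'HiveText'), ('p2', 'HiveRC')]
    # GPDBWritable [('p3', 'HiveORC'), ('p4', 'Hive')]
```

Each resulting group shares one output format, so every partition is read with its optimal profile/format pair instead of the single GPDBWritable fallback.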


> Load each partition as separate table for heterogenous tables in HCatalog
> -------------------------------------------------------------------------
>
>                 Key: HAWQ-1303
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1303
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Hcatalog, PXF
>            Reporter: Oleksandr Diachenko
>            Assignee: Ed Espino



