[jira] (HAWQ-1228) Use profile based on file format in HCatalog integration(HiveRC, HiveText profiles)

Oleksandr Diachenko (JIRA) Tue, 31 Jan 2017 12:17:08 -0800

     [ https://issues.apache.org/jira/browse/HAWQ-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Oleksandr Diachenko updated HAWQ-1228:
--------------------------------------
    Description: 
To leverage the changes introduced in HAWQ-1177, extend the optimization to the other 
Hive profiles. Additional information needs to be included in the user metadata 
(e.g. DELIMITER, etc.).

Changes needed:
* Enhance the Metadata API to add two new attributes: outputFormats and 
outputParameters;
* The Hive and HiveORC profiles should support only the GPDBWritable format;
* The HiveText and HiveRC profiles should support both the TEXT and GPDBWritable 
formats;
* Unify the HiveUserData data structures so they are the same across all Hive 
profiles;
* The Bridge should read each fragment using the optimal profile read from the 
fragment information;
* The optimal profile should be determined by the file's input format 
(org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - HiveORC, 
org.apache.hadoop.hive.ql.io.RCFileInputFormat - HiveRC, 
org.apache.hadoop.mapred.TextInputFormat - HiveText);
* Default profile is Hive;
* If a Hive table uses org.apache.hadoop.mapred.TextInputFormat but also has 
complex types, the Hive profile should be used;
* If the table is homogeneous (all input files have the same output format), the 
Bridge uses the table's format. Otherwise, if the table is heterogeneous, 
GPDBWritable should be used;
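The selection rules above can be sketched in Java as follows. This is a minimal, hypothetical sketch: the class and method names (HiveProfileSelection, optimalProfile, supportedFormats, tableOutputFormats) are illustrative only and are not the actual PXF code; only the profile names, the TEXT/GPDBWritable output formats and the input format class names come from the issue. It also assumes a table with no fragments falls back to GPDBWritable.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the HAWQ-1228 profile-selection rules; not actual PXF code.
public class HiveProfileSelection {

    // Profiles and output formats named in the issue, modeled as enums (assumption).
    enum Profile { Hive, HiveORC, HiveRC, HiveText }

    enum OutputFormat { TEXT, GPDBWritable }

    static final String ORC_INPUT = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";
    static final String RC_INPUT = "org.apache.hadoop.hive.ql.io.RCFileInputFormat";
    static final String TEXT_INPUT = "org.apache.hadoop.mapred.TextInputFormat";

    // Maps a fragment's input format class name to its optimal profile.
    static Profile optimalProfile(String inputFormat, boolean hasComplexTypes) {
        switch (inputFormat) {
            case ORC_INPUT:
                return Profile.HiveORC;
            case RC_INPUT:
                return Profile.HiveRC;
            case TEXT_INPUT:
                // Text tables with complex types fall back to the generic Hive profile.
                return hasComplexTypes ? Profile.Hive : Profile.HiveText;
            default:
                return Profile.Hive; // default profile
        }
    }

    // Output formats each profile supports, per the issue description.
    static Set<OutputFormat> supportedFormats(Profile profile) {
        switch (profile) {
            case HiveText:
            case HiveRC:
                return EnumSet.of(OutputFormat.TEXT, OutputFormat.GPDBWritable);
            default: // Hive, HiveORC
                return EnumSet.of(OutputFormat.GPDBWritable);
        }
    }

    // Homogeneous table: every fragment has the same profile, so that profile's formats apply.
    // Heterogeneous table: fall back to GPDBWritable, which every profile supports.
    static Set<OutputFormat> tableOutputFormats(List<Profile> fragmentProfiles) {
        if (fragmentProfiles.isEmpty()) {
            return EnumSet.of(OutputFormat.GPDBWritable); // assumption for empty tables
        }
        Profile first = fragmentProfiles.get(0);
        boolean homogeneous = fragmentProfiles.stream().allMatch(p -> p == first);
        return homogeneous ? supportedFormats(first) : EnumSet.of(OutputFormat.GPDBWritable);
    }

    public static void main(String[] args) {
        Profile rc = optimalProfile(RC_INPUT, false);
        Profile textWithComplex = optimalProfile(TEXT_INPUT, true);
        System.out.println(rc + " " + textWithComplex);                       // HiveRC Hive
        System.out.println(tableOutputFormats(List.of(rc, rc)));              // [TEXT, GPDBWritable]
        System.out.println(tableOutputFormats(List.of(rc, textWithComplex))); // [GPDBWritable]
    }
}
```

Returning both formats for a homogeneous HiveText or HiveRC table leaves the final choice (for example, preferring TEXT) to the consumer of the metadata; that preference is an assumption of this sketch, not something the issue specifies.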

  was:
To leverage the changes introduced in HAWQ-1177, extend the optimization to the other 
Hive profiles. Additional information needs to be included in the user metadata 
(e.g. DELIMITER, etc.).
For now, the change should support only homogeneous tables. A homogeneous table in 
this context is one that has no partitions, or whose partitions all share the same 
storage format. For heterogeneous tables, HAWQ should still use the Hive profile.


> Use profile based on file format in HCatalog integration(HiveRC, HiveText 
> profiles)
> -----------------------------------------------------------------------------------
>
>                 Key: HAWQ-1228
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1228
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: PXF
>            Reporter: Oleksandr Diachenko
>            Assignee: Oleksandr Diachenko
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
