[jira] (HAWQ-1228) Use profile based on file format in HCatalog integration(HiveRC, HiveText profiles)

Oleksandr Diachenko (JIRA) Tue, 31 Jan 2017 12:18:06 -0800

     [ https://issues.apache.org/jira/browse/HAWQ-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Oleksandr Diachenko updated HAWQ-1228:
--------------------------------------
    Description: 
To leverage the changes introduced in HAWQ-1177, expand the optimization to the 
other Hive profiles. Additional information needs to be included in the user 
metadata (e.g. DELIMITER).

Changes needed:
* Enhance the Metadata API to add new attributes: outputFormats, 
outputParameters;
* Hive and HiveORC profiles should support only the GPDBWritable format;
* HiveText and HiveRC profiles should support both TEXT and GPDBWritable formats;
* Unify the HiveUserData data structures so they are the same across all Hive 
profiles;
* Bridge should read fragments using the optimal profile read from the fragment 
information;
* The optimal profile should be determined from the file's input format 
(org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - HiveORC, 
org.apache.hadoop.hive.ql.io.RCFileInputFormat - HiveRC, 
org.apache.hadoop.mapred.TextInputFormat - HiveText);
* The default profile is Hive;
* If a Hive table has org.apache.hadoop.mapred.TextInputFormat but also has some 
complex types, the Hive profile should be used (this limitation should be 
addressed in HAWQ-1265);
* If the table is homogeneous (all input files have the same output format), 
Bridge uses the format the table has. Otherwise, if the table is heterogeneous, 
GPDBWritable should be used;
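
The profile and output format rules listed above can be sketched roughly as 
follows. This is only an illustration under stated assumptions: the class 
HiveProfileSketch and its methods chooseProfile, supportedOutputFormats and 
chooseOutputFormat are hypothetical names, not part of the PXF code base, and 
treating a null input format or an empty file list as the fallback case is an 
assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not the actual PXF implementation.
final class HiveProfileSketch {

    // Picks a profile from the file's input format class name.
    // Text files with complex-typed columns fall back to the default Hive profile.
    static String chooseProfile(String inputFormat, boolean hasComplexTypes) {
        if (inputFormat == null) {
            return "Hive"; // assumption: unknown format uses the default profile
        }
        switch (inputFormat) {
            case "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat":
                return "HiveORC";
            case "org.apache.hadoop.hive.ql.io.RCFileInputFormat":
                return "HiveRC";
            case "org.apache.hadoop.mapred.TextInputFormat":
                return hasComplexTypes ? "Hive" : "HiveText";
            default:
                return "Hive"; // default profile
        }
    }

    // Output formats each profile supports, as listed in the description.
    static List<String> supportedOutputFormats(String profile) {
        if ("HiveText".equals(profile) || "HiveRC".equals(profile)) {
            return Arrays.asList("TEXT", "GPDBWritable");
        }
        return Collections.singletonList("GPDBWritable");
    }

    // Homogeneous table (one distinct per-file output format): use that format.
    // Heterogeneous (or empty, by assumption): use GPDBWritable.
    static String chooseOutputFormat(List<String> perFileOutputFormats) {
        Set<String> distinct = new HashSet<>(perFileOutputFormats);
        return distinct.size() == 1 ? distinct.iterator().next() : "GPDBWritable";
    }

    public static void main(String[] args) {
        System.out.println(chooseProfile("org.apache.hadoop.mapred.TextInputFormat", false));
        System.out.println(chooseProfile("org.apache.hadoop.mapred.TextInputFormat", true));
        System.out.println(chooseOutputFormat(Arrays.asList("TEXT", "TEXT")));
        System.out.println(chooseOutputFormat(Arrays.asList("TEXT", "GPDBWritable")));
    }
}
```

Keeping the three decisions in separate methods mirrors the separate bullets 
above: the profile is chosen per fragment, while the output format is chosen 
once for the whole table.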

  was:
To leverage the changes introduced in HAWQ-1177, expand the optimization to the 
other Hive profiles. Additional information needs to be included in the user 
metadata (e.g. DELIMITER).

Changes needed:
* Enhance the Metadata API to add new attributes: outputFormats, 
outputParameters;
* Hive and HiveORC profiles should support only the GPDBWritable format;
* HiveText and HiveRC profiles should support both TEXT and GPDBWritable formats;
* Unify the HiveUserData data structures so they are the same across all Hive 
profiles;
* Bridge should read fragments using the optimal profile read from the fragment 
information;
* The optimal profile should be determined from the file's input format 
(org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - HiveORC, 
org.apache.hadoop.hive.ql.io.RCFileInputFormat - HiveRC, 
org.apache.hadoop.mapred.TextInputFormat - HiveText);
* The default profile is Hive;
* If a Hive table has org.apache.hadoop.mapred.TextInputFormat but also has some 
complex types, the Hive profile should be used();
* If the table is homogeneous (all input files have the same output format), 
Bridge uses the format the table has. Otherwise, if the table is heterogeneous, 
GPDBWritable should be used;


> Use profile based on file format in HCatalog integration(HiveRC, HiveText 
> profiles)
> -----------------------------------------------------------------------------------
>
>                 Key: HAWQ-1228
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1228
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: PXF
>            Reporter: Oleksandr Diachenko
>            Assignee: Oleksandr Diachenko
>
> To leverage the changes introduced in HAWQ-1177, expand the optimization to 
> the other Hive profiles. Additional information needs to be included in the 
> user metadata (e.g. DELIMITER).
> Changes needed:
> * Enhance the Metadata API to add new attributes: outputFormats, 
> outputParameters;
> * Hive and HiveORC profiles should support only the GPDBWritable format;
> * HiveText and HiveRC profiles should support both TEXT and GPDBWritable 
> formats;
> * Unify the HiveUserData data structures so they are the same across all Hive 
> profiles;
> * Bridge should read fragments using the optimal profile read from the 
> fragment information;
> * The optimal profile should be determined from the file's input format 
> (org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - HiveORC, 
> org.apache.hadoop.hive.ql.io.RCFileInputFormat - HiveRC, 
> org.apache.hadoop.mapred.TextInputFormat - HiveText);
> * The default profile is Hive;
> * If a Hive table has org.apache.hadoop.mapred.TextInputFormat but also has 
> some complex types, the Hive profile should be used (this limitation should 
> be addressed in HAWQ-1265);
> * If the table is homogeneous (all input files have the same output format), 
> Bridge uses the format the table has. Otherwise, if the table is 
> heterogeneous, GPDBWritable should be used;



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
