[ https://issues.apache.org/jira/browse/HAWQ-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Oleksandr Diachenko resolved HAWQ-1228.
---------------------------------------
Resolution: Fixed
> Use profile based on file format in HCatalog integration (HiveRC, HiveText
> profiles)
> -----------------------------------------------------------------------------------
>
> Key: HAWQ-1228
> URL: https://issues.apache.org/jira/browse/HAWQ-1228
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: PXF
> Reporter: Oleksandr Diachenko
> Assignee: Oleksandr Diachenko
> Fix For: 2.1.0.0-incubating
>
>
> To leverage the changes introduced in HAWQ-1177, extend the optimization to the
> other Hive profiles. Additional information (e.g., DELIMITER) needs to be
> included in the user metadata.
> Changes needed:
> * Enhance the Metadata API to add the new attributes outputFormats and
> outputParameters;
> * Hive and HiveORC profiles should support only the GPDBWritable format;
> * HiveText and HiveRC profiles should support both TEXT and GPDBWritable formats;
> * Unify the HiveUserData data structure so that it is the same across all Hive profiles;
> * The Bridge should read fragments using the optimal profile obtained from the
> fragment information;
> * The optimal profile should be determined by the file's input format
> (org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - HiveORC,
> org.apache.hadoop.hive.ql.io.RCFileInputFormat - HiveRC,
> org.apache.hadoop.mapred.TextInputFormat - HiveText);
> * The default profile is Hive;
> * If a Hive table uses org.apache.hadoop.mapred.TextInputFormat but also has
> complex types, the Hive profile should be used (this limitation should be
> addressed in HAWQ-1265);
> * If a table is homogeneous (all input files have the same output format), the
> Bridge uses that format. Otherwise, if the table is heterogeneous,
> GPDBWritable should be used;
> * Add a new property, outputFormat, to pxf-profiles-default.xml that specifies
> the profile's default output format.
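The input-format rule from the list of changes can be sketched as follows. This is a minimal illustration, not PXF code: the class HiveProfileSelector, the method chooseProfile and the hasComplexTypes flag are hypothetical names, and the sketch assumes Java 9+ for Map.of.

```java
import java.util.Map;

// Hypothetical sketch of the HAWQ-1228 profile-selection rule; not the PXF API.
final class HiveProfileSelector {

    // Input format class name -> optimal profile, as listed in the issue.
    private static final Map<String, String> PROFILE_BY_INPUT_FORMAT = Map.of(
            "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat", "HiveORC",
            "org.apache.hadoop.hive.ql.io.RCFileInputFormat", "HiveRC",
            "org.apache.hadoop.mapred.TextInputFormat", "HiveText");

    // Used when no specialized profile applies.
    static final String DEFAULT_PROFILE = "Hive";

    private HiveProfileSelector() {}

    // hasComplexTypes (hypothetical): true if any column has a complex Hive type.
    static String chooseProfile(String inputFormat, boolean hasComplexTypes) {
        if (inputFormat == null) {
            return DEFAULT_PROFILE; // Map.of rejects null keys, so guard first
        }
        String profile = PROFILE_BY_INPUT_FORMAT.getOrDefault(inputFormat, DEFAULT_PROFILE);
        // Text tables with complex types fall back to Hive (limitation tracked in HAWQ-1265).
        if (profile.equals("HiveText") && hasComplexTypes) {
            return DEFAULT_PROFILE;
        }
        return profile;
    }
}
```

Keeping the mapping in a single lookup table makes the default (Hive) the natural fallback for any input format the optimization does not cover.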
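The output-format rules (which formats each profile may produce, and how the Bridge could choose a format for homogeneous versus heterogeneous tables) can likewise be sketched. HiveOutputFormat and TableOutputFormatResolver are illustrative names rather than PXF types, and the sketch assumes the output format of every fragment is already known when the table is read.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical output formats; names are illustrative, not PXF's own types.
enum HiveOutputFormat { TEXT, GPDB_WRITABLE }

final class TableOutputFormatResolver {

    // Formats each profile may produce, per the issue: Hive and HiveORC only
    // GPDBWritable; HiveText and HiveRC both TEXT and GPDBWritable.
    private static final Map<String, Set<HiveOutputFormat>> SUPPORTED = Map.of(
            "Hive", EnumSet.of(HiveOutputFormat.GPDB_WRITABLE),
            "HiveORC", EnumSet.of(HiveOutputFormat.GPDB_WRITABLE),
            "HiveText", EnumSet.of(HiveOutputFormat.TEXT, HiveOutputFormat.GPDB_WRITABLE),
            "HiveRC", EnumSet.of(HiveOutputFormat.TEXT, HiveOutputFormat.GPDB_WRITABLE));

    private TableOutputFormatResolver() {}

    static boolean supports(String profile, HiveOutputFormat format) {
        return profile != null && SUPPORTED.getOrDefault(profile, Set.of()).contains(format);
    }

    // Homogeneous table (all fragments share one format): use that format.
    // Heterogeneous table: fall back to GPDBWritable, which every profile supports.
    static HiveOutputFormat resolve(List<HiveOutputFormat> fragmentFormats) {
        if (fragmentFormats.isEmpty()) {
            return HiveOutputFormat.GPDB_WRITABLE; // assumption: no fragments, use the common format
        }
        HiveOutputFormat first = fragmentFormats.get(0);
        for (HiveOutputFormat format : fragmentFormats) {
            if (format != first) {
                return HiveOutputFormat.GPDB_WRITABLE;
            }
        }
        return first;
    }
}
```

GPDBWritable is the safe fallback here because it is the one format every Hive profile in the list supports.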
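Reading the proposed outputFormat property back out of pxf-profiles-default.xml could look like the following sketch, which uses the JDK's built-in DOM parser. The element layout (an outputFormat element somewhere under each profile element, alongside its name) and the class ProfileOutputFormats are assumptions for illustration; the actual file structure may differ.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for a per-profile outputFormat property; the XML layout is assumed.
final class ProfileOutputFormats {

    private ProfileOutputFormats() {}

    // Returns profile name -> outputFormat text; profiles without one are skipped.
    static Map<String, String> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse profiles XML", e);
        }
        Map<String, String> result = new HashMap<>();
        NodeList profiles = doc.getElementsByTagName("profile");
        for (int i = 0; i < profiles.getLength(); i++) {
            Element profile = (Element) profiles.item(i);
            String name = firstText(profile, "name");
            String outputFormat = firstText(profile, "outputFormat");
            if (name != null && outputFormat != null) {
                result.put(name, outputFormat);
            }
        }
        return result;
    }

    // Text of the first descendant element with the given tag, or null if absent.
    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

Because getElementsByTagName searches all descendants, the sketch finds outputFormat whether it sits directly under a profile or inside a nested element.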
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)