[
https://issues.apache.org/jira/browse/PIG-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai updated PIG-4705:
----------------------------
Fix Version/s: 0.16.0
> Error Schema for data cannot be determined using HCatalog
> ---------------------------------------------------------
>
> Key: PIG-4705
> URL: https://issues.apache.org/jira/browse/PIG-4705
> Project: Pig
> Issue Type: Bug
> Components: tez
> Affects Versions: 0.15.0
> Environment: HDP 2.3.2
> Reporter: Krzysztof Indyk
> Fix For: 0.16.0
>
> Attachments: hive_tables.hql, sample.csv, stack_trace.log
>
>
> When we use {{HCatalog}} as both the source and the destination of data for
> {{Pig}} on {{Tez}}, we get ??ERROR 1115: Schema for data cannot be determined??.
> Pig works fine when we use MapReduce, or when we use HCatalog for only one of
> the endpoints, i.e. load data directly from a file and store using HCatalog.
> The error appeared after upgrading from {{Pig 0.14}} on {{Tez 0.5.2}} to
> {{Pig 0.15}} on {{Tez 0.7.0}} ({{HDP 2.2.6}} to {{HDP 2.3.2}}).
> To reproduce:
> - create the Hive tables from [^hive_tables.hql]
> - load data into table_input from [^sample.csv]
> - run the following Pig script on Tez
> {code}
> data = LOAD 'table_input' USING org.apache.hive.hcatalog.pig.HCatLoader();
> items_unique = DISTINCT data;
> counted = FOREACH (GROUP items_unique BY col2)
>     GENERATE
>         group AS name,
>         COUNT(items_unique) AS value;
>
> STORE counted INTO 'table_output' USING org.apache.hive.hcatalog.pig.HCatStorer();
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)