[https://issues.apache.org/jira/browse/PIG-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Krzysztof Indyk updated PIG-4705:
---------------------------------
Description:
When we use {{HCatalog}} as both the source and the destination of data for
{{Pig}} on {{Tez}}, we get ??ERROR 1115: Schema for data cannot be determined??.
Pig works fine when we use MapReduce, or when HCatalog is only one of the
endpoints, i.e. we load data directly from a file and store it using HCatalog.
The error appeared after upgrading from {{Pig 0.14}} on {{Tez 0.5.2}} to
{{Pig 0.15}} on {{Tez 0.7.0}} ({{HDP 2.2.6}} to {{HDP 2.3.2}}).
To reproduce:
- create the Hive tables from [^hive_tables.hql]
- load data into table_input from [^sample.csv]
- run the following Pig script on Tez
{code}
-- Read the Hive table through HCatalog
data = LOAD 'table_input' USING org.apache.hive.hcatalog.pig.HCatLoader();
items_unique = DISTINCT data;
-- Count the distinct rows for each value of col2
counted = FOREACH (GROUP items_unique BY col2)
    GENERATE
        group AS name,
        COUNT(items_unique) AS value;
-- Write the result back to Hive through HCatalog
STORE counted INTO 'table_output' USING org.apache.hive.hcatalog.pig.HCatStorer();
{code}
was:
When we use {{HCatalog}} as source and destination of data for {{Pig}} on
{{Tez}} we get ??ERROR 1115: Schema for data cannot be determined??.
Pig works fine when we use map reduce or use HCatalog only as one of endpoints
i.e. load data directly from file and store using HCatalog.
The error appears after upgrading from {{Pig 0.14}} on {{Tez 0.5.2}} to {{Pig
0.15}} on {{Tez 0.7.0}} ( HDP 2.2.6}} to {{HDP 2.3.2}}).
To reproduce:
- create hive tables from [^hive_tables.hql]
- load data to table_input from [^sample.csv]
- run following Pig script on Tez
{code}
data = LOAD 'table_input' USING org.apache.hive.hcatalog.pig.HCatLoader();
items_unique = DISTINCT data;
counted = FOREACH (GROUP items_unique BY col2)
GENERATE
group AS name,
COUNT(items_unique) AS value;
STORE counted INTO 'table_output' USING
org.apache.hive.hcatalog.pig.HCatStorer();
{code}
> Error Schema for data cannot be determined using HCatalog
> ---------------------------------------------------------
>
> Key: PIG-4705
> URL: https://issues.apache.org/jira/browse/PIG-4705
> Project: Pig
> Issue Type: Bug
> Components: tez
> Affects Versions: 0.15.0
> Environment: HDP 2.3.2
> Reporter: Krzysztof Indyk
> Attachments: hive_tables.hql, sample.csv, stack_trace.log