[ https://issues.apache.org/jira/browse/PIG-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sarath updated PIG-4943:
------------------------
Description:
I have a script that stores two relations with different schemas using
CSVExcelStorage.
The issue I see is that the script takes the schema of the last store function
and applies it to all store functions, overriding the schemas of the earlier
stores. Is this a known issue, and is there a fix for it?
My sample script looks like this:
=============================================================
masterInput = load 'hbase://xyz'
    using org.apache.pig.backend.hadoop.hbase.HBaseStorage('f:a,f:b,f:c,f:d')
    as (a, b, c, d);
input2 = foreach masterInput generate a, b;
input3 = foreach masterInput generate c, d;
store input2 into '/dir/ab'
    using org.apache.pig.piggybank.storage.CSVExcelStorage(
        '\t', 'YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
store input3 into '/dir/cd'
    using org.apache.pig.piggybank.storage.CSVExcelStorage(
        '\t', 'YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
=============================================================
Expected Output :
||file 1||file1||file 2||file2
|a|b|c|d
|10|20|30|40
Actual Output :
||file 1||file1||file 2||file2
|c|d|c|d
|10|20|30|40
was:
I have a script that stores two relations with different schemas using
CSVExcelStorage.
The issue I see is that the script takes the schema of the last store function
and applies it to all store functions, overriding the schemas of the earlier
stores. Is this a known issue, and is there a fix for it?
My sample script looks like this:
=============================================================
masterInput = load 'hbase://xyz'
    using org.apache.pig.backend.hadoop.hbase.HBaseStorage('f:a,f:b,f:c,f:d')
    as (a, b, c, d);
input2 = foreach masterInput generate a, b;
input3 = foreach masterInput generate c, d;
store input2 into '/dir/ab'
    using org.apache.pig.piggybank.storage.CSVExcelStorage(
        '\t', 'YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
store input3 into '/dir/cd'
    using org.apache.pig.piggybank.storage.CSVExcelStorage(
        '\t', 'YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
=============================================================
Expected Output :
||file 1||file1||file 2||file2
|a|b||c|d
|10|20|30|40
Actual Output :
||file 1||file1||file 2||file2
|c|d||c|d
|10|20|30|40
> Schema issue while storing multiple pig outputs using CSVExcelStorage
> ---------------------------------------------------------------------
>
> Key: PIG-4943
> URL: https://issues.apache.org/jira/browse/PIG-4943
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.14.0
> Reporter: sarath
> Priority: Minor
>
> I have a script that stores two relations with different schemas using
> CSVExcelStorage.
> The issue I see is that the script takes the schema of the last store function
> and applies it to all store functions, overriding the schemas of the earlier
> stores. Is this a known issue, and is there a fix for it?
> My sample script looks like this:
> =============================================================
> masterInput = load 'hbase://xyz'
>     using org.apache.pig.backend.hadoop.hbase.HBaseStorage('f:a,f:b,f:c,f:d')
>     as (a, b, c, d);
> input2 = foreach masterInput generate a, b;
> input3 = foreach masterInput generate c, d;
> store input2 into '/dir/ab'
>     using org.apache.pig.piggybank.storage.CSVExcelStorage(
>         '\t', 'YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
> store input3 into '/dir/cd'
>     using org.apache.pig.piggybank.storage.CSVExcelStorage(
>         '\t', 'YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
> =============================================================
> Expected Output :
> ||file 1||file1||file 2||file2
> |a|b|c|d
> |10|20|30|40
> Actual Output :
> ||file 1||file1||file 2||file2
> |c|d|c|d
> |10|20|30|40
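
The symptom in the report (both files getting the header of the last store) is what one would expect if both store functions recorded their header schema in a single shared slot. The report does not confirm that this is the cause. The sketch below is only a toy Python model of that assumption, not Pig or piggybank code; the function names, the dictionary layout, and the use of the output path as a per-store signature are all hypothetical.

```python
# Toy model (assumption, not Pig's implementation): each store function
# records its header schema in a shared properties map on the front end,
# then reads it back on the back end when writing the header row.

def plan_stores(stores, key_for):
    """Front end: every store writes its schema under key_for(store)."""
    props = {}
    for store in stores:
        props[key_for(store)] = store["schema"]
    return props


def write_headers(stores, props, key_for):
    """Back end: every store looks its header up in the shared map."""
    return {store["path"]: props[key_for(store)] for store in stores}


stores = [
    {"path": "/dir/ab", "schema": ["a", "b"], "func": "CSVExcelStorage"},
    {"path": "/dir/cd", "schema": ["c", "d"], "func": "CSVExcelStorage"},
]

# Keyed by storage class only: the second store overwrites the first entry.
by_class = lambda s: s["func"]
shared = write_headers(stores, plan_stores(stores, by_class), by_class)

# Keyed by class plus a per-store signature (hypothetically, the path).
by_store = lambda s: (s["func"], s["path"])
isolated = write_headers(stores, plan_stores(stores, by_store), by_store)

print(shared)
print(isolated)
```

In this model the class-only key reproduces the reported output (header c,d in both files), while the per-store key yields the expected headers. Whether CSVExcelStorage in 0.14.0 actually behaves this way would need to be checked against its source.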