[ https://issues.apache.org/jira/browse/PIG-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015546#comment-13015546 ]

Mridul Muralidharan commented on PIG-1693:
------------------------------------------

This is a great feature addition.
Hopefully, the mess created by forcefully projecting only the fields declared in
the schema (or only the fields referenced in the script, when no schema is
specified) can be alleviated without needing a dummy schema with 10+ fields at
times (at least, I hope it will make that easier)!


Just curious about one aspect.
If you do something like :

A = LOAD '<path>' USING MyLoader();
B = FOREACH A GENERATE $0, $3..;
STORE B INTO '<output path>' USING MyStore();

Do we still need a schema to 'con' Pig into projecting all the fields? This is
particularly relevant when the number of fields is high (or might be 'fuzzy' at
times).
An earlier version of Pig (and possibly the current one) introduced an implicit
projection that kept only the referenced fields (when no schema is specified) or
strictly adhered to the specified schema, dropping the rest of the fields from
the tuple.

At least with this change, I hope, we can do something like this to alleviate
the issue:

A = LOAD '<path>' USING MyLoader();
B = FOREACH A GENERATE $0, $3..$64;
STORE B INTO '<output path>' USING MyStore();


Thanks for clarifying.
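To make the difference between an open-ended range ($3..) and a bounded one ($3..$64) concrete, here is a small Python model of positional project-range applied to a single tuple. It is only a sketch under stated assumptions, not Pig's implementation: the `project` function and the `(start, end)` encoding are hypothetical, both ends of a bounded range are assumed inclusive, and an open-ended range is assumed to run to the last field of whatever tuple it is applied to.

```python
from typing import Optional, Sequence, Tuple, Union

# Hypothetical model: a projection item is either one field position (int),
# like `$0`, or a (start, end) pair, like `$3 .. $64`. end=None stands for an
# open-ended range such as `$3 ..`.
Projection = Union[int, Tuple[int, Optional[int]]]


def project(row: Sequence, items: Sequence[Projection]) -> tuple:
    """Apply positional projections to one tuple; no schema is consulted."""
    out = []
    for item in items:
        if isinstance(item, int):
            out.append(row[item])           # `$n`: a single field
        else:
            start, end = item
            if end is None:
                out.extend(row[start:])     # `$start ..`: through the last field
            else:
                # `$start .. $end`: both ends inclusive (assumption). Python
                # slicing quietly stops at the last field if end is past it;
                # this sketch does not model what Pig itself does in that case.
                out.extend(row[start:end + 1])
    return tuple(out)


row = tuple(f"f{i}" for i in range(70))    # a 70-field record
open_ended = project(row, [0, (3, None)])  # like: GENERATE $0, $3..
bounded = project(row, [0, (3, 64)])       # like: GENERATE $0, $3..$64
print(len(open_ended), len(bounded))
```

In this model the open-ended form adapts to however many fields each record actually has, while the bounded form fixes the last position up front.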

> support project-range expression. (was: There needs to be a way in foreach to 
> indicate "and all the rest of the fields" )
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1693
>                 URL: https://issues.apache.org/jira/browse/PIG-1693
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>         Attachments: PIG-1693.1.patch, PIG-1693.2.patch
>
>
> A common use case we see in Pig is that people have many columns in their data
> and only want to operate on a few of them.  Consider, for example, a user who
> wants to cast one column before storing data with ten columns:
> {code}
> ...
> Z = foreach Y generate (int)firstcol, secondcol, thirdcol, fourthcol, 
> fifthcol, sixthcol, seventhcol, eighthcol, ninthcol, tenthcol;
> store Z into 'output';
> {code}
> Obviously this only gets worse as the user has more columns.  Ideally the 
> above could be transformed to something like:
> {code}
> ...
> Z = foreach Y generate (int)firstcol, "and all the rest";
> store Z into 'output';
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
