[jira] [Commented] (PIG-2691) Duplicate TOKENIZE schema

Jie Li (JIRA) Wed, 23 May 2012 19:11:44 -0700

    [ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282128#comment-13282128 ]


Jie Li commented on PIG-2691:
-----------------------------

Since the field schema of TOKENIZE was never documented, can we assume that 
users who want to refer to a field by name will name it explicitly with AS? 
If so, this change would not break existing scripts.
                
> Duplicate TOKENIZE schema
> -------------------------
>
>                 Key: PIG-2691
>                 URL: https://issues.apache.org/jira/browse/PIG-2691
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Jie Li
>              Labels: simple
>         Attachments: PIG-2691.patch, PIG-2691.patch.2
>
>
> TOKENIZE produces a fixed named schema that results in duplicates if used 
> more than once in the same generate statement.
> We could parameterize the schema on the name of the field being tokenized.
> {code}
> grunt> q = LOAD 'file' AS (source:chararray, target:chararray);
> grunt> e = FOREACH q GENERATE TOKENIZE(source), TOKENIZE(target);
> 2012-05-09 20:18:37,235 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108: 
> <line 2, column 14> Duplicate schema alias: bag_of_tokenTuples
> grunt> e = FOREACH q GENERATE TOKENIZE(source) as s_entities, TOKENIZE(target) as t_entities;
> grunt> describe e
> e: {s_entities: {tuple_of_tokens: (token: chararray)},t_entities: {tuple_of_tokens: (token: chararray)}}
> {code}
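The collision described above, and the suggested fix of parameterizing the schema on the tokenized field's name, can be illustrated outside Pig. This is a hypothetical sketch only: `bagAliasFor`, `firstDuplicate`, and the `_from_` suffix are illustrative assumptions, not Pig's API and not the naming used in the attached patches. It simply shows why a constant alias collides when TOKENIZE appears twice in one GENERATE, and why appending the input field's alias avoids that.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: NOT Pig's actual TOKENIZE.outputSchema() implementation.
public class TokenizeSchemaNaming {
    // The fixed alias TOKENIZE reports for its output bag (from the error above).
    static final String FIXED_BAG_ALIAS = "bag_of_tokenTuples";

    // Hypothetical parameterized alias: append the tokenized field's alias,
    // falling back to the fixed alias when the input field has no name.
    static String bagAliasFor(String inputFieldAlias) {
        if (inputFieldAlias == null || inputFieldAlias.isEmpty()) {
            return FIXED_BAG_ALIAS;
        }
        return FIXED_BAG_ALIAS + "_from_" + inputFieldAlias;
    }

    // Mimics the check behind "Duplicate schema alias": returns the first
    // alias that appears twice, or null if all aliases are distinct.
    static String firstDuplicate(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        for (String alias : aliases) {
            if (!seen.add(alias)) {
                return alias;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // GENERATE TOKENIZE(source), TOKENIZE(target) with the fixed alias.
        List<String> fixed = List.of(FIXED_BAG_ALIAS, FIXED_BAG_ALIAS);
        // The same statement with aliases parameterized on the input field.
        List<String> parameterized =
                List.of(bagAliasFor("source"), bagAliasFor("target"));

        System.out.println("fixed: duplicate = " + firstDuplicate(fixed));
        System.out.println("parameterized: duplicate = " + firstDuplicate(parameterized));
        System.out.println(parameterized);
    }
}
```

Running `main` reports `bag_of_tokenTuples` as the duplicate for the fixed alias and no duplicate for the parameterized aliases. An explicit AS clause, as in the second grunt command, avoids the collision regardless of how the default alias is chosen.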

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
