[ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gianmarco De Francisci Morales updated PIG-2691:
------------------------------------------------
Resolution: Fixed
Fix Version/s: 0.11
Release Note: TOKENIZE: the default name of the field in the schema
produced by this UDF now depends on the input field. This change could break
your script if you were relying on the field being called "bag_of_tokenTuples"
(i.e. you were not using an AS clause to rename the field).
Hadoop Flags: Incompatible change
Status: Resolved (was: Patch Available)
> Duplicate TOKENIZE schema
> -------------------------
>
> Key: PIG-2691
> URL: https://issues.apache.org/jira/browse/PIG-2691
> Project: Pig
> Issue Type: Bug
> Reporter: Gianmarco De Francisci Morales
> Assignee: Jie Li
> Labels: simple
> Fix For: 0.11
>
> Attachments: PIG-2691.patch, PIG-2691.patch.2
>
>
> TOKENIZE produces a schema with a fixed field name, which results in duplicate
> aliases if it is used more than once in the same GENERATE statement.
> We could parameterize the schema on the name of the field being tokenized.
> {code}
> grunt> q = LOAD 'file' AS (source:chararray, target:chararray);
> grunt> e = FOREACH q GENERATE TOKENIZE(source), TOKENIZE(target);
> 2012-05-09 20:18:37,235 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
> <line 2, column 14> Duplicate schema alias: bag_of_tokenTuples
> grunt> e = FOREACH q GENERATE TOKENIZE(source) as s_entities, TOKENIZE(target) as t_entities;
> grunt> describe e
> e: {s_entities: {tuple_of_tokens: (token: chararray)},t_entities: {tuple_of_tokens: (token: chararray)}}
> {code}
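The release note says the default field name now depends on the input field but does not spell out the exact format. The plain-Java sketch below only illustrates the idea proposed in the issue; it does not use Pig's API, and the `_from_<alias>` suffix produced by the hypothetical `parameterizedName` helper is an assumed format, not necessarily what the committed patch generates. It shows why a fixed name collides when TOKENIZE appears twice in one GENERATE, while a name derived from the input alias does not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenizeSchemaNames {
    // Fixed field name reported in PIG-2691 (behavior before the fix).
    static final String FIXED_NAME = "bag_of_tokenTuples";

    // Hypothetical parameterized naming: append the input field's alias.
    // The "_from_" format is an assumption made for this sketch.
    static String parameterizedName(String inputAlias) {
        return FIXED_NAME + "_from_" + inputAlias;
    }

    // Returns the first alias that appears more than once, or null if all
    // are unique; this mimics the "Duplicate schema alias" check.
    static String firstDuplicate(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        for (String alias : aliases) {
            if (!seen.add(alias)) {
                return alias;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Field names from the example: GENERATE TOKENIZE(source), TOKENIZE(target)
        List<String> inputs = List.of("source", "target");
        List<String> fixed = new ArrayList<>();
        List<String> parameterized = new ArrayList<>();
        for (String input : inputs) {
            fixed.add(FIXED_NAME);
            parameterized.add(parameterizedName(input));
        }
        System.out.println("fixed names, duplicate: " + firstDuplicate(fixed));
        System.out.println("parameterized names, duplicate: " + firstDuplicate(parameterized));
    }
}
```

Under a scheme like this, tokenizing the same field twice (for example `TOKENIZE(source), TOKENIZE(source)`) would still yield duplicate names, so an AS clause remains the reliable way to control the output field names.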
--
This message is automatically generated by JIRA.