[
https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510645#comment-13510645
]
Joseph Adler commented on PIG-2684:
-----------------------------------
I'm addressing this right now in PIG-3015. This isn't a bug; it's just a
mismatch between the set of names that Avro allows and the names that Pig
allows. (As a side note, there are good reasons why only some variable names
are allowed in Avro: limiting the characters in names allows Avro to generate
code to process Avro objects in a number of different languages. Colons in
variable names would make it difficult to do this.)
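To make the mismatch concrete: the Avro specification requires a name to start with a letter or underscore and to contain only letters, digits, and underscores after that. A minimal sketch of that check follows; the class and method names are hypothetical and are not part of Avro or Pig.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: not an Avro or Pig API.
final class AvroNameCheck {
    // Name rule from the Avro specification: [A-Za-z_][A-Za-z0-9_]*
    private static final Pattern AVRO_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns true if the given field name is a legal Avro name.
    static boolean isValidAvroName(String name) {
        return name != null && AVRO_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidAvroName("one"));        // plain Pig alias
        System.out.println(isValidAvroName("group::one")); // alias produced by FLATTEN(group)
    }
}
```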
First, there are two workarounds for this problem right now:
- The user can rename variables before storing the bag
- The user can manually specify the output schema
Second, I don't like the idea of using namespaces for this. Namespaces are
important for specific record types in Avro; they are translated by the
protocol and schema compilers into package names for Java classes.
To make AvroStorage easier to use, I think it would make sense to add an
option to AvroStorage that translates names with colons in some reasonable
way: for example, by translating double colons to double underscores.
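As a rough sketch of what such a translation could look like (this is only an illustration of the idea; the class and method names are hypothetical, and no such option exists in AvroStorage as of this comment):

```java
// Hypothetical helper, for illustration only: not part of AvroStorage.
final class ColonTranslator {
    // Replaces every "::" in a Pig field name with "__".
    // Single colons and other characters are left untouched, so the
    // result is not guaranteed to be a legal Avro name.
    static String translate(String pigFieldName) {
        return pigFieldName.replace("::", "__");
    }

    public static void main(String[] args) {
        System.out.println(translate("group::one")); // prints group__one
    }
}
```

One caveat with this mapping: a relation that already has a field named group__one alongside group::one would end up with two identically named fields, so a real option would need to detect such collisions.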
> :: in field name causes AvroStorage to fail
> -------------------------------------------
>
> Key: PIG-2684
> URL: https://issues.apache.org/jira/browse/PIG-2684
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Reporter: Fabian Alenius
>
> There appears to be a bug in AvroStorage which causes it to fail when there
> are field names that contain ::
> For example, the following will fail:
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group);
>
>
> store result into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> ERROR 2999: Unexpected internal error. Illegal character in: group::one
> While the following will succeed:
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group) as (one,two);
>
> store result into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> Here is a minimal test case:
> data = load 'test.txt' as (one::two, three);
>
>
> store data into 'test.avro' using
> org.apache.pig.piggybank.storage.avro.AvroStorage();