[
https://issues.apache.org/jira/browse/PIG-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849599#comment-13849599
]
Cheolsoo Park commented on PIG-3627:
------------------------------------
The problem is 'NULL' in your schema. 'NULL' is not recognized by the schema
parser.
Explicitly defining the type of {{product}} in this line will probably fix your
error:
{code}
flatten(dsym.product) as product:chararray,
{code}
However, I think the better way is to define the type of every field in your
query from the beginning, so that you don't end up with a NULL type after
flattening {{dsym.product}} in the first place. Pig is type-sensitive, so I
strongly recommend defining the type of every field if possible.
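As a rough sketch of what "typed from the beginning" could look like, here are the statements written as Java strings in the same style as the reporter's snippet. The file path, the delimiter, and the {{customer}} grouping field are hypothetical, since the original load and group statements are not shown in the issue.

```java
public class TypedQuerySketch {
    // Pig Latin statements with a type declared on every loaded field.
    // The path, delimiter, and 'customer' field are assumptions for illustration.
    public static String[] statements() {
        return new String[] {
            "transactions = load 'transactions.csv' using PigStorage(',') "
                + "as (customer:chararray, product:chararray);",
            "transactionsG = group transactions by customer;",
            "uniqcnt = foreach transactionsG {"
                + " sym = transactions.product;"
                + " dsym = distinct sym;"
                + " generate flatten(dsym.product) as product:chararray,"
                + " COUNT(dsym) as count;"
                + " };"
        };
    }

    public static void main(String[] args) {
        // Each statement could then be handed to pigServer.registerQuery(...).
        for (String statement : statements()) {
            System.out.println(statement);
        }
    }
}
```

With the type declared at load time, the explicit {{product:chararray}} after the flatten is mostly a safeguard.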
To answer your questions, I agree that JsonStorage is not robust. In this
case, it passes the schema info around as a string and parses it using the
Utils.parseSchema() function. This is not robust at all.
Ideally, what JsonStorage should do is to cast any field with no type info to
bytearray. Contributions are welcome. :-)
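To illustrate the idea only, here is a minimal standalone sketch that rewrites untyped fields in a flat schema string to bytearray before the string is parsed. The class and method names are hypothetical and are not part of Pig; an actual patch would more likely adjust the schema object inside JsonStorage rather than edit its string form, and nested schemas are not handled here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemaStringSanitizer {
    // Matches a field whose type was printed as NULL, e.g. "product: NULL".
    private static final Pattern UNTYPED_FIELD =
            Pattern.compile("(\\w+)\\s*:\\s*NULL\\b");

    /** Replaces every NULL type in a flat schema string with bytearray. */
    public static String defaultUntypedToBytearray(String schemaString) {
        Matcher matcher = UNTYPED_FIELD.matcher(schemaString);
        // $1 is the field name captured by the pattern.
        return matcher.replaceAll("$1: bytearray");
    }

    public static void main(String[] args) {
        String schema = "product: NULL,count: long";
        System.out.println(defaultUntypedToBytearray(schema));
    }
}
```

Fields that already carry a type, such as {{count: long}}, are left unchanged.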
> Json storage : Doesn't work in cases , where other Store Functions (like
> PigStorage / AvroStorage) do work.
> ------------------------------------------------------------------------------------------------------------
>
> Key: PIG-3627
> URL: https://issues.apache.org/jira/browse/PIG-3627
> Project: Pig
> Issue Type: Bug
> Reporter: jay vyas
>
> The following query
> {code:title=Bar.java|borderStyle=solid}
> pigServer.registerQuery(
> "uniqcnt = foreach transactionsG {"+
> "sym = transactions.product ;"+
> "dsym = distinct sym ;"+
> "generate flatten(dsym.product) as product, COUNT(dsym) as count ;" +
> "};");
> {code}
> Results in the schema:
> {code}
> Schema : {product: NULL,count: long}
> {code}
> This schema is storable using AvroStorage or PigStorage, but it fails when
> stored using JsonStorage:
> {code}
> Failed to parse: <line 1, column 8> Syntax error, unexpected symbol at or near ','
> at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:94)
> at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:108)
> at org.apache.pig.impl.util.Utils.parseSchema(Utils.java:208)
> at org.apache.pig.impl.util.Utils.getSchemaFromString(Utils.java:182)
> at org.apache.pig.builtin.JsonStorage.prepareToWrite(JsonStorage.java:140)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:125)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:86)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> {code}
> It appears that JsonStorage is thus less robust than the other storage
> formats. Can we confirm or deny whether some types of data structures do or
> do not work with JsonStorage?
> So, I suggest:
> 1) Ideally, I would think JsonStorage should support the same data that other
> storage functions support.
> The next best thing:
> 2) Maybe a wiki page of examples that can / cannot work with JsonStorage
> and/or a better error message would be sufficient to solve this "bug".
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)