[ 
https://issues.apache.org/jira/browse/PIG-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136753#comment-13136753
 ] 

Dmitriy V. Ryaboy commented on PIG-2332:
----------------------------------------

Users will be so happy to see this!

After a brief read: I am not sure how useful this is if it can only read JSON 
stored by JsonStorage rather than generic JSON. I think the far more common 
use case is reading data not generated by Pig. You could at least provide an 
optional constructor that takes a Pig schema as an argument and parses it to 
create the ResourceSchema object; that would make it far more useful. (Btw, we 
should have a way of communicating the "load as ..." clause to the loader that 
isn't a "maybe, if you implement projection pushdown and we happen to need to 
push a projection".) Auto-discovery is nice, but *some* form of communicating 
the expected schema is a must for anything called JsonLoader that's going into 
the builtin package, IMO.
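
To make the constructor suggestion concrete, here is a rough, Pig-independent 
sketch. Everything in it is hypothetical: FieldSpec stands in for 
ResourceFieldSchema, SchemaAwareJsonLoader is not the class from the patch, 
and the toy parser only handles flat "name:type" lists. A real loader would 
presumably hand the string to Pig's own schema parser and build a 
ResourceSchema from the result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ResourceFieldSchema: a field name and a type name.
final class FieldSpec {
    final String name;
    final String type;

    FieldSpec(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

// Hypothetical loader skeleton showing the two construction paths.
class SchemaAwareJsonLoader {
    // null means "no schema supplied, fall back to auto-discovery".
    private final List<FieldSpec> fields;

    // Existing behaviour: rely on the schema stored alongside the data.
    SchemaAwareJsonLoader() {
        this.fields = null;
    }

    // Suggested addition: the caller states the expected schema up front,
    // e.g. "name:chararray, age:int".
    SchemaAwareJsonLoader(String schemaString) {
        this.fields = parseFlatSchema(schemaString);
    }

    boolean hasExplicitSchema() {
        return fields != null;
    }

    List<FieldSpec> fields() {
        return fields;
    }

    // Toy parser for flat "name:type" lists; nested types are out of scope.
    static List<FieldSpec> parseFlatSchema(String schemaString) {
        List<FieldSpec> out = new ArrayList<>();
        for (String part : schemaString.split(",")) {
            String[] nameAndType = part.trim().split(":");
            if (nameAndType.length != 2
                    || nameAndType[0].trim().isEmpty()
                    || nameAndType[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Expected name:type, got: " + part);
            }
            out.add(new FieldSpec(nameAndType[0].trim(), nameAndType[1].trim()));
        }
        return out;
    }
}
```

With something like this, data that Pig did not write could be loaded by 
passing the schema to the constructor, while the no-argument constructor 
keeps the auto-discovery path.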

You keep a protected ResourceFieldSchema[] -- why not ResourceSchema itself?

A new parser is created for every tuple. That seems like it should not be 
needed (you have a comment to that effect). Let's fix that.

Logging of bad records: we should put that into counters instead, and maybe log 
once per task, yeah? Log spam is a job killer.
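
Roughly what I have in mind, as a self-contained sketch: BadRecordTracker is 
a hypothetical helper, and the in-memory count is only a stand-in for a real 
Hadoop counter. Every bad record is counted, but only the first one is 
logged.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: counts every bad record, logs only the first one.
class BadRecordTracker {
    // Stand-in for a job counter; a real loader would increment a Hadoop counter.
    private final AtomicLong badRecords = new AtomicLong();
    private final AtomicBoolean warned = new AtomicBoolean(false);

    // Records one bad input record; returns true only for the call that logged.
    boolean recordBad(Exception cause) {
        badRecords.incrementAndGet();
        // compareAndSet succeeds exactly once, so at most one line is logged
        // per tracker instance (one tracker per task in this sketch).
        if (warned.compareAndSet(false, true)) {
            System.err.println("Bad JSON record (later ones are only counted): "
                    + cause.getMessage());
            return true;
        }
        return false;
    }

    long badRecordCount() {
        return badRecords.get();
    }
}
```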

Magic strings ("pig.jsonstorage.schema" and the like) should be public static 
final String constants.
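
For instance (a sketch only: the holder class and the constant name are made 
up for illustration, and only the string value comes from the patch):

```java
// Hypothetical holder class; only the string value is taken from the patch.
final class JsonStorageKeys {
    // One named constant instead of the literal repeated at each use site.
    public static final String SCHEMA_KEY = "pig.jsonstorage.schema";

    private JsonStorageKeys() {
        // Constants holder, not meant to be instantiated.
    }
}
```

Callers then refer to JsonStorageKeys.SCHEMA_KEY, so a typo in the key 
becomes a compile error rather than a silent lookup miss.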

We shouldn't copy+paste javadocs from the interface into the implementation -- 
javadoc will reproduce the inherited docs if specific ones aren't provided; the 
copy+paste approach doesn't give us anything, but does make it so that if we 
change the docs down the line, the change won't be reflected here.
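
A small illustration with made-up types (RecordSource and ListRecordSource 
are hypothetical, not Pig classes): the implementation carries no doc 
comment of its own, so the generated docs show the interface's text for it.

```java
import java.util.Iterator;
import java.util.List;

interface RecordSource {
    /**
     * Returns the next record, or null when the input is exhausted.
     */
    String next();
}

class ListRecordSource implements RecordSource {
    private final Iterator<String> it;

    ListRecordSource(List<String> records) {
        this.it = records.iterator();
    }

    // Deliberately no doc comment: javadoc inherits the interface's comment
    // for this overriding method, so the text lives in exactly one place.
    @Override
    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}
```

If the implementation does need to add something, {@inheritDoc} inside its 
own doc comment pulls in the inherited text alongside the extra notes.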

                
> JsonLoader/JsonStorage
> ----------------------
>
>                 Key: PIG-2332
>                 URL: https://issues.apache.org/jira/browse/PIG-2332
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.10
>
>         Attachments: PIG-2332-1.patch
>
>
> A JsonLoader/JsonStorage implementation for Pig. This is based on Alan's 
> implementation in the book 
> (http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html). I 
> made some minor changes:
> 1. Drop the Jackson features that require a version newer than 1.01. Since 
> Hadoop 203+ bundles Jackson 1.01, newer features fail when running on 
> Hadoop 203+.
> 2. Use JSON format for the schema. This borrows Dmitriy's schema 
> implementation in PigStorage.
> 3. Some bug fixes.


        
