[
https://issues.apache.org/jira/browse/PIG-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136753#comment-13136753
]
Dmitriy V. Ryaboy commented on PIG-2332:
----------------------------------------
Users will be so happy to see this!
Giving it a brief read... I am not sure how useful this is if it can't read
generic JSON, but only that stored by JsonStorage. I think the far more common
use case is reading data not generated by Pig. You could at least provide an
optional constructor that takes a pig schema as an argument and parses it to
create the ResourceSchema object; that would make it far more useful (btw, we
should have a way of communicating the "load as .." clause to the loader that
isn't a "maybe, if you implement projection pushdown and we happen to need to
push a projection"). Auto-discovery is nice, but *some* form of communicating
the expected schema is a must for anything called JsonLoader that's going into
the builtin package, IMO.
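
To make the optional-constructor suggestion concrete, here is a minimal sketch. It deliberately uses stand-in types instead of the real Pig classes: `FieldSpec`, `SketchJsonLoader` and `parseFlatSchema` are hypothetical names, not part of the patch, and the parser only understands a flat `name:type` list. A real loader would presumably hand the string to Pig's own schema parser and wrap the result in a ResourceSchema.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one field of a schema: a name and a Pig type name. */
final class FieldSpec {
    final String name;
    final String type;

    FieldSpec(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

/** Hypothetical loader skeleton; not the JsonLoader from the patch. */
class SketchJsonLoader {
    // null means "no schema supplied, fall back to auto-discovery"
    private final List<FieldSpec> fields;

    SketchJsonLoader() {
        this.fields = null;
    }

    // Optional constructor: takes a flat schema string such as "id:int, name:chararray".
    SketchJsonLoader(String schemaString) {
        this.fields = parseFlatSchema(schemaString);
    }

    List<FieldSpec> getFields() {
        return fields;
    }

    // Handles only flat "name:type" lists; nested tuples, bags and maps are out of scope.
    static List<FieldSpec> parseFlatSchema(String schemaString) {
        List<FieldSpec> out = new ArrayList<>();
        for (String part : schemaString.split(",")) {
            String[] nameAndType = part.trim().split(":");
            if (nameAndType.length != 2
                    || nameAndType[0].trim().isEmpty()
                    || nameAndType[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Bad field spec: '" + part.trim() + "'");
            }
            out.add(new FieldSpec(nameAndType[0].trim(), nameAndType[1].trim()));
        }
        return out;
    }
}
```

With something like this in place, a script could pass the expected schema as a constructor argument, and the no-argument form would keep the auto-discovery behavior.
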
You keep a protected ResourceFieldSchema[] -- why not ResourceSchema itself?
A new parser is created for every tuple. That seems like it should not be
needed (you have a comment to that effect). Let's fix that.
Logging of bad records: we should put that into counters instead, and maybe log
once per task, yeah? Log spam is a job killer.
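
One possible shape for the "count everything, log once" idea is sketched below. `BadRecordTracker` is a hypothetical helper, and the `Consumer<String>` is a stand-in for whatever logger the task uses. In an actual loader the count would presumably be pushed to a Hadoop counter instead of being held in a field.

```java
import java.util.function.Consumer;

/** Hypothetical helper: counts bad records and emits a single log line per task. */
class BadRecordTracker {
    private final Consumer<String> logger; // stand-in for the task's logger
    private long badRecords = 0;
    private boolean warned = false;

    BadRecordTracker(Consumer<String> logger) {
        this.logger = logger;
    }

    /** Call once per unparseable record. Only the first call writes a log line. */
    synchronized void recordBad(String reason) {
        badRecords++;
        if (!warned) {
            warned = true;
            logger.accept("Skipping bad JSON record (later ones are only counted): " + reason);
        }
    }

    /** Total number of bad records seen so far. */
    synchronized long count() {
        return badRecords;
    }
}
```

The loader would create one tracker per task and call `recordBad` from the catch block around record parsing, so a file full of malformed lines produces one log line and an accurate count.
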
Magic strings ("pig.jsonstorage.schema" and the like) should be declared as
public static final String constants.
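
For illustration, the constant could look roughly like this. `JsonStorageKeys`, `SCHEMA_KEY` and the two helper methods are hypothetical names; in the patch the constant would more naturally live on the loader or storer class itself.

```java
import java.util.Properties;

/** Hypothetical holder for property keys shared by the loader and the storer. */
final class JsonStorageKeys {
    /** Property key quoted in the review comment above. */
    public static final String SCHEMA_KEY = "pig.jsonstorage.schema";

    private JsonStorageKeys() {
        // constants only, no instances
    }

    /** Writer side: stash the serialized schema under the shared key. */
    static void storeSchema(Properties props, String schemaJson) {
        props.setProperty(SCHEMA_KEY, schemaJson);
    }

    /** Reader side: look the schema up under the same key; null if absent. */
    static String loadSchema(Properties props) {
        return props.getProperty(SCHEMA_KEY);
    }
}
```

Both sides then refer to the same symbol, so a typo in the key becomes a compile error instead of a silently missing schema.
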
We shouldn't copy+paste javadocs from the interface into the implementation --
javadoc will reproduce the inherited docs if specific ones aren't provided; the
copy+paste approach doesn't give us anything, but does make it so that if we
change the docs down the line, the change won't be reflected here.
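
A small illustration of the javadoc point, using a hypothetical interface (`RecordSource`) and implementation (`ListRecordSource`) in place of the Pig classes: the overriding method carries no doc comment, so the generated docs fall back to the interface's text.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical interface standing in for a Pig loader interface. */
interface RecordSource {
    /**
     * Returns the next record, or null when the input is exhausted.
     */
    String next();
}

/** Hypothetical implementation backed by an in-memory list. */
class ListRecordSource implements RecordSource {
    private final Iterator<String> it;

    ListRecordSource(List<String> records) {
        this.it = records.iterator();
    }

    // No doc comment here on purpose: the javadoc tool reuses the comment
    // from RecordSource.next(), so the text lives in exactly one place.
    @Override
    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}
```

If the implementation does need to add something specific, a doc comment containing `{@inheritDoc}` plus the extra sentence keeps the inherited text without duplicating it.
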
> JsonLoader/JsonStorage
> ----------------------
>
> Key: PIG-2332
> URL: https://issues.apache.org/jira/browse/PIG-2332
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.10
>
> Attachments: PIG-2332-1.patch
>
>
> A JsonLoader/JsonStorage implementation for Pig. This is based on Alan's
> implementation in the book
> (http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html). I
> made some minor changes:
> 1. Drop the Jackson features that require 1.01+. Since Hadoop 203+ bundles Jackson
> 1.01, newer features fail when running on Hadoop 203+.
> 2. Use JSON format for the schema. This borrows Dmitriy's schema implementation
> in PigStorage.
> 3. Some bug fixes.