[
https://issues.apache.org/jira/browse/PIG-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137355#comment-13137355
]
Daniel Dai commented on PIG-2332:
---------------------------------
bq. Giving it a brief read... I am not sure how useful this is if it can't read
generic JSON, but only that stored by JsonStorage. I think the far more common
use case is reading data not generated by Pig. You could at least provide an
optional constructor that takes a pig schema as an argument and parses it to
create the ResourceSchema object; that would make it far more useful (btw, we
should have a way of communicating the "load as .." clause to the loader that
isn't a "maybe, if you implement projection pushdown and we happen to need to
push a projection"). Auto-discovery is nice, but some form of communicating the
expected schema is a must for anything called JsonLoader that's going into the
builtin package, IMO.
Sure. Actually Alan and I discussed about adding auto-discovery, but we decide
put that in the future work. Passing a schema to LoadFunc in the case schema
file is missing is a good idea. I will add it.
bq. You keep a protected ResourceFieldSchema[] – why not ResourceSchema itself?
I don't have an opinion on that. Both sounds equal to me. Why do you like
ResourceSchema?
bq. A new parser is created for every tuple. That seems like it should not be
needed (you have a comment to that effect). Let's fix that.
I couldn't find the right API to do that. Do you have any hint?
bq. Logging of bad records: we should put that into counters instead, and maybe
log once per task, yeah? Log spam is a job killer.
Good suggestion. I will add to counter instead.
bq. Magic strings ("pig.jsonstorage.schema" and the like) should be public
final static String.
Will do
bq. We shouldn't copy+paste javadocs from the interface into the implementation
– javadoc will reproduce the inherited docs if specific ones aren't provided;
the copy+paste approach doesn't give us anything, but does make it so that if
we change the docs down the line, the change won't be reflected here.
Yes, I copied javadocs from Alan's code. The code is to demonstrate how to
write a Load/StoreFunc, and the comments are so nice and I really want to keep
it. I agree put that in javadoc is confusing. I can move that to the regular
comment instead of javadoc.
> JsonLoader/JsonStorage
> ----------------------
>
> Key: PIG-2332
> URL: https://issues.apache.org/jira/browse/PIG-2332
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.10
>
> Attachments: PIG-2332-1.patch
>
>
> A JsonLoader/JsonStorage implementation for Pig. This is based on Alan's
> implementation in the book
> (http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html). I
> made some minor changes:
> 1. Drop the jackson feature requires 1.01+. Since Hadoop 203+ bundles jackson
> 1.01, newer feature fails when running on Hadoop 203+.
> 2. Using Json format for schema. This borrows Dmitry's schema implememtation
> in PigStorage.
> 3. Some bug fixes.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira