[ 
https://issues.apache.org/jira/browse/PIG-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137355#comment-13137355
 ] 

Daniel Dai commented on PIG-2332:
---------------------------------

bq. Giving it a brief read... I am not sure how useful this is if it can't read 
generic JSON, but only that stored by JsonStorage. I think the far more common 
use case is reading data not generated by Pig. You could at least provide an 
optional constructor that takes a pig schema as an argument and parses it to 
create the ResourceSchema object; that would make it far more useful (btw, we 
should have a way of communicating the "load as .." clause to the loader that 
isn't a "maybe, if you implement projection pushdown and we happen to need to 
push a projection"). Auto-discovery is nice, but some form of communicating the 
expected schema is a must for anything called JsonLoader that's going into the 
builtin package, IMO.
Sure. Actually Alan and I discussed about adding auto-discovery, but we decide 
put that in the future work. Passing a schema to LoadFunc in the case schema 
file is missing is a good idea. I will add it.

bq. You keep a protected ResourceFieldSchema[] – why not ResourceSchema itself?
I don't have an opinion on that. Both sounds equal to me. Why do you like 
ResourceSchema?

bq. A new parser is created for every tuple. That seems like it should not be 
needed (you have a comment to that effect). Let's fix that.
I couldn't find the right API to do that. Do you have any hint?

bq. Logging of bad records: we should put that into counters instead, and maybe 
log once per task, yeah? Log spam is a job killer.
Good suggestion. I will add to counter instead. 

bq. Magic strings ("pig.jsonstorage.schema" and the like) should be public 
final static String.
Will do

bq. We shouldn't copy+paste javadocs from the interface into the implementation 
– javadoc will reproduce the inherited docs if specific ones aren't provided; 
the copy+paste approach doesn't give us anything, but does make it so that if 
we change the docs down the line, the change won't be reflected here.
Yes, I copied javadocs from Alan's code. The code is to demonstrate how to 
write a Load/StoreFunc, and the comments are so nice and I really want to keep 
it. I agree put that in javadoc is confusing. I can move that to the regular 
comment instead of javadoc.

                
> JsonLoader/JsonStorage
> ----------------------
>
>                 Key: PIG-2332
>                 URL: https://issues.apache.org/jira/browse/PIG-2332
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.10
>
>         Attachments: PIG-2332-1.patch
>
>
> A JsonLoader/JsonStorage implementation for Pig. This is based on Alan's 
> implementation in the book 
> (http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html). I 
> made some minor changes:
> 1. Drop the jackson feature requires 1.01+. Since Hadoop 203+ bundles jackson 
> 1.01, newer feature fails when running on Hadoop 203+.
> 2. Using Json format for schema. This borrows Dmitry's schema implememtation 
> in PigStorage.
> 3. Some bug fixes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


Reply via email to