[ 
https://issues.apache.org/jira/browse/CRUNCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290290#comment-14290290
 ] 

Gabriel Reid commented on CRUNCH-489:
-------------------------------------

Yep, I think having a {{read()}} method that takes a name would be pretty handy 
(as well as being really useful in this context for allowing created 
collections to be named when running with MRPipeline).

I really like the approach here. There have been quite a few times where I've 
wished something like this existed (or would have wished for it if I'd had the 
idea).

If I'm reading it correctly, the parallelism parameter will only work with 
Text-based Writables (looking at AvroType and WritableType). I'm thinking that 
it might be possible to get around that by just writing multiple files in the 
createSourceTarget method of those classes, and then parallelism would work 
regardless of the underlying type (as long as CombineFileInputFormat doesn't 
get in the way). 
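
To make the "multiple files" idea concrete, here is a minimal sketch in plain Java. It is not code from the patch and does not use any Crunch API: {{PartitionSketch}} and {{partition}} are hypothetical names, and each in-memory bucket only stands in for one file that a createSourceTarget implementation might write. The round-robin split is also just one possible way to divide the values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Crunch or the CRUNCH-489 patch.
final class PartitionSketch {

  // Distributes the values round-robin into `parallelism` buckets.
  // Each bucket represents the contents of one output file.
  static <T> List<List<T>> partition(Iterable<T> values, int parallelism) {
    if (parallelism < 1) {
      throw new IllegalArgumentException("parallelism must be >= 1");
    }
    List<List<T>> buckets = new ArrayList<>();
    for (int i = 0; i < parallelism; i++) {
      buckets.add(new ArrayList<>());
    }
    int next = 0;
    for (T value : values) {
      buckets.get(next).add(value);
      next = (next + 1) % parallelism;
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<List<String>> files = partition(Arrays.asList("a", "b", "c", "d", "e"), 2);
    for (int i = 0; i < files.size(); i++) {
      // An actual implementation would serialize each bucket to its own file here.
      System.out.println("part-" + i + ": " + files.get(i));
    }
  }
}
```

With one file per bucket, the number of input splits would no longer depend on the serialization format, assuming nothing downstream (such as CombineFileInputFormat) merges the files back together.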

Apart from that, a few really small nits I noticed with the current patch:
* Maybe CreatedCollection should have a different name (or at least a bit of 
javadoc), as it is currently not that easy to know what it does based on that 
name. MemoryBasedCollection? InputIterableCollection? I don't know. Similar 
comment also possibly applies to MapInputFn and MapPairInputFn, or those 
classes could even be static inner classes of CreatedCollection I guess.
* There are a few wildcard imports, which are not compliant with the 
non-existent coding conventions
* NLineInputFn is no longer directly testing the NLineInputSource, which is a 
bit confusing (although it's definitely doing a valid test)
* CreatedCollection currently does some unnecessary null checking and default 
value setting on CreatedCollection.getName()



> Add methods to create PCollections from Java Iterable to Pipeline interface
> ---------------------------------------------------------------------------
>
>                 Key: CRUNCH-489
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-489
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-489.patch, CRUNCH-489b.patch, CRUNCH-489c.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
