[ https://issues.apache.org/jira/browse/CRUNCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290290#comment-14290290 ]
Gabriel Reid commented on CRUNCH-489:
-------------------------------------

Yep, I think having a {{read()}} method that takes a name would be pretty handy (as well as being really useful in this context for allowing named created collections when running with MRPipeline).

I really like the approach here, there have been quite a few times where I've wished something like this existed (or would have wished for it if I'd had the idea).

If I'm reading it correctly, the parallelism parameter will only work with Text-based Writables (looking at AvroType and WritableType). I'm thinking that it might be possible to get around that by just writing multiple files in the createSourceTarget method of those classes, and then parallelism would work regardless of the underlying type (as long as CombineFileInputFormat doesn't get in the way).

Apart from that, a few really small nits I noticed with the current patch:
* Maybe CreatedCollection should have a different name (or at least a bit of javadoc), as it is currently not that easy to know what it does based on that name. MemoryBasedCollection? InputIterableCollection? I don't know. Similar comment also possibly applies to MapInputFn and MapPairInputFn, or those classes could even be static inner classes of CreatedCollection I guess.
* There are a few wildcard imports, which are not compliant with the non-existent coding conventions
* NLineInputFn is no longer directly testing the NLineInputSource, which is a bit confusing (although it's definitely doing a valid test)
* CreatedCollection currently does some unnecessary null checking and default value setting on CreatedCollection.getName()

> Add methods to create PCollections from Java Iterable to Pipeline interface
> ---------------------------------------------------------------------------
>
>                 Key: CRUNCH-489
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-489
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-489.patch, CRUNCH-489b.patch, CRUNCH-489c.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)