[ https://issues.apache.org/jira/browse/CRUNCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290949#comment-14290949 ]
Gabriel Reid commented on CRUNCH-489:
-------------------------------------

Looks good.

About the partitioning into multiple files, I'm definitely in favor of opening up P connections to HDFS, as it'll work with any kind of Iterable (I'm thinking of some kind of generator-style iterable that's generating too much stuff to fit in memory), as well as not being restricted to working with Collections.

I guess the only real drawback there is perhaps some kind of issue with a max number of concurrently open files to HDFS if P is really large, but I'm guessing that that would have to be a *really* large value of P (I'm not sure if there is some kind of limit somewhere).

> Add methods to create PCollections from Java Iterable to Pipeline interface
> ---------------------------------------------------------------------------
>
>                 Key: CRUNCH-489
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-489
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-489.patch, CRUNCH-489b.patch, CRUNCH-489c.patch, CRUNCH-489d.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
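
For illustration, the approach favored in the comment (keep P outputs open and stream the Iterable across them in a single pass) might look roughly like the sketch below. This is not the code from the CRUNCH-489 patches. The class name `RoundRobinPartitioner`, its methods, the round-robin assignment, and the `part-N.txt` file names are all hypothetical, and local files written through `java.nio` stand in for HDFS output streams so the example runs anywhere.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch, not part of Crunch: local files stand in for HDFS.
public class RoundRobinPartitioner {

    // Streams `items` into `p` files in one pass, holding `p` writers open
    // at the same time. Nothing is buffered beyond the writers themselves,
    // so the Iterable never has to fit in memory.
    public static List<Path> writePartitions(Iterable<String> items, Path dir, int p)
            throws IOException {
        if (p <= 0) {
            throw new IllegalArgumentException("p must be positive");
        }
        List<Path> paths = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try {
            for (int i = 0; i < p; i++) {
                Path path = dir.resolve("part-" + i + ".txt");
                paths.add(path);
                writers.add(Files.newBufferedWriter(path, StandardCharsets.UTF_8));
            }
            long n = 0;
            for (String item : items) {
                // Element n goes to partition n mod p.
                BufferedWriter w = writers.get((int) (n % p));
                w.write(item);
                w.newLine();
                n++;
            }
        } finally {
            // Close every writer that was opened, even if writing failed.
            for (BufferedWriter w : writers) {
                try {
                    w.close();
                } catch (IOException ignored) {
                    // Best-effort close in this sketch.
                }
            }
        }
        return paths;
    }

    // A generator-style Iterable: elements are produced on demand and are
    // not backed by a Collection.
    public static Iterable<String> generate(final int count) {
        return () -> new Iterator<String>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < count;
            }

            @Override
            public String next() {
                if (i >= count) {
                    throw new NoSuchElementException();
                }
                return "record-" + i++;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("partitions");
        List<Path> parts = writePartitions(generate(10), dir, 3);
        for (Path part : parts) {
            int lines = Files.readAllLines(part, StandardCharsets.UTF_8).size();
            System.out.println(part.getFileName() + ": " + lines + " lines");
        }
    }
}
```

With 10 generated records and P = 3, `main` prints `part-0.txt: 4 lines`, `part-1.txt: 3 lines`, and `part-2.txt: 3 lines`. The number of simultaneously open writers equals P, which is the drawback raised in the comment for very large P.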