[ https://issues.apache.org/jira/browse/CRUNCH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076655#comment-14076655 ]
Micah Whitacre commented on CRUNCH-449:
---------------------------------------

* We probably want to give SeqDoFn access to a Configuration object for the pipeline/target in the execute method. In your example, where someone wants to bulk load an HFile Target into HBase, having the Configuration to access the FileSystem would be useful.
* +1 to Javadoc. Specifically, document when getOutput/execute are called relative to each other, and whether any execution order is guaranteed. Also document thread-safety and concurrent-execution guarantees, as well as which operations block.
* Is calling it a DoFn really appropriate? Currently in Crunch, a DoFn operates on each element of a PCollection. This instead essentially fork/joins pipeline stages. Unfortunately, I don't have a better name.
* Should SeqDoFn expose the collection of labels for targets and PCollections, rather than only letting callers ask for them by name?

> Add sequentialDo function for injecting arbitrary non-parallel code
> -------------------------------------------------------------------
>
>                 Key: CRUNCH-449
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-449
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-449.patch, CRUNCH-449b.patch
>
>
> I've been noodling on this one for a while: how to add the ability to execute
> some code if and only if one or more targets are created, and have that
> executed code (optionally) return one or more new PCollections as a result. I
> was thinking that this functionality could be wired into libraries to do
> things like bulk loading HBase tables or running Sqoop jobs as part of Crunch
> pipelines automatically.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
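To make the review suggestions concrete, here is a minimal, self-contained sketch of what a SeqDoFn shaped by them might look like. Everything in it is hypothetical and is not Crunch's actual API or the attached patch: `SeqDoFn`, `SeqContext`, `runSequential` and `BulkLoadFn` are illustrative names, and `Conf` is a stand-in for Hadoop's `Configuration` so the example compiles without Hadoop on the classpath. It shows three of the suggestions: a Configuration available in `execute`, an execute-before-getOutput ordering documented in Javadoc and enforced by the runner, and a context that exposes the full set of target labels rather than only lookup by name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SeqDoFnSketch {

  /** Stand-in for Hadoop's Configuration so the sketch compiles without Hadoop (assumption). */
  public static final class Conf {
    private final Map<String, String> values = new HashMap<>();

    public Conf set(String key, String value) {
      values.put(key, value);
      return this;
    }

    public String get(String key) {
      return values.get(key);
    }
  }

  /** Hypothetical context handed to execute(): pipeline configuration plus labeled targets. */
  public interface SeqContext {
    Conf getConfiguration();

    /** All target labels, so a fn can iterate them instead of only asking by name. */
    Set<String> getTargetLabels();

    /** Output location of the target registered under the given label. */
    String getTargetPath(String label);
  }

  /** Hypothetical SeqDoFn shape; not Crunch's actual API. */
  public interface SeqDoFn<T> {
    /** Called once, after every target it depends on has been written. May block. */
    void execute(SeqContext ctx) throws Exception;

    /** Called only after execute() has returned normally. */
    T getOutput();
  }

  /** Runs a SeqDoFn, enforcing the ordering documented on the interface. */
  public static <T> T runSequential(SeqDoFn<T> fn, Conf conf, Map<String, String> targets)
      throws Exception {
    Map<String, String> frozen = Collections.unmodifiableMap(new HashMap<>(targets));
    SeqContext ctx = new SeqContext() {
      @Override public Conf getConfiguration() { return conf; }
      @Override public Set<String> getTargetLabels() { return new TreeSet<>(frozen.keySet()); }
      @Override public String getTargetPath(String label) { return frozen.get(label); }
    };
    fn.execute(ctx);  // blocking call; getOutput() is never reached if this throws
    return fn.getOutput();
  }

  /** Example fn that records what it would bulk load for each labeled target. */
  public static final class BulkLoadFn implements SeqDoFn<String> {
    private final StringBuilder log = new StringBuilder();
    private boolean executed = false;

    @Override
    public void execute(SeqContext ctx) {
      String fs = ctx.getConfiguration().get("fs.defaultFS");
      for (String label : ctx.getTargetLabels()) {
        // A real implementation would open the FileSystem from the Configuration here.
        log.append(label).append('=').append(fs).append(ctx.getTargetPath(label)).append(';');
      }
      executed = true;
    }

    @Override
    public String getOutput() {
      if (!executed) {
        throw new IllegalStateException("getOutput() called before execute()");
      }
      return log.toString();
    }
  }

  public static void main(String[] args) throws Exception {
    Conf conf = new Conf().set("fs.defaultFS", "hdfs://nn");
    Map<String, String> targets = new HashMap<>();
    targets.put("hfiles", "/tmp/hfiles");
    targets.put("events", "/tmp/events");
    System.out.println(runSequential(new BulkLoadFn(), conf, targets));
  }
}
```

Running `main` prints `events=hdfs://nn/tmp/events;hfiles=hdfs://nn/tmp/hfiles;`, because the labels are iterated in sorted order. Handing the fn a context object, rather than individual getters on the fn itself, is one way to address the labels question: a fn can either iterate `getTargetLabels()` or look up a single label directly.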