[ https://issues.apache.org/jira/browse/CRUNCH-70?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated CRUNCH-70:
------------------------------------------
Attachment: CRUNCH-70-20120919.txt
Here's a patch that does this.
I added a utility class called Pipelines, following the convention.
One question, though (left as a TODO in the patch): when writeTextFile() runs
as part of an MR pipeline, we do the following:
{code}
+ collection =
+     collection.parallelDo("asText", IdentityFn.<T> getInstance(),
+         WritableTypeFamily.getInstance().as(collection.getPType()));
{code}
Why do we do this? And does MRPipeline really need to force the PTypeFamily
to be Writables?
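For illustration, the utility-class approach proposed in this issue could look roughly like the sketch below. It is not the code in the attached patch and does not use the real Crunch types: the pared-down Pipeline interface, its read/write/setOption methods, the "text:" prefix, and InMemoryPipeline are all hypothetical stand-ins (List<String> takes the place of a PCollection), chosen only so the example runs on its own. The point is that readTextFile, writeTextFile and enableDebug can be written once against the core interface instead of being repeated in each implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PipelinesSketch {

    /** Hypothetical pared-down core interface; NOT the real Crunch Pipeline. */
    interface Pipeline {
        List<String> read(String source);
        void write(List<String> data, String target);
        void setOption(String key, String value);
    }

    /** Hypothetical utility class holding the convenience methods. */
    static final class Pipelines {
        private Pipelines() {} // static helpers only, no instances

        static List<String> readTextFile(Pipeline pipeline, String path) {
            // Delegates to the generic read(); the "text:" prefix is an assumption.
            return pipeline.read("text:" + path);
        }

        static void writeTextFile(Pipeline pipeline, List<String> lines, String path) {
            pipeline.write(lines, "text:" + path);
        }

        static void enableDebug(Pipeline pipeline) {
            pipeline.setOption("debug", "true");
        }
    }

    /** Toy in-memory implementation, used only to exercise the helpers. */
    static final class InMemoryPipeline implements Pipeline {
        final Map<String, List<String>> storage = new HashMap<>();
        final Map<String, String> options = new HashMap<>();

        @Override
        public List<String> read(String source) {
            return storage.getOrDefault(source, new ArrayList<>());
        }

        @Override
        public void write(List<String> data, String target) {
            storage.put(target, new ArrayList<>(data)); // defensive copy
        }

        @Override
        public void setOption(String key, String value) {
            options.put(key, value);
        }
    }

    public static void main(String[] args) {
        InMemoryPipeline pipeline = new InMemoryPipeline();
        Pipelines.enableDebug(pipeline);
        Pipelines.writeTextFile(pipeline, Arrays.asList("a", "b"), "/tmp/out.txt");
        System.out.println(Pipelines.readTextFile(pipeline, "/tmp/out.txt"));
    }
}
```

With this shape, a further Pipeline implementation would only need to provide the core methods and would pick up the text-file helpers for free.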
> Simplify Pipeline API
> ---------------------
>
> Key: CRUNCH-70
> URL: https://issues.apache.org/jira/browse/CRUNCH-70
> Project: Crunch
> Issue Type: Bug
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Attachments: CRUNCH-70-20120919.txt
>
>
> Today the Pipeline interface has the following APIs, which really belong in a
> utility class:
> - readTextFile
> - writeTextFile
> - enableDebug
> The implementation of these APIs is the same in both Pipeline types present
> today and is most likely going to be the same if we ever add another
> implementation.
> I propose we move these to a util/lib to make the core interface cleaner.