[
https://issues.apache.org/jira/browse/CRUNCH-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010286#comment-14010286
]
Anuj Ojha commented on CRUNCH-400:
----------------------------------
Hello Josh, below is what we are doing:
{code}
// ... some earlier processing (Map/Reduce jobs) ...
PCollection hbaseData = getDataFromHbase();
PTable hbaseDataTable = hbaseData.by();
PGroupedTable hbasePGroupTable = hbaseDataTable.groupByKey();
PCollection hFileData = hbasePGroupTable.parallelDo("Convert this data to HFile");
writeHFile(hFileData);
PCollection<String> dataToBeMaterialized = hbasePGroupTable.parallelDo();
// materialize() is what kicks off the extra jobs
Set<String> materializedData = Sets.newHashSet(dataToBeMaterialized.materialize());
{code}
Is this what you were looking for, or do you need more information about
it?
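
To make the symptom concrete, here is a rough toy model in plain Java of the behavior the issue describes. It does not use Crunch at all: ToyPipeline, ToyResult, addStage and the stage names are all hypothetical and only mimic the shape of the problem, namely that work run through a materialize-style call never appears in the stage list of the result returned at the end.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: hypothetical classes, NOT Crunch's API.
class ToyResult {
    final List<String> stageNames;

    ToyResult(List<String> names) {
        // Copy so later changes to the pipeline do not alter the result.
        this.stageNames = Collections.unmodifiableList(new ArrayList<>(names));
    }
}

class ToyPipeline {
    private final List<String> pendingStages = new ArrayList<>();
    private final List<String> materializeStages = new ArrayList<>();

    // Register a stage that will be reported when done() is called.
    void addStage(String name) {
        pendingStages.add(name);
    }

    // Runs "immediately" and hands back the data; the stage is remembered
    // only internally and is never attached to a ToyResult.
    List<String> materialize(String stageName, List<String> data) {
        materializeStages.add(stageName);
        return new ArrayList<>(data);
    }

    // Reports only the stages registered through addStage().
    ToyResult done() {
        ToyResult result = new ToyResult(pendingStages);
        pendingStages.clear();
        return result;
    }

    // Stages that ran through materialize() and were left out of every result.
    List<String> materializeStages() {
        return Collections.unmodifiableList(materializeStages);
    }
}
```

In this sketch, a caller who adds a "write-hfile" stage, calls materialize() for a "collect-strings" stage and then calls done() gets a result listing only "write-hfile". That is the gap we are running into: the materialized part of the flow has no stage we can inspect afterwards.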
> Materialized jobs should have stage in PipelineResult
> -----------------------------------------------------
>
> Key: CRUNCH-400
> URL: https://issues.apache.org/jira/browse/CRUNCH-400
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.9.0, 0.8.2
> Reporter: Micah Whitacre
>
> As raised in the proposed fix for CRUNCH-272 and on the mailing list [1],
> jobs kicked off by a materialize() call are not tracked in the Pipeline's
> stage results returned by the PipelineResult.
> [1] -
> http://mail-archives.apache.org/mod_mbox/crunch-dev/201405.mbox/%3CCANFazTUAffvTctK5%3DWvW4KyBLSqLCNcke7ZMWwgASu%2BEtkDmyQ%40mail.gmail.com%3E
--
This message was sent by Atlassian JIRA
(v6.2#6252)