[jira] [Commented] (CRUNCH-400) Materialized jobs should have stage in PipelineResult

Anuj Ojha (JIRA) Tue, 27 May 2014 14:19:47 -0700

    [ 
https://issues.apache.org/jira/browse/CRUNCH-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010286#comment-14010286
 ]


Anuj Ojha commented on CRUNCH-400:
----------------------------------

Hello Josh, below is what we are doing:

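{code}
// Some processing first: several Map/Reduce jobs run here.

PCollection hbaseData = getDataFromHbase();

// Key the records (MapFn and PType arguments omitted).
PTable hbaseDataTable = hbaseData.by();

PGroupedTable hbasePGroupTable = hbaseDataTable.groupByKey();

// First consumer of the grouped table (DoFn and PType arguments omitted).
PCollection hFileData = hbasePGroupTable.parallelDo("Convert this data to HFile");

writeHFile(hFileData);

// Second consumer of the same grouped table: the data we materialize.
PCollection<String> dataToBeMaterialized = hbasePGroupTable.parallelDo();

// Iterating the materialized collection runs the jobs needed to produce it.
Set<String> materializedData =
    Sets.newHashSet(dataToBeMaterialized.materialize());
{code}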

Is this what you were looking for, or do you need more information about 
our setup?

> Materialized jobs should have stage in PipelineResult
> -----------------------------------------------------
>
>                 Key: CRUNCH-400
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-400
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Micah Whitacre
>
> Brought up as part of the proposed fix for CRUNCH-272 and on the mailing 
> list[1], a set of jobs kicked off due to a materialize() call will not be 
> tracked as part of the Pipeline's stage results returned by the 
> PipelineResult.
> [1] - 
> http://mail-archives.apache.org/mod_mbox/crunch-dev/201405.mbox/%3CCANFazTUAffvTctK5%3DWvW4KyBLSqLCNcke7ZMWwgASu%2BEtkDmyQ%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)
