[ https://issues.apache.org/jira/browse/CRUNCH-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010343#comment-14010343 ]

Josh Wills commented on CRUNCH-400:
-----------------------------------

Okay, I understand now. Calling:

    Sets.newHashSet(...)

is going to immediately materialize the results. And there's no way to get the StageResults associated with doing that. The client-side fix would be to alter the code to run like this:

    Iterable<String> preMaterialized = dataToBeMaterialized.materialize();
    PipelineResult res = pipeline.run();
    Set<String> materializedData = Sets.newHashSet(preMaterialized);

I could also make the MaterializableIterable that gets returned by the materialize() call aware of its StageResults (I think).

> Materialized jobs should have stage in PipelineResult
> -----------------------------------------------------
>
>                 Key: CRUNCH-400
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-400
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Micah Whitacre
>
> Brought up as part of the proposed fix for CRUNCH-272 and on the mailing list[1], a set of jobs kicked off due to a materialize() call will not be tracked as part of the Pipeline's stage results returned by the PipelineResult.
>
> [1] - http://mail-archives.apache.org/mod_mbox/crunch-dev/201405.mbox/%3CCANFazTUAffvTctK5%3DWvW4KyBLSqLCNcke7ZMWwgASu%2BEtkDmyQ%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)
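
The ordering issue described in the comment can be illustrated with a small toy model. This is only a sketch and does not use Crunch: ToyPipeline, its run() and materialize() methods, the jobsRun counter, and the "materialize-stage" label are all hypothetical stand-ins, chosen to mimic a lazy Iterable that triggers a hidden run on first iteration. It assumes that an explicit run() is the only path that reports stage information to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these classes are part of Crunch.
final class MaterializeSketch {

    static final class ToyPipeline {
        private final List<String> source;
        private List<String> output;   // null until the job has executed
        int jobsRun = 0;               // number of times the job executed

        ToyPipeline(List<String> source) {
            this.source = source;
        }

        // Returns a lazy view; nothing executes until iteration starts.
        Iterable<String> materialize() {
            return () -> {
                if (output == null) {
                    // Implicit run: the caller never sees any stage info.
                    executeJob();
                }
                return output.iterator();
            };
        }

        // Explicit run: executes pending work and reports the stages it ran.
        List<String> run() {
            List<String> stages = new ArrayList<>();
            if (output == null) {
                executeJob();
                stages.add("materialize-stage");
            }
            return stages;
        }

        private void executeJob() {
            output = new ArrayList<>(source);
            jobsRun++;
        }
    }

    // Plain-Java stand-in for copying an Iterable into a Set.
    static Set<String> toSet(Iterable<String> items) {
        Set<String> result = new HashSet<>();
        for (String item : items) {
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // Iterate first: the job runs implicitly, so run() has nothing to report.
        ToyPipeline eager = new ToyPipeline(Arrays.asList("a", "b"));
        Set<String> eagerData = toSet(eager.materialize());
        System.out.println("iterate-then-run stages: " + eager.run());

        // Run first: the stage is reported, and iteration reuses the output.
        ToyPipeline deferred = new ToyPipeline(Arrays.asList("a", "b"));
        Iterable<String> preMaterialized = deferred.materialize();
        List<String> stages = deferred.run();
        Set<String> deferredData = toSet(preMaterialized);
        System.out.println("run-then-iterate stages: " + stages);
        System.out.println("same data either way: " + eagerData.equals(deferredData));
    }
}
```

In this model both orderings yield the same data and execute the job exactly once; the only difference is whether the caller of run() gets the stage back, which is the gap the issue describes.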