Micah Whitacre created CRUNCH-194:
-------------------------------------
Summary: Utilities and Documentation for handling values in
reduce-Iterable
Key: CRUNCH-194
URL: https://issues.apache.org/jira/browse/CRUNCH-194
Project: Crunch
Issue Type: Bug
Components: Core
Reporter: Micah Whitacre
Assignee: Josh Wills
Clarify the documentation, and provide utilities, for the appropriate use of
values taken from the Iterable inside a DoFn or MapFn.
As an example, we were bitten by a case where we stored the individual items
from the Iterable so that we could process them once all the values had been
read.
{code}
@Override
public Foo map(final Pair<Bar, Iterable<Bat>> input) {
  List<Bat> bats = ...;
  // Each reference is stored as-is; no copy of the item is made.
  for (Bat b : input.second()) {
    bats.add(b);
  }
  return new Foo(bats);
}
{code}
When this is run during a reduce, the list bats ends up with a single item
instead of multiple items. For this to work properly, we have to make a copy of
each item in the Iterable. Having the javadoc state this behavior more clearly
would help consumers write the MapFn/DoFn correctly the first time.
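
To illustrate the copy requirement without depending on Crunch itself, here is
a minimal, self-contained sketch. It assumes the reduce-side Iterable hands
back one instance that is refilled on every step, which would explain the
behavior reported above. The Bat class, its copy() method, and
reusingIterable() are hypothetical stand-ins written for this example, not
Crunch APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mutable value type standing in for Bat from the example above.
final class Bat {
    int id;
    Bat(int id) { this.id = id; }
    Bat copy() { return new Bat(id); }
}

final class ReuseDemo {
    // Assumption: simulates an Iterable that returns the SAME instance on
    // every call to next(), overwriting its contents each time.
    static Iterable<Bat> reusingIterable(final int count) {
        return new Iterable<Bat>() {
            @Override
            public Iterator<Bat> iterator() {
                return new Iterator<Bat>() {
                    private final Bat shared = new Bat(-1);
                    private int emitted = 0;

                    @Override
                    public boolean hasNext() { return emitted < count; }

                    @Override
                    public Bat next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        shared.id = emitted++;   // overwrite in place
                        return shared;           // same reference every time
                    }

                    @Override
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }

    // Problematic pattern: stores the reference handed out by the Iterable.
    static List<Integer> idsWithoutCopy(Iterable<Bat> values) {
        List<Bat> bats = new ArrayList<Bat>();
        for (Bat b : values) {
            bats.add(b);
        }
        return ids(bats);
    }

    // Safe pattern: copies each item before storing it.
    static List<Integer> idsWithCopy(Iterable<Bat> values) {
        List<Bat> bats = new ArrayList<Bat>();
        for (Bat b : values) {
            bats.add(b.copy());
        }
        return ids(bats);
    }

    private static List<Integer> ids(List<Bat> bats) {
        List<Integer> out = new ArrayList<Integer>();
        for (Bat b : bats) {
            out.add(b.id);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(idsWithoutCopy(reusingIterable(3))); // prints [2, 2, 2]
        System.out.println(idsWithCopy(reusingIterable(3)));    // prints [0, 1, 2]
    }
}
```

Under that assumption, every stored reference points at the same instance, so
the list effectively holds only the last value read. Copying each item as it is
read preserves the distinct values.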