Matthew Hayes created CRUNCH-166:
------------------------------------
Summary: NullPointerException when attempting to use Sort.sortPairs
Key: CRUNCH-166
URL: https://issues.apache.org/jira/browse/CRUNCH-166
Project: Crunch
Issue Type: Bug
Components: Core
Affects Versions: 0.5.0
Environment: Hadoop 1.0.4
Reporter: Matthew Hayes
Assignee: Josh Wills
I'm attempting to count some strings and then order by the count descending.
My code effectively looks like this:
{code}
PCollection<SomeType> records = pipeline.read(...);
PCollection<String> stringsToCount = records.parallelDo(
    new DoFn<SomeType, String>() {
      @Override
      public void process(SomeType input, Emitter<String> emitter) {
        if (input.getRecords() != null && input.getRecords().size() > 0) {
          for (MyRecord record : input.getRecords()) {
            emitter.emit(record.getValue().toString());
          }
        }
      }
    },
    Writables.strings());
PTable<String, Long> stats = Aggregate.count(stringsToCount);
PCollection<Pair<String, Long>> sortedStats =
    Sort.sortPairs(stats, new ColumnOrder(2, Order.DESCENDING));
pipeline.writeTextFile(sortedStats, "somewhere");
{code}
The error I get is:
{code}
java.lang.NullPointerException
    at org.apache.crunch.lib.Sort$TupleWritableComparator.setConf(Sort.java:459)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:773)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:959)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
{code}
Note that the line numbers are shifted because I added some debugging and
recompiled. The NullPointerException is thrown in
TupleWritableComparator.setConf() here:
{code}
String[] columnOrderNames = ordering.split(",");
{code}
I suppose "crunch.ordering" is not set, so ordering is null. When I check the
job configuration in the JobTracker, I don't see this property set either.
Am I doing something wrong?
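
For illustration, here is a minimal sketch of the failure mode described above and one possible guard against it. This is not the Crunch source: a plain Map stands in for the Hadoop Configuration, and the class name OrderingGuard, the method splitOrdering, and the sample value "a,b" are hypothetical. Only the property name "crunch.ordering" and the split(",") call come from this report.

```java
import java.util.HashMap;
import java.util.Map;

class OrderingGuard {
    // Hypothetical stand-in for reading the ordering in setConf().
    // A Map is used instead of a Hadoop Configuration to keep the sketch self-contained.
    static String[] splitOrdering(Map<String, String> conf) {
        String ordering = conf.get("crunch.ordering");
        if (ordering == null) {
            // Without this check, ordering.split(",") throws a bare NullPointerException.
            throw new IllegalStateException(
                "crunch.ordering is not set in the job configuration");
        }
        return ordering.split(",");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<String, String>();

        // Property missing: the guard reports which key was absent.
        try {
            splitOrdering(conf);
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }

        // Property present: "a,b" is a placeholder, not the real value format.
        conf.put("crunch.ordering", "a,b");
        System.out.println(splitOrdering(conf).length);
    }
}
```

A guard like this would not fix the underlying problem of the property never reaching the job configuration, but it would make the missing key obvious from the task log.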