Wikipedia Example has incorrect input Key
-----------------------------------------
Key: MAHOUT-91
URL: https://issues.apache.org/jira/browse/MAHOUT-91
Project: Mahout
Issue Type: Bug
Components: Classification
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 0.1
Running the WikipediaDatasetCreator:
{code}
bin/hadoop jar ~/projects/lucene/mahout/mahout-clean/examples/build/
org.apache.mahout.examples.classifiers.cbayes.WikipediaDatasetCreator -i
wikipediadump -o wikipediainput -c
~/projects/lucene/mahout/mahout-clean/examples/src/test/resources/country.txt
{code}
yielded:
08/10/31 11:15:26 INFO mapred.JobClient: Task Id : attempt_200810301619_0001_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorMapper.map(WikipediaDatasetCreatorMapper.java:41)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
The fix is:
{code}
Index: src/main/java/org/apache/mahout/classifier/bayes/WikipediaDatasetCreatorMapper.java
===================================================================
--- src/main/java/org/apache/mahout/classifier/bayes/WikipediaDatasetCreatorMapper.java (revision 709230)
+++ src/main/java/org/apache/mahout/classifier/bayes/WikipediaDatasetCreatorMapper.java (working copy)
@@ -20,6 +20,7 @@
 import org.apache.commons.lang.StringEscapeUtils;
 import org.apache.hadoop.io.DefaultStringifier;
 import org.apache.hadoop.io.Text;
+import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
@@ -39,11 +40,11 @@
 import java.util.Set;
 public class WikipediaDatasetCreatorMapper extends MapReduceBase implements
-    Mapper<Text, Text, Text, Text> {
+    Mapper<LongWritable, Text, Text, Text> {
   private static Set<String> countries = null;
-  public void map(Text key, Text value,
+  public void map(LongWritable key, Text value,
       OutputCollector<Text, Text> output, Reporter reporter)
       throws IOException {
     String document = value.toString();
{code}
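For anyone wondering why this only fails at runtime: the exception shows that the job hands the mapper LongWritable keys, presumably because the input is read with a line-oriented input format such as TextInputFormat, whose keys are byte offsets (that part is an assumption; the issue does not say which input format is configured). The framework calls the mapper through erased generic types, so a mapper declared with the wrong key type compiles fine and fails on the first record. Below is a minimal, Hadoop-free sketch of that mechanism. `SimpleMapper`, `TextKeyMapper`, `OffsetKeyMapper`, and `runJob` are hypothetical stand-ins, with `String` playing the role of `Text` and `Long` the role of `LongWritable`; this is not the Mahout or Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;

class KeyTypeDemo {
    // Hypothetical stand-in for a generic mapper interface, reduced to the input types.
    interface SimpleMapper<K, V> {
        void map(K key, V value, List<String> output);
    }

    // Mirrors the pre-fix mapper: declares a String (Text-like) input key.
    static class TextKeyMapper implements SimpleMapper<String, String> {
        public void map(String key, String value, List<String> output) {
            output.add(value);
        }
    }

    // Mirrors the patched mapper: declares a Long (LongWritable-like) input key.
    static class OffsetKeyMapper implements SimpleMapper<Long, String> {
        public void map(Long key, String value, List<String> output) {
            output.add(value);
        }
    }

    // Plays the role of the framework: it only sees the raw mapper type and
    // always supplies a Long key (a byte offset) with each line of input.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List<String> runJob(SimpleMapper mapper) {
        List<String> output = new ArrayList<>();
        Object key = Long.valueOf(0L);
        // The compiler-generated bridge method casts the key to the declared type here.
        mapper.map(key, "some line of text", output);
        return output;
    }

    // Returns true if running the mapper fails with a ClassCastException.
    static boolean failsWithClassCast(SimpleMapper<?, ?> mapper) {
        try {
            runJob(mapper);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("String-keyed mapper fails: " + failsWithClassCast(new TextKeyMapper()));
        System.out.println("Long-keyed mapper fails: " + failsWithClassCast(new OffsetKeyMapper()));
    }
}
```

In this sketch the String-keyed mapper fails even though its body never touches the key, because the cast happens on entry to `map`. That is consistent with the patch above only needing to change the declared key type.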