[ https://issues.apache.org/jira/browse/CASSANDRA-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chander S Pechetty updated CASSANDRA-6793:
------------------------------------------
    Reproduced In: 2.1 beta1, 2.0.0  (was: 2.0.0, 2.1 beta1)
       Attachment: trunk-6793-v2.txt

Patch v2 addresses the following:
* Simplify the schema for both the input and output tables. The traditional word count example takes a line of text as input, so the input table can be reduced to {noformat}(id uuid, line text, PRIMARY KEY (id)){noformat}. The category, sub category, and title columns are removed from the input table; each row is now a UUID plus one line of text. The output schema is changed to {noformat}(word text primary key, count int){noformat}, as suggested earlier.
* Remove the toString method and the printing, as they add unnecessary clutter to the mapper.
* Remove the filter clauses, as they are not relevant to the word count example.

However, I still see Thrift interfaces in the WordCountSetup class and runtime dependencies on Cassandra in the bin folder. I didn't go into the details, but can someone shed some light on why these are needed? I think having clearly defined client APIs/dependencies would be useful for the end user.

> NPE in Hadoop Word count example
> --------------------------------
>
>                 Key: CASSANDRA-6793
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6793
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Examples
>            Reporter: Chander S Pechetty
>            Assignee: Chander S Pechetty
>            Priority: Minor
>              Labels: hadoop
>         Attachments: trunk-6793-v2.txt, trunk-6793.txt
>
>
> The partition keys requested in WordCount.java do not match the primary key set up in the table output_words. It looks like this patch was not merged properly from [CASSANDRA-5622|https://issues.apache.org/jira/browse/CASSANDRA-5622]. The attached patch addresses the NPE and uses the correct keys defined in #5622.
> I am assuming there is no need to fix the actual NPE, e.g. by throwing an InvalidRequestException back to the user to fix the partition keys, as it would be trivial to get the same from the TableMetadata using the driver API.
> java.lang.NullPointerException
>     at org.apache.cassandra.dht.Murmur3Partitioner.getToken(Murmur3Partitioner.java:92)
>     at org.apache.cassandra.dht.Murmur3Partitioner.getToken(Murmur3Partitioner.java:40)
>     at org.apache.cassandra.client.RingCache.getRange(RingCache.java:117)
>     at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.write(CqlRecordWriter.java:163)
>     at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.write(CqlRecordWriter.java:63)
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at WordCount$ReducerToCassandra.reduce(Unknown Source)
>     at WordCount$ReducerToCassandra.reduce(Unknown Source)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)

--
This message was sent by Atlassian JIRA
(v6.2#6252)