[ 
https://issues.apache.org/jira/browse/CASSANDRA-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chander S Pechetty updated CASSANDRA-6793:
------------------------------------------

    Reproduced In: 2.1 beta1, 2.0.0  (was: 2.0.0, 2.1 beta1)
       Attachment: trunk-6793-v2.txt

 Patch v2 addresses the following:
 
    * Simplify the schema for both the input and output tables. The word count 
example traditionally takes a line of text as input, so the input table can be 
reduced to {noformat}(id uuid, line text, PRIMARY KEY (id)){noformat}. Removed 
category, sub category, and title from the input table; each row now holds a 
UUID and one line of text. Changed the output schema to 
{noformat}(word text primary key, count int){noformat} as suggested earlier.
    * Remove the toString method and the printing, as they add unnecessary 
clutter to the mapper.
    * Remove the filter clauses, as they are not relevant to the word count 
example.
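
For illustration only, the simplified schema can be pictured with a small stand-alone Java sketch. It is not the code in trunk-6793-v2.txt and uses no Hadoop or Cassandra classes: the class name WordCountSketch, the InputRow type, and the count method are all hypothetical. Each InputRow mirrors one row of the (id uuid, line text) input table, and the returned map mirrors the (word, count) output table.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.UUID;

// Hypothetical stand-alone sketch; not the WordCount.java shipped with Cassandra.
public class WordCountSketch {

    // Mirrors one input row: (id uuid, line text, PRIMARY KEY (id)).
    public static final class InputRow {
        final UUID id;
        final String line;

        public InputRow(UUID id, String line) {
            this.id = id;
            this.line = line;
        }
    }

    // Map and reduce collapsed into one step: split each line on whitespace
    // and sum the occurrences per word. The result mirrors the output table
    // (word text primary key, count int).
    public static Map<String, Integer> count(Iterable<InputRow> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (InputRow row : rows) {
            StringTokenizer tokens = new StringTokenizer(row.line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<InputRow> rows = Arrays.asList(
                new InputRow(UUID.randomUUID(), "to be or not to be"),
                new InputRow(UUID.randomUUID(), "be quick"));
        // Print one line per word, as it would appear in the output table.
        for (Map.Entry<String, Integer> e : count(rows).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

In the real example the tokenizing would happen in the mapper and the summing in the reducer; they are merged here only to keep the sketch short.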

However, I still see Thrift interfaces in the WordCountSetup class and runtime 
dependencies on Cassandra in the bin folder. I didn't go into the details, but 
can someone shed some light on why these are needed? I think clearly defined 
client APIs and dependencies would be useful for the end user.

> NPE in Hadoop Word count example
> --------------------------------
>
>                 Key: CASSANDRA-6793
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6793
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Examples
>            Reporter: Chander S Pechetty
>            Assignee: Chander S Pechetty
>            Priority: Minor
>              Labels: hadoop
>         Attachments: trunk-6793-v2.txt, trunk-6793.txt
>
>
> The partition keys requested in WordCount.java do not match the primary key 
> set up in the table output_words. It looks like the patch from 
> [CASSANDRA-5622|https://issues.apache.org/jira/browse/CASSANDRA-5622] was 
> not merged properly. The attached patch addresses the NPE and uses the 
> correct keys defined in #5622.
> I am assuming there is no need to fix the actual NPE itself, for example by 
> throwing an InvalidRequestException back to the user so they can fix the 
> partition keys, as it would be trivial to get the same information from the 
> TableMetadata using the driver API.
> java.lang.NullPointerException
>       at org.apache.cassandra.dht.Murmur3Partitioner.getToken(Murmur3Partitioner.java:92)
>       at org.apache.cassandra.dht.Murmur3Partitioner.getToken(Murmur3Partitioner.java:40)
>       at org.apache.cassandra.client.RingCache.getRange(RingCache.java:117)
>       at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.write(CqlRecordWriter.java:163)
>       at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.write(CqlRecordWriter.java:63)
>       at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
>       at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>       at WordCount$ReducerToCassandra.reduce(Unknown Source)
>       at WordCount$ReducerToCassandra.reduce(Unknown Source)
>       at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>       at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
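
As a rough illustration of the key mismatch in the quoted report, the sketch below shows the kind of up-front check that would turn a missing partition key value into a readable error instead of a NullPointerException deep in the partitioner. It is an assumption-laden sketch, not Cassandra code and not part of the attached patch: KeyCheckSketch, requireKeys, and the column name row_id are hypothetical, and the only column taken from the patch is word.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Cassandra or of the attached patch.
public class KeyCheckSketch {

    // Fails fast when the keys supplied by the job do not cover every
    // partition key column declared on the output table.
    public static void requireKeys(List<String> partitionKeyColumns,
                                   Map<String, ?> suppliedKeys) {
        for (String column : partitionKeyColumns) {
            if (suppliedKeys.get(column) == null) {
                throw new IllegalArgumentException(
                        "Missing value for partition key column '" + column
                        + "'; supplied columns: " + suppliedKeys.keySet());
            }
        }
    }

    public static void main(String[] args) {
        // Output table from patch v2: (word text primary key, count int).
        List<String> tableKeys = Arrays.asList("word");

        // Keys as a job might supply them after a bad merge (hypothetical name).
        Map<String, String> supplied = new LinkedHashMap<>();
        supplied.put("row_id", "some-value");

        try {
            requireKeys(tableKeys, supplied);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The report argues that such a check is not strictly needed, since the correct key columns can be read from the table metadata through the driver; the sketch only makes the failure mode concrete.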



--
This message was sent by Atlassian JIRA
(v6.2#6252)
