[ 
https://issues.apache.org/jira/browse/CASSANDRA-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930645#comment-13930645
 ] 

Jonathan Ellis edited comment on CASSANDRA-6793 at 3/11/14 5:51 PM:
--------------------------------------------------------------------

I confess that I'm mystified by the schema introduced in CASSANDRA-4421:

{noformat}
/**
 * This counts the occurrences of words in ColumnFamily
 *   cql3_worldcount ( user_id text,
 *                   category_id text,
 *                   sub_category_id text,
 *                   title  text,
 *                   body  text,
 *                   PRIMARY KEY (user_id, category_id, sub_category_id))
 *
 * For each word, we output the total number of occurrences across all body texts.
 *
 * When outputting to Cassandra, we write the word counts to column family
 *  output_words ( row_id1 text,
 *                 row_id2 text,
 *                 word text,
 *                 count_num text,
 *                 PRIMARY KEY ((row_id1, row_id2), word))
 * as a {word, count} to columns: word, count_num with a row key of "word sum"
 */
{noformat}

Both the input and output tables look far more complex than necessary.  

My preferred solution would be to just strip the output down to {{(word text primary key, count int)}}, and make a similar simplification for the input.
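
As a rough illustration of what the simplified output shape implies, the sketch below tallies words across a list of body texts and yields one (word, count) pair per word, with the count held as an int rather than text. The class and method names are hypothetical and are not taken from the WordCount.java example; the Hadoop and Cassandra wiring is deliberately left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: not the WordCount.java shipped with Cassandra.
final class SimplifiedWordCount {
    // Tally whitespace-separated words across all body texts.
    // Each map entry corresponds to one row of (word text primary key, count int).
    static Map<String, Integer> countWords(List<String> bodies) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String body : bodies) {
            for (String word : body.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> bodies = Arrays.asList("to be or not to be", "be brief");
        for (Map.Entry<String, Integer> row : countWords(bodies).entrySet()) {
            System.out.println(row.getKey() + " -> " + row.getValue());
        }
    }
}
```

With a single key column, the word itself identifies the row, so no synthetic row ids are needed to address a count.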

Can you shed any light [~alexliu68]?



> NPE in Hadoop Word count example
> --------------------------------
>
>                 Key: CASSANDRA-6793
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6793
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Examples
>            Reporter: Chander S Pechetty
>            Assignee: Chander S Pechetty
>            Priority: Minor
>              Labels: hadoop
>         Attachments: trunk-6793.txt
>
>
> The partition keys requested in WordCount.java do not match the primary key 
> set up in the table output_words. It looks like this patch was not merged 
> properly from 
> [CASSANDRA-5622|https://issues.apache.org/jira/browse/CASSANDRA-5622]. The 
> attached patch addresses the NPE and uses the correct keys defined in #5622.
> I am assuming there is no need to fix the actual NPE, for example by throwing 
> an InvalidRequestException back to the user to fix the partition keys, as it 
> would be trivial to get the same from the TableMetadata using the driver API.
> java.lang.NullPointerException
>       at org.apache.cassandra.dht.Murmur3Partitioner.getToken(Murmur3Partitioner.java:92)
>       at org.apache.cassandra.dht.Murmur3Partitioner.getToken(Murmur3Partitioner.java:40)
>       at org.apache.cassandra.client.RingCache.getRange(RingCache.java:117)
>       at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.write(CqlRecordWriter.java:163)
>       at org.apache.cassandra.hadoop.cql3.CqlRecordWriter.write(CqlRecordWriter.java:63)
>       at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
>       at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>       at WordCount$ReducerToCassandra.reduce(Unknown Source)
>       at WordCount$ReducerToCassandra.reduce(Unknown Source)
>       at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>       at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
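
The trace shows the NPE surfacing in the partitioner's getToken, reached from the record writer's write call. One plausible reading, sketched below, is that the key values supplied by the job are looked up by the table's partition key column name, a mismatched name yields null, and the partitioner then dereferences that null. This is a simplified illustration and not the actual CqlRecordWriter or Murmur3Partitioner code; every class, method, and the "word" key name used here are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the failure mode; not the real Cassandra classes.
final class PartitionKeyMismatch {
    // Stand-in for a partitioner: it dereferences the key, so a null key throws an NPE.
    static int tokenFor(ByteBuffer partitionKey) {
        return partitionKey.remaining();
    }

    // Look up the value bound to the table's partition key column.
    // Returns null when the job supplied its key under a different name.
    static ByteBuffer partitionKey(Map<String, ByteBuffer> keyColumns, String tableKeyName) {
        return keyColumns.get(tableKeyName);
    }

    // Returns true if writing with the supplied key name hits a NullPointerException.
    static boolean writeFails(String suppliedKeyName, String tableKeyName) {
        Map<String, ByteBuffer> keyColumns = new HashMap<>();
        keyColumns.put(suppliedKeyName,
                ByteBuffer.wrap("word sum".getBytes(StandardCharsets.UTF_8)));
        try {
            tokenFor(partitionKey(keyColumns, tableKeyName));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("matching names fail: " + writeFails("row_id1", "row_id1"));
        System.out.println("mismatched names fail: " + writeFails("word", "row_id1"));
    }
}
```

Validating the supplied key names against the table metadata before computing a token would turn this into a descriptive error instead of an NPE.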



--
This message was sent by Atlassian JIRA
(v6.2#6252)
