Re: Cassandra + Hadoop + BMT

Jun Rao Wed, 09 Sep 2009 08:07:09 -0700

Thanks, Johan.

I think you can simplify your code by using
org.apache.cassandra.client.RingCache (see
test/unit/org.apache.cassandra.client.TestRingCache for an example).


Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

[email protected]



From: Johan Oskarsson <[email protected]>
To: [email protected]
Date: 09/09/2009 02:49 AM
Subject: Re: Cassandra + Hadoop + BMT






In my version of the code the storage endpoints are pulled out from a
seed node using the NodeProbe class and then put into the StorageService
using the updateTokenMetadata method.

See updateTokenMetadata in CassandraClient:
http://github.com/johanoskarsson/cassandraoutputformat/blob/dfa4dbf9b1bc81854b492af14536693002e19e52/src/java/fm/last/hadoop/mapred/CassandraClient.java


Granted it's not a perfect solution.
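
To make the idea concrete, here is a minimal sketch of what a client-side
lookup can look like once the token-to-endpoint map has been fetched from a
seed node. It is not the code from CassandraClient or RingCache: the class
TokenRingSketch and its methods are hypothetical, tokens are simplified to
long values (real partitioners use other token types), and replica placement
is assumed to be "the next N distinct nodes clockwise on the ring".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical client-side view of the ring; not part of Cassandra's API. */
class TokenRingSketch {
    // Sorted token -> endpoint address, filled once from a seed node.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String endpoint) {
        ring.put(token, endpoint);
    }

    /** Collects up to `replicas` distinct endpoints, walking clockwise from the key's token. */
    List<String> endpointsFor(long keyToken, int replicas) {
        List<String> result = new ArrayList<>();
        if (ring.isEmpty()) {
            return result;
        }
        // The first node whose token is >= the key's token owns the key;
        // if there is none, wrap around to the smallest token.
        Long start = ring.ceilingKey(keyToken);
        if (start == null) {
            start = ring.firstKey();
        }
        Long current = start;
        do {
            String endpoint = ring.get(current);
            if (!result.contains(endpoint)) {
                result.add(endpoint);
            }
            current = ring.higherKey(current);
            if (current == null) {
                current = ring.firstKey(); // wrap around the ring
            }
        } while (result.size() < replicas && !current.equals(start));
        return result;
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(100L, "10.0.0.1");
        ring.addNode(200L, "10.0.0.2");
        ring.addNode(300L, "10.0.0.3");
        System.out.println(ring.endpointsFor(150L, 2));
        System.out.println(ring.endpointsFor(350L, 2));
    }
}
```

The map is only a snapshot, so a long-running job would need to refresh it if
nodes join or leave the cluster.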

/Johan

Jun Rao wrote:
> I was trying to understand how the MapReduce job figures out where a row
> is located in a Cassandra cluster and I saw the following code. Does
> this really work? To compute the proper endpoints, the StorageService
> needs to be started to obtain all tokens from other nodes through
> gossip. However, StorageService is not started in the MapReduce job.
>
>     for (EndPoint endpoint : StorageService.instance().getReadStorageEndPoints(rowKey)) {
>       /* Send message to end point */
>       MessagingService.getMessagingInstance().sendOneWay(message, endpoint);
>     }
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>
> [email protected]
>
>
> From:
> Johan Oskarsson <[email protected]>
>
> To:
> [email protected]
>
> Cc:
> [email protected]
>
> Date:
> 09/01/2009 12:49 PM
>
> Subject:
> Re: Cassandra + Hadoop + BMT
>
> ------------------------------------------------------------------------
>
>
>
>
> I have slapped together a basic Hadoop 0.18 CassandraOutputFormat based
> on the code Chris put up.
>
> Usage:
> conf.setOutputKeyClass(RowColumn.class);
> conf.setOutputValueClass(BytesWritable.class);
>
> conf.setOutputFormat(CassandraOutputFormat.class);
> conf.set(CassandraOutputFormat.CONF_COLUMN_FAMILY_NAME, "columnfamilyname");
> conf.set(CassandraOutputFormat.CONF_KEYSPACE, "keyspacename");
>
> DistributedCache.addCacheFile(new URI("uri_to_storage-conf.xml"), conf);
>
> + your job specific settings.
>
> Then, after the job finishes, run this method: CassandraOutputFormat.forceFlush
>
> Source code here:
> http://github.com/johanoskarsson/cassandraoutputformat/tree/master
>
> Big thanks to Chris for figuring out the mystery that is BinaryMemtable
>
> /Johan
>
> Chris Goffinet wrote:
>> Hi Guys
>>
>> This is long overdue but I have posted a very rough example (with
>> Digg stuff removed) for getting BMT working with Cassandra. Patches are
>> coming next up for the JIRA tickets. I'll try to get a more generic
>> map/reduce job finished by end of the week that integrates Hive output.
>>
>> http://github.com/lenn0x/Cassandra-Hadoop-BMT/tree/master
>>
>> -Chris
>
>
>
