Re: Cassandra + Hadoop + BMT

Johan Oskarsson Tue, 01 Sep 2009 12:49:04 -0700

I have slapped together a basic Hadoop 0.18 CassandraOutputFormat based on the code Chris put up.


Usage:
conf.setOutputKeyClass(RowColumn.class);
conf.setOutputValueClass(BytesWritable.class);


conf.setOutputFormat(CassandraOutputFormat.class);
conf.set(CassandraOutputFormat.CONF_COLUMN_FAMILY_NAME, "columnfamilyname");
conf.set(CassandraOutputFormat.CONF_KEYSPACE, "keyspacename");

DistributedCache.addCacheFile(new URI("uri_to_storage-conf.xml"), conf);

Plus your job-specific settings.

Then, after the job has run, call this method: CassandraOutputFormat.forceFlush
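
A note on the "uri_to_storage-conf.xml" placeholder in the usage above: DistributedCache.addCacheFile takes a java.net.URI, and the URI constructors throw the checked URISyntaxException, so the calling code has to handle or declare it. The following is only a minimal sketch of building such a URI, assuming the file has already been copied to HDFS. The host name, port, and path are hypothetical and are not taken from the original post.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch only: builds the URI that would be handed to
// DistributedCache.addCacheFile(uri, conf). All concrete values are made up.
class StorageConfUri {

    // Assembles an hdfs:// URI from its parts. Using the multi-argument
    // constructor lets java.net.URI quote any illegal characters in the path.
    static URI build(String nameNodeHost, int nameNodePort, String path)
            throws URISyntaxException {
        return new URI("hdfs", null, nameNodeHost, nameNodePort, path, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical namenode address and file location.
        URI uri = build("namenode.example.com", 9000, "/cassandra/storage-conf.xml");
        System.out.println(uri);
        // In the job driver this would be followed by:
        //   DistributedCache.addCacheFile(uri, conf);
    }
}
```

The job driver itself can simply declare "throws URISyntaxException" (or a broader exception) on the method that does the configuration.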

Source code here:
http://github.com/johanoskarsson/cassandraoutputformat/tree/master

Big thanks to Chris for figuring out the mystery that is BinaryMemtable.

/Johan

Chris Goffinet wrote:

Hi guys,
This is long overdue, but I have posted a very rough example (with Digg stuff removed) for getting BMT working with Cassandra. Patches for the JIRA tickets are coming up next. I'll try to get a more generic map/reduce job that integrates Hive output finished by the end of the week.
http://github.com/lenn0x/Cassandra-Hadoop-BMT/tree/master

-Chris
