For the gory details you're going to need to explore SSTableReader and/or SSTableWriter.
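
On the question of a RandomAccessFile-style reader over Hadoop's FileSystem API, here is a rough sketch of one possible shape, not how Cassandra itself does it. `PositionalSource`, `ByteArraySource`, and `BufferedSeekableReader` are hypothetical names invented for illustration. The in-memory source stands in for an HDFS-backed one so the example runs without Hadoop on the classpath; in a real version the source would presumably wrap Hadoop's `FSDataInputStream`, which supports `seek` and positioned reads.

```java
import java.io.EOFException;
import java.io.IOException;

public class SeekableReaderSketch {

    /** Hypothetical abstraction over any store that supports positioned reads. */
    interface PositionalSource {
        long length() throws IOException;
        /** Reads up to len bytes starting at position; returns bytes read, or -1 at end. */
        int read(long position, byte[] buf, int off, int len) throws IOException;
    }

    /** In-memory stand-in (assumption: replaces an HDFS-backed source for this demo). */
    static final class ByteArraySource implements PositionalSource {
        private final byte[] data;
        ByteArraySource(byte[] data) { this.data = data; }
        public long length() { return data.length; }
        public int read(long position, byte[] buf, int off, int len) {
            if (position >= data.length) return -1;
            int n = (int) Math.min(len, data.length - position);
            System.arraycopy(data, (int) position, buf, off, n);
            return n;
        }
    }

    /** Buffered reader with seek(), loosely modelled on the RandomAccessFile contract. */
    static final class BufferedSeekableReader {
        private final PositionalSource source;
        private final byte[] buffer;
        private long bufferStart = 0; // file offset of buffer[0]
        private int bufferLen = 0;    // number of valid bytes in buffer
        private long position = 0;    // current logical file offset

        BufferedSeekableReader(PositionalSource source, int bufferSize) {
            this.source = source;
            this.buffer = new byte[bufferSize];
        }

        void seek(long newPosition) throws IOException {
            if (newPosition < 0 || newPosition > source.length())
                throw new IOException("seek out of range: " + newPosition);
            position = newPosition; // buffer is refilled lazily on the next read
        }

        long getFilePointer() { return position; }

        /** Returns the next byte as 0..255, or -1 at end of file. */
        int read() throws IOException {
            if (position < bufferStart || position >= bufferStart + bufferLen) {
                int n = source.read(position, buffer, 0, buffer.length);
                if (n <= 0) return -1;
                bufferStart = position;
                bufferLen = n;
            }
            return buffer[(int) (position++ - bufferStart)] & 0xFF;
        }

        /** Reads a big-endian 32-bit int, the byte order java.io.DataInput uses. */
        int readInt() throws IOException {
            int value = 0;
            for (int i = 0; i < 4; i++) {
                int b = read();
                if (b < 0) throw new EOFException();
                value = (value << 8) | b;
            }
            return value;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 0, 0, 42, 0, 0, 1, 0};
        // A deliberately tiny buffer forces refills in the middle of an int.
        BufferedSeekableReader r = new BufferedSeekableReader(new ByteArraySource(data), 3);
        System.out.println(r.readInt()); // prints 42
        r.seek(4);
        System.out.println(r.readInt()); // prints 256
    }
}
```

Keeping the positioned-read interface separate from the buffering logic means the same reader could be tested against local bytes and then pointed at HDFS. Compressed sstables would need an additional layer on top, which this sketch does not attempt.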

On Sat, Mar 23, 2013 at 7:01 PM, Amit Kumar <kumarami...@gmail.com> wrote:
> We don't want to set up a parallel workflow for analytics, for which
> we use Hadoop, and it will be trivial to periodically copy the new
> sstables that get created to HDFS and then have mappers read the
> sstables in parallel. Going through Thrift is an option, but an
> inefficient one and one that impacts production Cassandra.
>
> Amit
>
> On Sat, Mar 23, 2013 at 2:40 PM, Michael Kjellman
> <mkjell...@barracuda.com> wrote:
>> Just curious, why would you want to store sstables in HDFS?
>>
>> On 3/23/13 12:43 PM, "Amit Kumar" <kumarami...@gmail.com> wrote:
>>
>>> I am starting some work on an input format that would let us read
>>> sstables stored in HDFS. I wonder if anyone has worked on something
>>> similar before. I did come across
>>>
>>> http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
>>>
>>> However, it's not open sourced/available yet.
>>>
>>> I am writing for a sanity check before I go too deep into this.
>>>
>>> I have a few questions, hoping someone here would be able to help.
>>>
>>> So far, I have been able to read sstables stored on the local file
>>> system using the SSTableScanner and the SSTableReader. I am wondering
>>> what would be a good way to proceed: having a custom implementation of
>>> RandomAccessFile (like the RandomAccessReader and the
>>> CompressedRandomAccessReader) that would use Hadoop's FileSystem
>>> API?
>>>
>>> I did search but could have missed it: is there some documentation
>>> on the binary format of the data, index, and stats files? That might
>>> make it simpler for me to prototype without having to go through the
>>> Cassandra internals. I am currently working off our production
>>> deployment, which is 1.1.0.
>>>
>>> Any guidance you want to give is welcome (I am new to Cassandra internals).
>>>
>>> Many thanks
>>> Amit

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced