I am starting some work on an InputFormat that would let us read
SSTables stored in HDFS, and I wonder whether anyone has worked on
something similar before. I did come across

http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html

However, it isn't open-sourced/available yet.

I am writing for a sanity check before I go too deep into this.

I have a few questions, and I am hoping someone here can help.

So far, I have been able to read SSTables stored on the local file
system using SSTableScanner and SSTableReader. I am wondering what
would be a good way to proceed: would it make sense to write a custom
random-access implementation (like RandomAccessReader and
CompressedRandomAccessReader) that uses Hadoop's FileSystem API
instead of the local file system?
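To make the idea concrete, here is a minimal sketch of what I have in mind. All names here are hypothetical, not Cassandra's actual classes: the read path is abstracted behind a small seekable interface, with a local-file implementation standing in for RandomAccessReader; an HDFS-backed implementation of the same interface would wrap Hadoop's FSDataInputStream (returned by FileSystem.open()), which also supports seek() and positioned reads.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical abstraction over the random-access read path.
// A local-file implementation covers today's behaviour; an HDFS
// implementation would wrap Hadoop's FSDataInputStream instead.
interface SeekableInput extends AutoCloseable {
    void seek(long position) throws IOException;
    int read(byte[] buffer, int offset, int length) throws IOException;
    long length() throws IOException;
    void close() throws IOException;
}

// Local-filesystem implementation, backed by java.io.RandomAccessFile.
class LocalSeekableInput implements SeekableInput {
    private final RandomAccessFile file;

    LocalSeekableInput(String path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    public void seek(long position) throws IOException {
        file.seek(position);
    }

    public int read(byte[] buffer, int offset, int length) throws IOException {
        return file.read(buffer, offset, length);
    }

    public long length() throws IOException {
        return file.length();
    }

    public void close() throws IOException {
        file.close();
    }
}

// An HDFS-backed version (sketch only, names assumed) would look like:
//
// class HdfsSeekableInput implements SeekableInput {
//     private final org.apache.hadoop.fs.FSDataInputStream in;
//     HdfsSeekableInput(org.apache.hadoop.fs.FileSystem fs,
//                       org.apache.hadoop.fs.Path path) throws IOException {
//         this.in = fs.open(path);  // FSDataInputStream is seekable
//     }
//     public void seek(long pos) throws IOException { in.seek(pos); }
//     // read(), length(), close() delegate similarly
// }
```

The SSTable-reading code would then be written against SeekableInput, so the same scanner logic works whether the data file lives locally or in HDFS.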


I did search, but may have missed it: is there any documentation on
the binary format of the data, index, and stats files? That would
make it simpler for me to prototype without having to go through the
Cassandra internals. I am currently working off our production
deployment, which is on 1.1.0.

Any guidance you can give would be appreciated (I am new to Cassandra internals).

Many thanks
Amit
