On Sat, Oct 22, 2011 at 1:17 AM, Dhruba Borthakur <[email protected]> wrote:
> One of the current problems with hbase eating lots of cpu is the fact that
> the memstore is a sortedset. Instead, we can make it a hashset so that
> lookups and insertions are much faster. At the time of flushing, we can
> sort the snapshot memstore and write it out to hdfs. This will decrease
> the latencies of Puts to a great extent. I will experiment on how this
> fares with a real-life workload.
>
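For illustration, a minimal sketch of the write path the quoted proposal
describes, with the sort deferred to flush time. All names here are
hypothetical; the actual HBase MemStore keeps KeyValues in a
ConcurrentSkipListMap, and a real implementation would have to synchronize
the snapshot swap against in-flight writers:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical hash-based memstore: O(1) expected-time Puts, with
    // the sort cost paid once, when the snapshot is flushed to an HFile.
    public class HashMemStoreSketch {

        // Row key -> value. A real memstore holds full KeyValues
        // (row/family/qualifier/timestamp), not plain strings.
        private volatile ConcurrentHashMap<String, String> active =
            new ConcurrentHashMap<String, String>();

        // No comparator work or skip-list traversal on the write path.
        public void put(String rowKey, String value) {
            active.put(rowKey, value);
        }

        public String get(String rowKey) {
            return active.get(rowKey);
        }

        // At flush time: swap in a fresh map, then sort the snapshot
        // once so the HFile can still be written in key order.
        public List<Map.Entry<String, String>> snapshotSorted() {
            ConcurrentHashMap<String, String> snapshot = active;
            active = new ConcurrentHashMap<String, String>();
            List<Map.Entry<String, String>> entries =
                new ArrayList<Map.Entry<String, String>>(snapshot.entrySet());
            Collections.sort(entries,
                new Comparator<Map.Entry<String, String>>() {
                    public int compare(Map.Entry<String, String> a,
                                       Map.Entry<String, String> b) {
                        return a.getKey().compareTo(b.getKey());
                    }
                });
            return entries; // written out to hdfs in sorted order
        }
    }

The trade-off surfaces immediately in the reply below: anything that needs
ordered in-memory access (scans, ranged deletes) can no longer be served
directly from the active set.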
How would you scan in order a HashSet? Deletes across spans (families, or
all older than a particular timestamp)?

On the other hand, I did have a chat w/ the LMAX folks recently. They had
made the point earlier in the day that java Collections and Concurrent are
well overdue an overhaul (as an aside in a talk whose general thrust was:
revisit all assumptions, especially atop modern hardware). I was asking
what the underpinnings of a modern Collection might look like and in
particular described our issue with ConcurrentSkipListMap. One of the boys
threw out the notion of catching the inserts in their Disruptor data
structure and then sorting the data structure. It seemed a bit of a silly
suggestion at the time, but perhaps not if we added MVCC to the mix and
moved the read point on after the completion of each sort. We'd be
juggling lots of sorted lists in memory....

> I have also been playing around with a 5 node test cluster that has
> flash drives. The flash drives are mounted as xfs filesystems.
> A LookasideCacheFileSystem (http://bit.ly/pnGju0) is a client-side
> layered filter driver on top of hdfs. When HBase flushes data to HDFS,
> it is cached transparently in the LookasideCacheFileSystem.
> The LookasideCacheFileSystem uses the flash drive as a cache. The
> assumption here is that recently flushed hfiles are more likely to be
> accessed than the data in HFiles that were flushed earlier (I have not
> yet messed with major compactions). I will be measuring the performance
> benefit of this configuration.

That sounds sweet, Dhruba.

St.Ack
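To make the Disruptor idea above concrete, here is a rough sketch, with a
plain concurrent queue standing in for the LMAX Disruptor ring buffer and
every name invented for illustration: writers append unsorted, a
background pass seals a sorted immutable run and advances the read point,
and scans see only runs sealed at or before their read point.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Hypothetical "catch inserts, sort later" structure. Writers never
    // pay a comparator; order is restored in batches by sealRun().
    public class SortedRunsSketch {

        private final ConcurrentLinkedQueue<String> unsorted =
            new ConcurrentLinkedQueue<String>();
        // The "lots of sorted lists in memory" a scan would have to merge.
        private final CopyOnWriteArrayList<List<String>> sealedRuns =
            new CopyOnWriteArrayList<List<String>>();
        // MVCC read point: readers see only runs sealed at or before it.
        private volatile int readPoint = 0;

        public void insert(String key) {
            unsorted.add(key); // cheap append on the write path
        }

        // Background task: drain, sort once, publish, advance the
        // read point so new readers can see the freshly sealed run.
        public void sealRun() {
            List<String> run = new ArrayList<String>();
            String k;
            while ((k = unsorted.poll()) != null) {
                run.add(k);
            }
            if (run.isEmpty()) return;
            Collections.sort(run);
            sealedRuns.add(Collections.unmodifiableList(run));
            readPoint = sealedRuns.size();
        }

        // A scan sees everything up to its read point; inserts since
        // the last seal stay invisible until the next sort completes.
        public List<List<String>> runsForScan() {
            int rp = readPoint;
            return new ArrayList<List<String>>(sealedRuns.subList(0, rp));
        }
    }

The obvious cost lands on reads: every scan has to merge across all the
sealed runs, which is exactly the juggling worried about above.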

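And a generic sketch of the look-aside pattern the quoted paragraph
describes. This is not the actual LookasideCacheFileSystem (see the
http://bit.ly/pnGju0 link for that); the interface and paths below are
made up for illustration. Newly written files are mirrored onto a local
flash mount, and reads prefer the flash copy, falling back to the backing
store (HDFS in Dhruba's setup) on a miss:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Minimal stand-in for a filesystem API, for illustration only.
    interface SimpleFs {
        byte[] read(String name) throws IOException;
        void write(String name, byte[] data) throws IOException;
    }

    // Client-side layered cache: writes go through to the backing
    // store and also land on flash; reads are served from flash when
    // a cached copy exists.
    class LookasideCache implements SimpleFs {
        private final SimpleFs backing; // e.g. HDFS
        private final Path cacheDir;    // e.g. an xfs mount on flash

        LookasideCache(SimpleFs backing, String cacheDir) {
            this.backing = backing;
            this.cacheDir = Paths.get(cacheDir);
        }

        // On write (an HBase flush), write through to the backing
        // store and keep a copy on flash: recently flushed files are
        // the ones most likely to be read back soon.
        public void write(String name, byte[] data) throws IOException {
            backing.write(name, data);
            Path cached = cacheDir.resolve(name);
            Files.createDirectories(cached.getParent());
            Files.write(cached, data);
        }

        // On read, prefer the flash copy; a miss falls through to the
        // backing store.
        public byte[] read(String name) throws IOException {
            Path cached = cacheDir.resolve(name);
            if (Files.exists(cached)) {
                return Files.readAllBytes(cached);
            }
            return backing.read(name);
        }
    }

A real cache would also need eviction once the flash mount fills; the
recency assumption quoted above suggests dropping the oldest cached
HFiles first.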