On Mon, Sep 10, 2012 at 12:03 AM, Jacques <whs...@gmail.com> wrote:

> - How important is indexing column qualifiers themselves (similar to
>   Cassandra where people frequently utilize column qualifiers as "values"
>   with no actual values stored)?

It would be good to have a secondary indexing option that can build an
index from some transform of family+qualifier.

> - In general it seems like there is tension between the main low level
>   approaches of (1) leverage as much HBase infrastructure as possible (e.g.
>   secondary tables) and (2) leverage an efficient indexing library e.g.
>   Lucene.

Regarding option #2, Jason Rutherglen's experiences may be of interest:
https://issues.apache.org/jira/browse/HBASE-3529 . The new Codec and
CodecProvider classes of Lucene 4 could conceivably support storage of
postings in HBase proper now
(http://wiki.apache.org/lucene-java/FlexibleIndexing), so HDFS hacks for
bringing indexes local for mmapping may not be necessary, though this is a
huge hand-wave.

The remainder of your mail is focused on option #1. I have no comment to
add there; lots of food for thought.

> *Approach Thoughts*
> Trying to leverage HBase as much as possible is hard if we want to utilize
> the approach above and have consistent indexing. However, I think we can
> do it if we add support for what I will call a "local shadow family".
> These are additional, internally managed families for a table. However,
> they have the special characteristic that they belong to the region despite
> their primary keys being outside the range of the region's. Otherwise they
> look like a typical family. On splits, they are regenerated (somehow). If
> we take advantage of Lars'
> HBASE-5229<https://issues.apache.org/jira/browse/HBASE-5229>,
> we then have the opportunity to consistently insert one or more rows into
> these local shadow families for the purpose of secondary indexing. The
> structure of these secondary families could use row keys as the indexed
> values, qualifiers for specific store files and the value of each being a
> list of originating keys (using read-append or
> HBASE-5993<https://issues.apache.org/jira/browse/HBASE-5993>).
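
For anyone following along, the layout proposed in the quoted paragraph (row key = indexed value, qualifier = store file, value = list of originating keys) can be pictured with a small in-memory model. This is only a sketch: `LocalShadowIndex`, its method names, and the store file ids are hypothetical and are not HBase API, and `drop_store_file` is my assumption about how per-store-file qualifiers might ease index deletions.

```python
from collections import defaultdict


class LocalShadowIndex:
    """Toy in-memory model of one "local shadow family" (not HBase API)."""

    def __init__(self):
        # indexed value (row key) -> store file id (qualifier)
        #   -> list of originating primary row keys (cell value)
        self._rows = defaultdict(lambda: defaultdict(list))

    def append(self, indexed_value, store_file_id, origin_row_key):
        # Models a read-append of one originating key onto the cell's list.
        self._rows[indexed_value][store_file_id].append(origin_row_key)

    def lookup(self, indexed_value):
        # Collect originating keys across all store-file qualifiers, sorted.
        cells = self._rows.get(indexed_value, {})
        return sorted({key for keys in cells.values() for key in keys})

    def drop_store_file(self, store_file_id):
        # Assumption: removing one qualifier everywhere stands in for
        # discarding the index entries tied to a retired store file.
        for cells in self._rows.values():
            cells.pop(store_file_id, None)
```

Usage would look like `idx.append("blue", "sf1", "row-2")` followed by `idx.lookup("blue")`; a real implementation would of course live in the region server and go through the family's memstore and store files.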
> By leveraging the existing family infrastructure, we get things like
> optional in-memory indexes and basic scanners for free and don't have to
> swallow a big chunk of external indexing code.
>
> The simplest approach for integration of these for queries would be
> internally be a ScannerBasedFilter (a filter that is based on a scanner)
> and a GroupingScanner (a Scanner that does intersection and/or union of
> scanners for multi criteria queries). Implementation of these scanners
> could happen at one of two levels:
>
> - StoreScanner level: A more efficient approach using the store file
>   qualifier approach above (this allows easier maintenance of index
>   deletions)
> - RegionScanner level: A simpler implementation with less violation of
>   existing encapsulation. We'd store row keys in qualifiers instead of
>   values to ensure ordering that works iteratively with RegionScanner. The
>   weaknesses of this approach are less efficient scanning and figuring out
>   how to manage primary value deletes.
>
> In general, the best way to deal with deletes is probably to age them out
> per storefile and just filter "near misses" as a secondary filter that
> works with ScannerBasedFilter. The client side would be TBD but would
> probably offer some kind of criteria filters that on server side had all
> the lower level ramifications.
>
> *Future Optimizations*
> In a perfect world, we'd actually use StoreFile block start locations as
> the index pointer values in the secondary families. This would make things
> much more compact and efficient. Especially if we used a smarter block
> codec that took advantage of this nature. However, this requires quite a
> bit more work since we'd need to actually use the primary keys in the
> secondary memstore and then "patch" the values to block locations as we
> flushed the primary family that we were indexing (ugh).
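
The GroupingScanner idea in the quoted proposal (intersection and/or union of scanners for multi-criteria queries) comes down to merging sorted streams of row keys. A minimal sketch, assuming each index scanner yields originating row keys in ascending order with no duplicates; `union_scan` and `intersect_scan` are hypothetical names, and plain Python iterables stand in for HBase scanners:

```python
import heapq


def union_scan(*scanners):
    """Yield each row key once, in order, if any scanner contains it."""
    last = object()  # sentinel that compares unequal to every key
    for key in heapq.merge(*scanners):
        if key != last:
            last = key
            yield key


def intersect_scan(*scanners):
    """Yield row keys that appear in every scanner (inputs sorted ascending)."""
    iters = [iter(s) for s in scanners]
    if not iters:
        return
    try:
        heads = [next(it) for it in iters]
        while True:
            target = max(heads)
            for i, it in enumerate(iters):
                # Advance lagging scanners until they reach the candidate key.
                while heads[i] < target:
                    heads[i] = next(it)
            if all(head == target for head in heads):
                yield target
                heads = [next(it) for it in iters]
    except StopIteration:
        # Any exhausted scanner ends the intersection.
        return
```

For example, `list(intersect_scan(color_hits, size_hits))` would give the rows matching both criteria. The "near miss" filtering for deletes mentioned above would still have to be applied to whatever keys come out of this merge.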
>
> Assuming that the primary limiter of peak write throughput for HBase is
> typically WAL writing and since indexes have no "real" data, we could
> consider disabling WAL for local shadow families and simply regenerate this
> data upon primary WAL playback. I haven't spent enough time in that code
> to know what kind of consistency pain this would cause (my intuition is it
> would be fine as long as we didn't fix
> HBASE-3149<https://issues.apache.org/jira/browse/HBASE-3149>).
> If consistency isn't a problem, this would be a nice option since it means
> that indexing would have minimal impact on peak write throughput.
>
> *I haven't thought at all about...*
>
> - How/whether this makes sense to be implemented as a coprocessor.
> - Weird timestamp impacts/considerations here.
> - Version handling/impacts.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)