Nick Dimiduk created HBASE-15352:
------------------------------------
Summary: FST BlockEncoder
Key: HBASE-15352
URL: https://issues.apache.org/jira/browse/HBASE-15352
Project: HBase
Issue Type: New Feature
Components: regionserver
Reporter: Nick Dimiduk
Fix For: 2.0.0, 1.4.0
We could improve on the existing [PREFIX_TREE
block|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/codec/prefixtree/package-summary.html]
encoder by upgrading the persistent data structure from a trie to a finite
state transducer (FST). In theory this would let us share bytes not just across
rowkey prefixes, but across infixes and suffixes as well. My reading of the
literature suggests we may also be able to encode values, further reducing
storage size when values are repeated (e.g., a "customer id" field with very low
cardinality, which probably happens a lot in our denormalized world). There's a
really nice [blog post|http://blog.burntsushi.net/transducers/] about this data
structure, and apparently our siblings in Lucene make heavy use of [their
implementation|http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/util/fst/package-summary.html#package_description].
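
To make the prefix-versus-suffix sharing concrete, here is a minimal Python sketch. It is not HBase or Lucene code. It builds a plain trie over a few made-up row keys, then merges structurally identical subtrees, which is roughly what a minimal acyclic automaton (the skeleton of an FST) does. The keys and all function names are hypothetical, and the per-key outputs that make an FST a transducer are left out.

```python
class Node:
    """One trie node: outgoing edges keyed by byte, plus an end-of-key flag."""
    __slots__ = ("edges", "final")

    def __init__(self):
        self.edges = {}
        self.final = False


def build_trie(keys):
    """Insert each key byte by byte; only common prefixes are shared."""
    root = Node()
    for key in keys:
        node = root
        for byte in key:
            node = node.edges.setdefault(byte, Node())
        node.final = True
    return root


def count_trie_nodes(node):
    """Count every node in the trie, including the root."""
    return 1 + sum(count_trie_nodes(child) for child in node.edges.values())


def _canonical_id(node, registry):
    """Return an id shared by all nodes with identical subtrees.

    Two nodes are equivalent when they agree on the end-of-key flag and
    on the (label, canonical child id) pairs of their outgoing edges.
    """
    signature = (
        node.final,
        tuple(sorted((label, _canonical_id(child, registry))
                     for label, child in node.edges.items())),
    )
    return registry.setdefault(signature, len(registry))


def count_minimal_states(root):
    """Count states left after merging equivalent subtrees (suffix sharing)."""
    registry = {}
    _canonical_id(root, registry)
    return len(registry)


if __name__ == "__main__":
    # Hypothetical row keys with a shared prefix and shared suffixes.
    keys = [b"user1-click", b"user1-view", b"user2-click", b"user2-view"]
    trie = build_trie(keys)
    print("trie nodes:", count_trie_nodes(trie))
    print("minimized states:", count_minimal_states(trie))
```

For these four keys the trie has 27 nodes, while the minimized automaton has 15 states, because the "-click" and "-view" tails are stored once instead of once per user. A real encoder would also need outputs on the edges (to map a key to its cell or offset) and a compact byte layout, neither of which this sketch attempts.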