[
https://issues.apache.org/jira/browse/HBASE-15352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell updated HBASE-15352:
-----------------------------------
Fix Version/s: 0.98.19
> FST BlockEncoder
> ----------------
>
> Key: HBASE-15352
> URL: https://issues.apache.org/jira/browse/HBASE-15352
> Project: HBase
> Issue Type: New Feature
> Components: regionserver
> Reporter: Nick Dimiduk
> Fix For: 2.0.0, 0.98.19, 1.4.0
>
>
> We could improve on the existing [PREFIX_TREE
> block|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/codec/prefixtree/package-summary.html]
> encoder by upgrading the persistent data structure from a trie to a finite
> state transducer. This would theoretically allow us to reuse bytes not just
> for rowkey prefixes, but for infixes and suffixes as well. My reading of the
> literature suggests we may also be able to encode values, further
> reducing storage size when values are repeated (e.g., a "customer id" field
> with very low cardinality, which probably happens a lot in our denormalized
> world). There's a really nice [blog
> post|http://blog.burntsushi.net/transducers/] about this data structure, and
> apparently our siblings in Lucene make heavy use of [their
> implementation|http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/util/fst/package-summary.html#package_description].
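
To make the suffix-sharing claim above concrete, here is a minimal sketch in plain Java. It is not the PREFIX_TREE encoder and not the Lucene FST API. It builds an ordinary trie over a few made-up row keys, then merges structurally identical subtrees, which is the state sharing a minimal acyclic automaton provides. A real transducer would also attach outputs to arcs, which this sketch leaves out. The class name, method names, and sample keys are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SuffixSharingSketch {
    /** A plain trie node; children are sorted so subtree signatures are canonical. */
    static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        boolean terminal;
    }

    /** Builds an ordinary trie: only common prefixes are shared. */
    static Node buildTrie(String... keys) {
        Node root = new Node();
        for (String key : keys) {
            Node cur = root;
            for (char c : key.toCharArray()) {
                cur = cur.children.computeIfAbsent(c, k -> new Node());
            }
            cur.terminal = true;
        }
        return root;
    }

    /** Counts every node in the trie, including the root. */
    static int countTrieNodes(Node n) {
        int count = 1;
        for (Node child : n.children.values()) {
            count += countTrieNodes(child);
        }
        return count;
    }

    /**
     * Assigns an id to each distinct subtree shape, bottom-up. Two nodes with
     * the same terminal flag and the same (label, child id) pairs get the same
     * id, so identical suffixes collapse into one state.
     */
    static int canonicalize(Node n, Map<String, Integer> registry) {
        StringBuilder sig = new StringBuilder(n.terminal ? "T" : "N");
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            int childId = canonicalize(e.getValue(), registry);
            // Encode the label as a number so it cannot collide with separators.
            sig.append('|').append((int) e.getKey().charValue()).append(':').append(childId);
        }
        String key = sig.toString();
        Integer id = registry.get(key);
        if (id == null) {
            id = registry.size();
            registry.put(key, id);
        }
        return id;
    }

    /** Number of states left once equivalent subtrees are merged. */
    static int countMinimalStates(Node root) {
        Map<String, Integer> registry = new HashMap<>();
        canonicalize(root, registry);
        return registry.size();
    }

    public static void main(String[] args) {
        // Hypothetical row keys that share both a prefix and a suffix.
        Node root = buildTrie("cust1-open", "cust2-open", "cust3-open");
        System.out.println("trie nodes:     " + countTrieNodes(root));
        System.out.println("minimal states: " + countMinimalStates(root));
    }
}
```

The trie stores the "-open" tail once per key, while the merged structure stores it once in total. That is the gain the description is after, though a production encoder would build the minimal form incrementally from sorted input rather than materializing the full trie first.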