[
https://issues.apache.org/jira/browse/HBASE-15352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Dimiduk updated HBASE-15352:
---------------------------------
Fix Version/s: (was: 1.4.0)
(was: 0.98.19)
(was: 2.0.0)
No one is assigned. Unscheduling.
> FST BlockEncoder
> ----------------
>
> Key: HBASE-15352
> URL: https://issues.apache.org/jira/browse/HBASE-15352
> Project: HBase
> Issue Type: New Feature
> Components: regionserver
> Reporter: Nick Dimiduk
>
> We could improve on the existing [PREFIX_TREE
> block|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/codec/prefixtree/package-summary.html]
> encoder by upgrading the persistent data structure from a trie to a finite
> state transducer. In theory, this would allow us to reuse bytes not just
> for rowkey prefixes, but for infixes and suffixes as well. My read of the
> literature suggests we may also be able to encode values, further
> reducing storage size when values are repeated (e.g., a "customer id" field
> with very low cardinality, which probably happens a lot in our denormalized
> world). There's a really nice [blog
> post|http://blog.burntsushi.net/transducers/] about this data structure, and
> apparently our siblings in Lucene make heavy use of [their
> implementation|http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/util/fst/package-summary.html#package_description].
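
To make the trie-versus-FST point above concrete, here is a minimal sketch of suffix sharing. It is not the PREFIX_TREE encoder or Lucene's FST: it builds a plain trie, then merges structurally identical subtrees, which yields the minimal acyclic automaton that an FST is built on (without the per-transition outputs a real transducer would carry). The rowkeys and all function names are hypothetical, chosen only for illustration.

```python
END = ""  # marker key: "a rowkey ends at this node"


def build_trie(keys):
    """Plain trie as nested dicts: only common prefixes are shared."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def count_trie_nodes(node):
    """Count every node in the trie, including the root."""
    return 1 + sum(
        count_trie_nodes(child) for label, child in node.items() if label != END
    )


def minimize(node, registry):
    """Assign one canonical state id per distinct subtree.

    Two nodes with the same finality and the same outgoing
    (label, target state) pairs accept the same suffixes, so they
    collapse into a single state. This is where suffixes get shared.
    """
    signature = (
        END in node,
        tuple(sorted(
            (label, minimize(child, registry))
            for label, child in node.items()
            if label != END
        )),
    )
    return registry.setdefault(signature, len(registry))


# Hypothetical rowkeys: shared prefix "cust", shared suffix "-2016-order".
rowkeys = ["cust1-2016-order", "cust2-2016-order", "cust3-2016-order"]

trie = build_trie(rowkeys)
registry = {}
minimize(trie, registry)

print("trie nodes:      ", count_trie_nodes(trie))  # 41
print("automaton states:", len(registry))           # 17
```

The trie stores the "-2016-order" tail three times, once per key, while the minimized automaton stores it once. A real FST encoder would additionally attach outputs (for example, offsets or values) to transitions, which this sketch leaves out.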