[ 
https://issues.apache.org/jira/browse/HBASE-15352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-15352:
-----------------------------------
    Fix Version/s: 0.98.19

> FST BlockEncoder
> ----------------
>
>                 Key: HBASE-15352
>                 URL: https://issues.apache.org/jira/browse/HBASE-15352
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Nick Dimiduk
>             Fix For: 2.0.0, 0.98.19, 1.4.0
>
>
> We could improve on the existing [PREFIX_TREE 
> block|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/codec/prefixtree/package-summary.html]
>  encoder by upgrading the persistent data structure from a trie to a finite 
> state transducer. This would theoretically allow us to reuse bytes not just 
> for rowkey prefixes, but infixes and suffixes as well. My read of the 
> literature suggests we may be able to encode values too, further 
> reducing storage size when values are repeated (e.g., a "customer id" field 
> with very low cardinality -- probably happens a lot in our denormalized 
> world). There's a really nice [blog 
> post|http://blog.burntsushi.net/transducers/] about this data structure, and 
> apparently our siblings in Lucene make heavy use of [their 
> implementation|http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/util/fst/package-summary.html#package_description].
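
To make the suffix-sharing claim above concrete, here is a rough sketch, not HBase or Lucene code. It builds a plain trie over a few made-up row keys, then merges structurally identical subtrees, which is the core idea behind a minimal acyclic automaton. It leaves out the per-transition outputs that make a real FST a transducer. The class name, method names, and sample keys are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: compares node counts of a trie against the same key set
// after merging equivalent subtrees (suffix sharing). No outputs are stored,
// so this is an automaton, not a full transducer.
public class SuffixSharingSketch {

    /** Trie node; TreeMap keeps children sorted so signatures are deterministic. */
    static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        boolean terminal;
    }

    /** Builds an ordinary trie: only common prefixes are shared. */
    static Node buildTrie(String... keys) {
        Node root = new Node();
        for (String key : keys) {
            Node cur = root;
            for (char c : key.toCharArray()) {
                cur = cur.children.computeIfAbsent(c, k -> new Node());
            }
            cur.terminal = true;
        }
        return root;
    }

    /** Counts every node in the trie, including the root. */
    static int countTrieNodes(Node node) {
        int n = 1;
        for (Node child : node.children.values()) {
            n += countTrieNodes(child);
        }
        return n;
    }

    /**
     * Assigns an id to a node bottom-up. Two nodes get the same id exactly
     * when they have the same terminal flag and the same labelled transitions
     * to the same child ids, i.e. when their subtrees are interchangeable.
     */
    static int canonicalId(Node node, Map<String, Integer> registry) {
        StringBuilder sig = new StringBuilder(node.terminal ? "T" : "N");
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            sig.append('|').append(e.getKey()).append('>')
               .append(canonicalId(e.getValue(), registry));
        }
        String key = sig.toString();
        Integer existing = registry.get(key);
        if (existing != null) {
            return existing;
        }
        int id = registry.size();
        registry.put(key, id);
        return id;
    }

    /** Number of distinct states left after merging equivalent subtrees. */
    static int countMinimizedStates(Node root) {
        Map<String, Integer> registry = new HashMap<>();
        canonicalId(root, registry);
        return registry.size();
    }

    public static void main(String[] args) {
        Node root = buildTrie("cust1-open", "cust2-open", "cust3-open");
        System.out.println("trie nodes:       " + countTrieNodes(root));
        System.out.println("minimized states: " + countMinimizedStates(root));
    }
}
```

For these three keys the trie has 23 nodes and the merged structure has 11 states, because the shared "-open" suffix is stored once instead of three times. How much real HFile blocks would gain depends on the actual key distribution, and a block encoder would also need a compact byte layout and seek support, neither of which this sketch attempts.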



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
