Jason, are you feeding it the whole string for each date? The input is 17 bytes per record * 50 million records = 850 MB, and that reduces to 984 bytes? Is that much compression really possible? Maybe I'm missing something about how the FST works.

Matt

On Fri, Jun 3, 2011 at 8:51 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> Also the next thing to measure with the FST is the key lookup speed.
> I'm not sure what that'd look like, or how to compare with HBase right
> now?
>
> On Fri, Jun 3, 2011 at 8:42 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
> > Here's a nice preliminary number with the FST, 50 million dates of the
> > form yyyyMMddHHmmssSSS, with each incremented by one millisecond. The
> > FST is 984 bytes, with an incrementing long to point to the presumably
> > MMap'd value data. This's a bit crazy.
> >
> > Perhaps we should try other increments as well? Given that HBase keys
> > especially are probably close increments of each other, I think the
> > FST can always be loaded into RAM with pointers out to the actual
> > values.
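
For what it's worth, here is a rough sketch of why densely incrementing keys could shrink that far. It is not the FST implementation used in the test above: it only models the key set (no output values), and it uses zero-padded counters as a stand-in for the yyyyMMddHHmmssSSS strings, which is an assumption on my part. It builds a plain trie and then merges identical subtrees, which is the suffix sharing a minimal automaton gets. Note that 50 million milliseconds is under 14 hours, so in the real test only the trailing time digits vary. The function names are made up for illustration.

```python
def build_trie(keys):
    """Plain prefix tree: nested dicts, one node per distinct prefix."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
    return root


def trie_size(node):
    """Number of nodes in the unminimized trie."""
    return 1 + sum(trie_size(child) for child in node.values())


def minimal_states(node, registry):
    """Give every distinct subtree one id; len(registry) is the state count.

    Keys are fixed width, so only leaves are accepting and the outgoing
    arcs alone identify a subtree.
    """
    signature = tuple(sorted(
        (ch, minimal_states(child, registry)) for ch, child in node.items()
    ))
    return registry.setdefault(signature, len(registry))


def compare(n, width):
    """Return (trie nodes, merged states) for n zero-padded counters."""
    keys = [str(i).zfill(width) for i in range(n)]
    trie = build_trie(keys)
    registry = {}
    minimal_states(trie, registry)
    return trie_size(trie), len(registry)


if __name__ == "__main__":
    # Raw key bytes in the test discussed above: 17 bytes * 50 million keys.
    print("raw key bytes:", 17 * 50_000_000)
    print("(trie nodes, merged states):", compare(10_000, 17))
```

With 1,000 six-digit counters, compare(1000, 6) gives 1114 trie nodes but only 7 merged states, one per character position plus the final state. The merged size tracks the key width rather than the key count, which would be consistent with a sub-kilobyte result. Real timestamps roll over at 60 seconds and 60 minutes rather than at powers of ten, so the real automaton would be somewhat larger than this model suggests.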