Yes, for num_versions > 1, HBase has to dig through the memcache and multiple HStore files until it has found the requested number of versions or runs out of places to look. This is especially apparent if only 1 version is stored: HBase does a lot of work for nothing.
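To make the cost concrete, here is a minimal toy model of that lookup. It is not HBase's code: `get_versions`, the dict-based "memcache" and "store files", and the key format are all hypothetical, and it assumes sources are consulted newest first. It only illustrates why asking for more versions than exist forces a visit to every source.

```python
def get_versions(memcache, store_files, key, num_versions):
    """Collect up to num_versions values for key, newest source first.

    Each source is a dict mapping a key to a list of values (newest first).
    Returns (values, number_of_sources_searched).
    """
    values = []
    searched = 0
    # Hypothetical search order: in-memory cache first, then each store file.
    for source in [memcache] + list(store_files):
        searched += 1
        values.extend(source.get(key, []))
        if len(values) >= num_versions:
            break  # enough versions found, stop early
    return values[:num_versions], searched


# One stored version in the memcache, three store files without the cell.
memcache = {"3980000/hex:": ["3cbae0"]}
store_files = [{}, {}, {}]

one = get_versions(memcache, store_files, "3980000/hex:", 1)
three = get_versions(memcache, store_files, "3980000/hex:", 3)
print(one)    # (['3cbae0'], 1)
print(three)  # (['3cbae0'], 4)
```

With num_versions = 1 the search stops at the first hit; with num_versions = 3 it returns the same single value but only after touching every store file, which in the real system would mean disk reads.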
Please enter a Jira for the HBase shell to default the number of versions to 1.

---
Jim Kellerman, Senior Engineer; Powerset
[EMAIL PROTECTED]

> -----Original Message-----
> From: Stu Hood [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, November 06, 2007 11:23 PM
> To: [email protected]
> Subject: HBase num_versions
>
> Hey guys,
>
> Just noticed some surprising behavior for select statements
> in HBase 0.15: a select command without a num_versions = 1
> clause takes 2 orders of magnitude longer to run than one with it.
>
> Is this inconsistent implementation, or is it taking extra
> time to scan for additional versions? If this isn't a bug,
> then perhaps the default for num_versions should be 1 to keep
> things snappy by default.
>
> ============================================================
>
> Hbase> describe test;
> +-----------------------------------------------------------------------------+
> | Column Family Descriptor                                                    |
> +-----------------------------------------------------------------------------+
> | name: hex, max versions: 3, compression: NONE, in memory: false, max length:|
> | 2147483647, bloom filter: none                                              |
> +-----------------------------------------------------------------------------+
> 1 columnfamily(s) in set (0.310 sec)
> Hbase> select hex: from test where row = '3980000' num_versions = 1;
> 3cbae0
> 1 row(s) in set (0.016 sec)
> Hbase> select hex: from test where row = '3980000';
> 3cbae0
> 1 row(s) in set (1.882 sec)
>
> ============================================================
>
> Thanks,
>
> Stu Hood
> Webmail.us
> "You manage your business. We'll manage your email."(r)
