I would compare YCSB between the patched and unpatched version for a more realistic workload than the unit tests provide.
-Joey On Wed, Jul 20, 2011 at 10:49 AM, N Keywal <[email protected]> wrote: > Hello, > > Some words on the context: We're thinking about using HBase for a product > we're developping. For this reason, I am currently looking at HBase source > code to understand how to debug & modify HBase. To start with something > simple but useful, I am looking for performance improvement by profiling > hbase during the execution of the unit tests. I expect that many of the > hotspots found on the unit tests are also hotspots in real production. I > plan to spend around 10 m.d on this until september. > > > The method regionserver.MemStore/updateColumnValue seems quite used, and is > ultimatly responsible of 30% of the time in the test subsets I am using. > > > There is bit of it that can be optimized easily by changing the conditions > order: > > if (firstKv.matchingQualifier(kv)) { > if (kv.getType() == KeyValue.Type.Put.getCode()) { > now = Math.max(now, kv.getTimestamp()); > } > } > > becomes: > if (kv.getType() == KeyValue.Type.Put.getCode() && > kv.getTimestamp() > now && > firstKv.matchingQualifier(kv)) { > now = kv.getTimestamp(); > } > > As comparing the qualifier is much more expensive, we put it at the end. > It improve the performances by 3% (i.e: total execution time lowered by > 3%). > > > So first question: would you be interested by a patch for this kind of > stuff? > > > > Second question (more technical...): in this method > (regionserver.MemStore/updateColumnValue), I see: > > KeyValue firstKv = KeyValue.createFirstOnRow( > row, family, qualifier); > > [...] > while (it.hasNext()) { > KeyValue kv = it.next(); > > // if this isnt the row we are interested in, then bail: > if (!firstKv.matchingColumn(family, qualifier) || > !firstKv.matchingRow(kv)) { > break; // rows dont match, bail. > } > > [...] > } > > For the test "firstKv.matchingColumn(family, qualifier)", I don't see: > 1) Why it is tested in the loop, as firstKv is not modified, the result > won't change. > 2) How the result can be 'false', as firstKv is inialized with the family > and the parameters. > > Or is it shared for update a way or another? > > If we can remove it, we gain another 2%... > > > N. > -- Joseph Echeverria Cloudera, Inc. 443.305.9434
