Aggreed. But there is a big advantage when you work on the issues found on the unit tesst: the code you're modifying is already covered by the unit tests... :-)
On Wed, Jul 20, 2011 at 5:02 PM, Joey Echeverria <[email protected]> wrote: > I would compare YCSB between the patched and unpatched version for a more > realistic workload than the unit tests provide. > > -Joey > > On Wed, Jul 20, 2011 at 10:49 AM, N Keywal <[email protected]> wrote: > > > Hello, > > > > Some words on the context: We're thinking about using HBase for a product > > we're developping. For this reason, I am currently looking at HBase > source > > code to understand how to debug & modify HBase. To start with something > > simple but useful, I am looking for performance improvement by profiling > > hbase during the execution of the unit tests. I expect that many of the > > hotspots found on the unit tests are also hotspots in real production. I > > plan to spend around 10 m.d on this until september. > > > > > > The method regionserver.MemStore/updateColumnValue seems quite used, and > is > > ultimatly responsible of 30% of the time in the test subsets I am using. > > > > > > There is bit of it that can be optimized easily by changing the > conditions > > order: > > > > if (firstKv.matchingQualifier(kv)) { > > if (kv.getType() == KeyValue.Type.Put.getCode()) { > > now = Math.max(now, kv.getTimestamp()); > > } > > } > > > > becomes: > > if (kv.getType() == KeyValue.Type.Put.getCode() && > > kv.getTimestamp() > now && > > firstKv.matchingQualifier(kv)) { > > now = kv.getTimestamp(); > > } > > > > As comparing the qualifier is much more expensive, we put it at the end. > > It improve the performances by 3% (i.e: total execution time lowered by > > 3%). > > > > > > So first question: would you be interested by a patch for this kind of > > stuff? > > > > > > > > Second question (more technical...): in this method > > (regionserver.MemStore/updateColumnValue), I see: > > > > KeyValue firstKv = KeyValue.createFirstOnRow( > > row, family, qualifier); > > > > [...] > > while (it.hasNext()) { > > KeyValue kv = it.next(); > > > > // if this isnt the row we are interested in, then bail: > > if (!firstKv.matchingColumn(family, qualifier) || > > !firstKv.matchingRow(kv)) { > > break; // rows dont match, bail. > > } > > > > [...] > > } > > > > For the test "firstKv.matchingColumn(family, qualifier)", I don't see: > > 1) Why it is tested in the loop, as firstKv is not modified, the result > > won't change. > > 2) How the result can be 'false', as firstKv is inialized with the > family > > and the parameters. > > > > Or is it shared for update a way or another? > > > > If we can remove it, we gain another 2%... > > > > > > N. > > > > > > -- > Joseph Echeverria > Cloudera, Inc. > 443.305.9434 >
