hey, sorted = slower, randomized = faster.
This is because if the input is sorted in natural key order, you tend to
hotspot in 1 or 2 regions.

I don't use TableOutputFormat; I commit directly from the map (that's where
doCommit gets called), with no reduce. That seems to be the best-performing
solution.

Have fun!

On Thu, Apr 2, 2009 at 1:36 PM, Stuart White <[email protected]> wrote:

> On Thu, Apr 2, 2009 at 3:30 PM, Ryan Rawson <[email protected]> wrote:
> > The last thing - success should not be a function of sort order.
> >
> > However, speed will be related.
>
> How? Sorted = faster, or Sorted = slower?
>
> >
> > One thing I found I had to do was:
> > private void doCommit(HTable t, BatchUpdate update) throws IOException {
> >   boolean commited = false;
> >   while (!commited) {
> >     try {
> >       t.commit(update);
> >       commited = true;
> >     } catch (RetriesExhaustedException e) {
> >       // DAMN, ignore
> >     }
> >   }
> > }
> >
>
> I'm running a mapred job, using TableOutputFormat to write the results
> to HBase. For the code you've provided, was that for a custom output
> format? Or a standalone (non-mapred) application? I see the point
> you're making, I just don't understand where I'd put that code.
> Thanks!
>
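
To illustrate the hotspotting point (sorted input lands on 1 or 2 regions at a time), here is a minimal, self-contained simulation. It is not HBase code and makes no HBase calls. The class name HotspotSketch, the even split of an integer key space into contiguous "regions", and the fixed-size write window are all assumptions made up for the sketch. It only counts how many distinct regions a batch of consecutive writes touches when the keys arrive sorted versus shuffled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class HotspotSketch {

    // Assumption: keys 0..numKeys-1 are split evenly into numRegions
    // contiguous ranges, loosely mimicking range-partitioned regions.
    public static int regionFor(int key, int numKeys, int numRegions) {
        return (int) ((long) key * numRegions / numKeys);
    }

    // Walks the writes in consecutive batches of `window` keys and returns
    // the average number of distinct regions touched per batch.
    public static double avgRegionsPerWindow(List<Integer> keys, int numKeys,
                                             int numRegions, int window) {
        long totalDistinct = 0;
        int windows = 0;
        for (int start = 0; start < keys.size(); start += window) {
            boolean[] seen = new boolean[numRegions];
            int distinct = 0;
            int end = Math.min(start + window, keys.size());
            for (int i = start; i < end; i++) {
                int region = regionFor(keys.get(i), numKeys, numRegions);
                if (!seen[region]) {
                    seen[region] = true;
                    distinct++;
                }
            }
            totalDistinct += distinct;
            windows++;
        }
        return windows == 0 ? 0.0 : (double) totalDistinct / windows;
    }

    // Keys in natural (ascending) order.
    public static List<Integer> sortedKeys(int n) {
        List<Integer> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add(i);
        }
        return keys;
    }

    // The same keys in a pseudo-random order (seeded so runs are repeatable).
    public static List<Integer> shuffledKeys(int n, long seed) {
        List<Integer> keys = sortedKeys(n);
        Collections.shuffle(keys, new Random(seed));
        return keys;
    }

    public static void main(String[] args) {
        int numKeys = 100_000;
        int numRegions = 10;
        int window = 1_000;

        double sorted = avgRegionsPerWindow(
                sortedKeys(numKeys), numKeys, numRegions, window);
        double shuffled = avgRegionsPerWindow(
                shuffledKeys(numKeys, 42L), numKeys, numRegions, window);

        System.out.println("sorted:   avg regions per window = " + sorted);
        System.out.println("shuffled: avg regions per window = " + shuffled);
    }
}
```

With sorted keys, each window of writes stays inside a single region in this setup, so one "server" takes all the load at any moment. With shuffled keys, each window spreads across the regions. Real region boundaries, splits, and client-side batching will change the numbers, so treat this as an illustration of the effect rather than a benchmark.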
