[Hadoop Wiki] Update of "Hbase/SecondaryIndexing" by jgray

Apache Wiki Mon, 28 Feb 2011 11:20:14 -0800

The "Hbase/SecondaryIndexing" page has been changed by jgray.
http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing?action=diff&rev1=2&rev2=3

--------------------------------------------------

  
  The big open questions are around how to deal with the WAL and replay.
  
- The secondary table could be offline because of another RS failure, so we may 
have long-waiting secondary updates.  How can we guarantee all secondary 
updates are applied when evicting an old HLog?  Ideally we want to avoid 
over-exploiting operations being idempotent and not just aggressively 
reapplying everything.  Do we need to keep track of each HLog and its pending 
secondary updates and prevent log eviction until they are done?
+ The secondary table could be offline because of another RS failure, so we may 
have long-waiting secondary updates.  How can we guarantee all secondary 
updates are applied when evicting an old HLog?  Also, we want to avoid 
over-exploiting the idempotence of operations by just aggressively reapplying 
everything.
  
- Should the workers applying secondary edits write back into the WAL that the 
edit is complete (and thus durable on the other server so does not need to be 
replayed if this one fails over)?  Or we could tie secondary edits to each 
memstore, and the flushing of a memstore can only happen if its secondary edits 
have all been applied, which would tie in with the existing semantics around 
log eviction... but that has other implications.
+ Do we need to keep track of each HLog and its pending secondary updates and 
prevent log eviction until they are done?
+ 
+ Or should the workers applying secondary edits write back into the WAL that 
the edit is complete (and thus durable on the other server so does not need to 
be replayed if this one fails over)?
+ 
+ Or we could tie secondary edits to each memstore, and the flushing of a 
memstore can only happen if its secondary edits have all been applied, which 
would tie in with the existing semantics around log eviction... but that has 
other implications and won't really help prevent excessive replay.
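
As a rough illustration of the first option above (tracking each HLog's pending secondary updates and blocking eviction until they are done), here is a minimal sketch. The class and method names (PendingSecondaryUpdateTracker, updateQueued, updateApplied, canEvict) are hypothetical and not part of HBase, and logs are identified by a plain string name for simplicity. The counts live only in memory, so this says nothing about what happens when the tracking region server itself fails, which is the harder part of the question.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not HBase code: counts the secondary updates that
 * are still outstanding for each HLog so that log eviction can be held
 * back until every one of them has been applied.
 */
class PendingSecondaryUpdateTracker {
    // HLog name -> number of secondary updates queued but not yet applied.
    // A log with no entry has nothing outstanding.
    private final Map<String, Integer> pending = new HashMap<>();

    /** Called when a primary edit in this log produces a secondary update. */
    synchronized void updateQueued(String hlogName) {
        pending.merge(hlogName, 1, Integer::sum);
    }

    /** Called once the secondary update has been applied on the other server. */
    synchronized void updateApplied(String hlogName) {
        Integer count = pending.get(hlogName);
        if (count == null) {
            throw new IllegalStateException(
                "no pending secondary updates for " + hlogName);
        }
        if (count == 1) {
            pending.remove(hlogName);
        } else {
            pending.put(hlogName, count - 1);
        }
    }

    /** A log may be evicted only when nothing is outstanding for it. */
    synchronized boolean canEvict(String hlogName) {
        return !pending.containsKey(hlogName);
    }
}
```

A log roller would consult canEvict before removing an old HLog. If the secondary table stays offline for a long time, old logs simply accumulate, which is the cost of this option.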
  
  
  Other open questions:
