Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/SecondaryIndexing" page has been changed by jgray.
http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing?action=diff&rev1=2&rev2=3

--------------------------------------------------

  The big open questions are around how to deal with the WAL and replay.

- The secondary table could be offline because of another RS failure, so we may have long-waiting secondary updates. How can we guarantee all secondary updates are applied when evicting an old HLog? Ideally we want to avoid over-exploiting operations being idempotent and not just aggressively reapplying everything. Do we need to keep track of each HLog and it's pending secondary updates and prevent log eviction until they are done?
+ The secondary table could be offline because of another RS failure, so we may have long-waiting secondary updates. How can we guarantee all secondary updates are applied when evicting an old HLog? Also, we want to avoid over-exploiting operations being idempotent and not just aggressive reapplying everything.

- Should the workers applying secondary edits write back into the WAL that the edit is complete (and thus durable on the other server so does not need to be replayed if this one fails over)? Or we could tie secondary edits to each memstore, and the flushing of a memstore can only happen if its secondary edits have all been applied, which would tie in with the existing semantics around log eviction... but that has other implications.
+ Do we need to keep track of each HLog and it's pending secondary updates and prevent log eviction until they are done?
+
+ Or should the workers applying secondary edits write back into the WAL that the edit is complete (and thus durable on the other server so does not need to be replayed if this one fails over)?
+
+ Or we could tie secondary edits to each memstore, and the flushing of a memstore can only happen if its secondary edits have all been applied, which would tie in with the existing semantics around log eviction... but that has other implications and won't really help with preventing too much over replay.

  Other open questions:
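
For illustration only, the first option raised in the change above (keeping track of each HLog's pending secondary updates and preventing eviction until they are done) might be sketched roughly as below. This is not HBase code: the class `PendingSecondaryTracker`, its method names, and the use of plain log ids and sequence ids are all hypothetical, chosen only to show the bookkeeping involved.

```python
from collections import defaultdict


class PendingSecondaryTracker:
    """Hypothetical per-log bookkeeping of secondary updates not yet applied."""

    def __init__(self):
        # log_id -> set of sequence ids whose secondary update is still pending
        self._pending = defaultdict(set)

    def edit_logged(self, log_id, seq_id):
        # Called when a primary edit that needs a secondary update hits the log.
        self._pending[log_id].add(seq_id)

    def secondary_applied(self, log_id, seq_id):
        # Called once the secondary table has durably accepted the update.
        self._pending[log_id].discard(seq_id)

    def can_evict(self, log_id):
        # A log may be evicted only when it owes no secondary updates.
        return not self._pending.get(log_id)

    def replay_candidates(self, log_id):
        # Only the still-pending edits need replay, not the whole log.
        return sorted(self._pending.get(log_id, ()))


tracker = PendingSecondaryTracker()
tracker.edit_logged("hlog-1", 10)
tracker.edit_logged("hlog-1", 11)
tracker.secondary_applied("hlog-1", 10)
print(tracker.can_evict("hlog-1"), tracker.replay_candidates("hlog-1"))
```

Under these assumptions, replay after a failure would be limited to the sequence ids still marked pending, which is one way to avoid leaning on idempotence and reapplying everything. The sketch keeps its state in memory only, so it does not address how that state itself survives a region server failure.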
