Current implementation to resolve conflicts on same snapshotEdge is
overwhelmingly complicated.
I am suggesting refactor this part so we can understand what is going on
for consistency.

More importantly, with positive fail probability for
RPC(hbase.fail.prob=0.01), test cases actually fail.
we can reproduce failure by running `sbt "project s2core"
-Dconfig.file=s2rest_play/conf/test.conf test`.
What we intended initially here is guarantee to resolve conflicts even some
RPC into storage fail.

I was fixing this issue and quickly realize current code base is way too
complicated and not efficient.

here is brief explanation(
https://steamshon.gitbooks.io/s2graph-book/content/request_state_diagram.html
).

the big picture for this process goes like below.

- retry on failure
  -- fetch snapshotEdge as oldSN.
    --- merge oldSN and requesting edges and build newSN containing updated
state.
    --- lock this snapshotEdge by storing (oldSN, requesting edge as
pendingEdge) using checkAndSet(statusCode 0 -> 1).
    --- fire RPCs for indexEdge insert + delete mutations(statusCode 1 ->
2).
    --- fire RPCs for indexEdge degree increment mutations(statusCode 2 ->
3).
    --- release lock on this snapshotEdge by storing (newSN, None as
pendingEdge) using CheckAndSet(statusCode 3 -> 0).

I think there is no need to use CheckAndSet on release lock phase since it
is guaranteed for only one thread can success to lock snapshotEdge.

Also fetch snapshotEdge on ever retry produce lots of RPC on same row.
Actually, I think there is no need to fetch snapshotEdge over and over on
every retry, once this snapshotEdge is locked.

Of course threads that failed to acquire lock, still need to fetch
snapshotEdge on retry, but thread who own lock on this snapshotEdge do not
need to fetch it again. it is guaranteed that no other thread changed
snapshotEdge while thread who own lock is retrying on partial failure.

Also current implementation use Thread.sleep on Future and throw exception
inside Future(not return Future.failed(ex)) which does not seems normal in
asynchronous manner.

Finally there is no comments and it is very hard to follow control flow for
this resolving logic.

I think it is worthwhile to refactor conflict resolving logic.

What do you guys think?

Best Regards.
DOYUNG YOON

Reply via email to