[
https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566737#comment-13566737
]
nkeywal commented on HBASE-7590:
--------------------------------
After more thinking about it:
So if I recapitulate the proposal:
- the master sends the dead servers list, with no more than 1 message per 10
seconds
- clients (including regionservers) use this information to stop using a
server identified as dead
- so when they have a huge timeout, because they are waiting on a slow
server-side operation, they can check that the server is still there.
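The client-side behaviour above can be sketched as follows. This is only an illustration under assumptions: `DeadServerAwareWait`, `markDead` and `await` are hypothetical names, not existing HBase client classes, and the dead servers set is assumed to be filled by whichever transport is chosen below. The idea is to cut one long timeout into short slices and re-check the master's list between slices.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical names throughout; this is not the HBase client API.
final class DeadServerAwareWait {
    // Servers the master has reported as dead (filled by the notification transport).
    private final Set<String> deadServers = ConcurrentHashMap.newKeySet();

    void markDead(String serverName) { deadServers.add(serverName); }

    boolean isDead(String serverName) { return deadServers.contains(serverName); }

    // Waits up to timeoutMs for the call, but wakes up every checkMs to see
    // whether the target server has been declared dead in the meantime.
    <T> T await(CompletableFuture<T> call, String serverName, long timeoutMs, long checkMs)
            throws Exception {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (isDead(serverName)) {
                throw new IOException("server declared dead by master: " + serverName);
            }
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                throw new TimeoutException("no answer from " + serverName);
            }
            try {
                return call.get(Math.min(checkMs, remainingMs), TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Slice elapsed: loop and re-check the dead servers list.
            }
        }
    }
}
```

With this shape the large timeout stays in place for operations that are really slow, while a server reported dead makes the wait fail within one check interval.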
Advantages
- makes slow operations easier to manage
- fewer false positives on the client side (the master is the reference)
- clients react immediately when the server is actually dead.
- it's optional.
For the implementation, there are 3 options
Option Multicast:
No subscription step: the client listens on the multicast IP.
Easy failover to a backup master: the active master sends to the same IP.
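A minimal sketch of the multicast option, using only the JDK. The group address, the port, and the newline-separated wire format are assumptions made for the example (the proposal is to send a protobuf message), and `DeadServersMulticast` is a hypothetical class name.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group address, port, and wire format are assumptions.
final class DeadServersMulticast {
    static final String GROUP = "239.1.1.1"; // assumed multicast group
    static final int PORT = 16100;           // assumed port

    // Stand-in for the protobuf message: one server name per line, UTF-8.
    static byte[] encode(List<String> deadServers) {
        return String.join("\n", deadServers).getBytes(StandardCharsets.UTF_8);
    }

    static List<String> decode(byte[] data, int length) {
        List<String> servers = new ArrayList<>();
        for (String name : new String(data, 0, length, StandardCharsets.UTF_8).split("\n")) {
            if (!name.isEmpty()) servers.add(name);
        }
        return servers;
    }

    // Master side: one datagram to the group, no per-client connection.
    static void publish(List<String> deadServers) throws Exception {
        byte[] payload = encode(deadServers);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(GROUP), PORT));
        }
    }

    // Client side: join the group and block until one message arrives.
    static List<String> receiveOnce() throws Exception {
        InetSocketAddress group = new InetSocketAddress(InetAddress.getByName(GROUP), PORT);
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            socket.joinGroup(group, null); // null: use the default interface
            byte[] buffer = new byte[1500]; // sized for a message of about 1 KB
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            socket.receive(packet);
            return decode(packet.getData(), packet.getLength());
        }
    }
}
```

Neither side knows about the other: a backup master that becomes active only has to call `publish` with the same group and port, and clients need no resubscription.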
Option Do it yourself in UDP but without multicast:
It seems that this would require:
- the client starts to listen on a port, then contacts the master.
- so clients register themselves on the master: it means they connect to the
master when they start
- the master needs to watch whether the client is still there (or clients need
to resubscribe every 10 minutes or so)
- if the master fails, the client needs to resubscribe (or the state must be
put in ZK).
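To show the bookkeeping this option pushes onto the master, here is a rough sketch of a subscription table with expiring leases. `SubscriberRegistry` and its methods are hypothetical names, and the 10-minute lease in the usage note is just the figure mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical master-side registry; names and lease policy are assumptions.
final class SubscriberRegistry {
    private final long leaseMs;
    // Client address ("host:port") -> time of the last (re)subscription, in ms.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    SubscriberRegistry(long leaseMs) { this.leaseMs = leaseMs; }

    // Called when a client subscribes or resubscribes.
    void subscribe(String clientAddress, long nowMs) { lastSeen.put(clientAddress, nowMs); }

    // Drops clients whose lease has expired and returns those still subscribed.
    List<String> liveSubscribers(long nowMs) {
        lastSeen.values().removeIf(seen -> nowMs - seen > leaseMs);
        return new ArrayList<>(lastSeen.keySet());
    }
}
```

For example, with `new SubscriberRegistry(10 * 60 * 1000L)` a client that has not resubscribed within 10 minutes stops receiving messages. The table lives in the master's memory, which is why a master failover forces every client to resubscribe unless the state is stored in ZK.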
Option ZK:
Multiple implementation choices. One of them is to put the protobuf message in
a znode that can be watched by the clients if they wish.
If so:
- no direct dependency from client to master.
- will benefit from local session when available
- every client has a permanent tcp connection to ZK.
- ZK receives/resends one message every 10s or so
- as usual, on each event, the client needs to watch again.
I don't think that the second option (do it yourself) is very good.
It's not uncommon to have two options for such features, for the people who
don't want multicast. My personal choice would be Multicast, then ZK.
Writing a znode every 10 seconds is not perfect (I guess ZK devs would say ZK
is not built for this), but should be manageable, even if the message size is
around 1Kb. ZK would do for small clusters without new config required,
multicast for large.
I could start with ZK, but I don't really like the idea of writing something
that would not scale. So in terms of timeframe my preference would be to go for
the multicast first...
Thoughts?
> Add a costless notifications mechanism from master to regionservers & clients
> -----------------------------------------------------------------------------
>
> Key: HBASE-7590
> URL: https://issues.apache.org/jira/browse/HBASE-7590
> Project: HBase
> Issue Type: Bug
> Components: Client, master, regionserver
> Affects Versions: 0.96.0
> Reporter: nkeywal
>
> It would be very useful to add a mechanism to distribute some information to
> the clients and regionservers. In particular, it would be useful to know globally
> (regionservers + client apps) that some regionservers are dead. This would
> allow:
> - to lower the load on the system, without clients using stale information
> and going to dead machines
> - to make the recovery faster from a client point of view. It's common to use
> large timeouts on the client side, so the client may need a lot of time
> before declaring a region server dead and trying another one. If the client
> receives the information about a region server's state separately, it can make
> the right decision, and continue or stop waiting accordingly.
> We can also send more information, for example instructions like 'slow down'
> to instruct the client to increase the retries delay and so on.
> Technically, the master could send this information. To lower the load on
> the system, we should:
> - have a multicast communication (i.e. the master does not have to connect to
> all servers by tcp), with one packet every 10 seconds or so.
> - receivers should not depend on this: if the information is available great.
> If not, it should not break anything.
> - it should be optional.
> So at the end we would have a thread in the master sending a protobuf message
> about the dead servers on a multicast socket. If the socket is not
> configured, it does not do anything. On the client side, when we receive
> information that a node is dead, we refresh the cache about it.