[ https://issues.apache.org/jira/browse/BOOKKEEPER-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730245#comment-13730245 ]
Sijie Guo commented on BOOKKEEPER-665:
--------------------------------------
{quote}
If the ZK service is totally down, only an application that uses just a single
ledger (or a few) to read/write could possibly keep working. I believe that in
most cases applications are constantly opening/creating ledgers, in which case
being able to operate without ZK will be of little help.
{quote}
The goal of BOOKKEEPER-537 is that applications should be able to keep reading
and writing despite ZooKeeper failures, i.e. to reduce the dependency on
ZooKeeper. If you check the branch I pasted in BOOKKEEPER-537, you will find
that we suspend all kinds of metadata operations while ZooKeeper is unavailable
(either a ZooKeeper failure or a network partition): e.g. we suspend rolling
ledgers (i.e. creating new ledgers, as you said) and keep using the old ledger,
and we suspend ensemble changes and keep retrying on the last ensemble.
If we skip reading from bookies that are unavailable in ZooKeeper, the whole
system stops working whenever ZooKeeper is down or a session has expired
(network partition, JVM GC pauses), since bookies that can still serve reads
may then no longer show up as available.
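A minimal sketch of the suspension behaviour described above. It assumes a hypothetical `MetadataGate` class (not part of the BookKeeper client API) whose ZooKeeper-availability flag would be driven by connection events; while the flag is false, ledger rolling is refused and a failed bookie write leads to retrying on the last ensemble instead of an ensemble change.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch, not BookKeeper's actual API: a gate that suspends
 * metadata operations (ledger rolling, ensemble changes) while ZooKeeper is
 * unreachable, so reads/writes can continue on the ledgers already open.
 */
public class MetadataGate {
    /** What the writer should do after a write to a bookie fails. */
    public enum OnWriteFailure { CHANGE_ENSEMBLE, RETRY_LAST_ENSEMBLE }

    private final AtomicBoolean zkAvailable = new AtomicBoolean(true);

    /** Assumed to be driven by ZooKeeper connection/session events. */
    public void setZkAvailable(boolean available) {
        zkAvailable.set(available);
    }

    /** Roll to a new ledger only if ZooKeeper can record the new metadata. */
    public boolean mayRollLedger(boolean rollWanted) {
        return rollWanted && zkAvailable.get();
    }

    /** An ensemble change needs a metadata update; without ZooKeeper, retry the last ensemble. */
    public OnWriteFailure onWriteFailure() {
        return zkAvailable.get() ? OnWriteFailure.CHANGE_ENSEMBLE : OnWriteFailure.RETRY_LAST_ENSEMBLE;
    }

    public static void main(String[] args) {
        MetadataGate gate = new MetadataGate();
        gate.setZkAvailable(false);
        System.out.println(gate.mayRollLedger(true)); // false: keep using the old ledger
        System.out.println(gate.onWriteFailure());    // RETRY_LAST_ENSEMBLE
    }
}
```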
{quote}
If a bookie is partitioned from the network and it's the first one in the
ensemble list, then every read on any of the ledgers it contains will time out
after 2s, even though we have perfectly fine copies of that data.
{quote}
As I said, I am -1 on the skipping solution. You could instead reorder the read
sequence to try the bookies that are available in ZooKeeper first, but you need
to be careful, since the check sits in the critical path: every read would have
to perform it. Experimental results would help make the case for this change.
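The reordering alternative could look roughly like this. The class, method names and bookie address strings are hypothetical; the sketch assumes the available-bookie set is published as an immutable snapshot by some ZooKeeper watcher, so the per-read check is one volatile read plus a hash lookup per replica, and unavailable bookies are moved to the end of the read order rather than skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (names are not BookKeeper's API): order the replicas of an
 * entry so that bookies currently listed as available are tried first. Nothing is
 * skipped, so reads still work when the available list is empty or stale.
 */
public class ReadOrderSketch {
    // Immutable snapshot replaced wholesale by a (hypothetical) ZooKeeper watcher,
    // so the read path never takes a lock.
    private volatile Set<String> availableBookies = Set.of();

    public void updateAvailable(Set<String> latest) {
        availableBookies = Set.copyOf(latest);
    }

    /** Returns ensemble indexes: available bookies first, then the rest, each in original order. */
    public List<Integer> readOrder(List<String> ensemble) {
        Set<String> snapshot = availableBookies; // read the volatile field once per call
        List<Integer> preferred = new ArrayList<>(ensemble.size());
        List<Integer> deferred = new ArrayList<>();
        for (int i = 0; i < ensemble.size(); i++) {
            if (snapshot.contains(ensemble.get(i))) {
                preferred.add(i);
            } else {
                deferred.add(i);
            }
        }
        preferred.addAll(deferred);
        return preferred;
    }

    public static void main(String[] args) {
        ReadOrderSketch sketch = new ReadOrderSketch();
        List<String> ensemble = List.of("bk1:3181", "bk2:3181", "bk3:3181");
        sketch.updateAvailable(Set.of("bk2:3181", "bk3:3181")); // bk1 partitioned
        System.out.println(sketch.readOrder(ensemble));         // [1, 2, 0]
        sketch.updateAvailable(Set.of());                       // ZooKeeper view lost
        System.out.println(sketch.readOrder(ensemble));         // [0, 1, 2]
    }
}
```

Because nothing is skipped, an empty or stale available list degrades to the original ensemble order instead of failing reads; whether the per-read cost is acceptable is what an experiment would need to show.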
{quote}
Also, trying to read from dead bookies is especially annoying when doing
automatic replication, because it tries to read every fragment from the failed
bookie, filling the logs with 1000s of errors.
{quote}
I don't quite get this. If this bookie has failed, shouldn't automatic
replication replicate entries from the other hosts rather than from the failed
one? If it doesn't, then the change probably belongs in the replication path,
not in the normal read path, right? As for the log noise, shall we review the
logging instead?
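For the replication side, a rough sketch of the two suggestions: read a fragment only from the surviving replicas, and report a failing bookie once rather than once per fragment. All names here are hypothetical, the in-memory list stands in for a real logger, and this is not the actual auto-recovery code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch, not BookKeeper's replication code: pick surviving replicas
 * as read sources and log each failing bookie only once. Not thread-safe.
 */
public class ReplicationSourceSketch {
    private final Set<String> alreadyReported = new HashSet<>();
    private final List<String> log = new ArrayList<>(); // stand-in for a real logger

    /** Replicas worth reading from: every bookie in the fragment's ensemble except the failed one. */
    public List<String> sourcesFor(List<String> fragmentEnsemble, String failedBookie) {
        List<String> sources = new ArrayList<>(fragmentEnsemble);
        sources.remove(failedBookie); // List.remove(Object): drops the failed bookie if present
        return sources;
    }

    /** Record a read failure from a bookie only the first time that bookie is seen. */
    public void reportReadFailure(String bookie) {
        if (alreadyReported.add(bookie)) {
            log.add("cannot read from " + bookie + "; further errors suppressed");
        }
    }

    public List<String> log() {
        return List.copyOf(log);
    }

    public static void main(String[] args) {
        ReplicationSourceSketch r = new ReplicationSourceSketch();
        List<String> ensemble = List.of("bk1:3181", "bk2:3181", "bk3:3181");
        System.out.println(r.sourcesFor(ensemble, "bk1:3181")); // [bk2:3181, bk3:3181]
        for (int fragment = 0; fragment < 1000; fragment++) {
            r.reportReadFailure("bk1:3181");
        }
        System.out.println(r.log().size()); // 1
    }
}
```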
> BK client should not try to read entries from non-available bookies
> -------------------------------------------------------------------
>
> Key: BOOKKEEPER-665
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-665
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Matteo Merli
> Assignee: Matteo Merli
> Priority: Minor
> Attachments: BOOKKEEPER-665.patch
>
>
> If a bookie is not in the available list, we shouldn't try to read from it
> but just treat the read from that replica as failed.
> This matters especially when the bookie node is partitioned, because then
> we may need to wait for the connection timeout. Also, during the
> auto-replication of ledgers, most of the log consists of errors saying it
> was not possible to read from the failed bookie.