[
https://issues.apache.org/jira/browse/BOOKKEEPER-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730866#comment-13730866
]
Matteo Merli commented on BOOKKEEPER-665:
-----------------------------------------
I understand the goal of BOOKKEEPER-537, although to me it seems more critical to
avoid the potential 2s timeout on read than to survive a complete ZK
unavailability.
I am saying this because:
# Keeping reads/writes working on currently open ledgers would just buy a few
seconds before a new ledger needs to be created, which would trigger an error.
# Our application uses ZK anyway to hold locks on shared resources, so if ZK is
down we need to shut down anyway to avoid conflicts (I admit this might not be
true for everyone).
# From observing prod systems, the complete loss of the ZK quorum is usually due to:
* Network hardware failures
* Planned hardware maintenance (moving racks with zk servers inside, ...)
All these scenarios usually require manual intervention, and the downtime
would be measured in hours rather than just a few seconds.
{quote}
as I said, I am -1 on skipping solution. you possibly could reorder the read
sequence to try read from bookies that available in zookeeper, but you need to
take care since the checking is in the critical path that each read would
access the check. an experiment result would help convincing this change.
{quote}
Ok, I first thought of leaving the reads from non-available bookies as a last
resort, but the change looked complicated. I'll give it another try and will
update here.
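For illustration only, here is a minimal sketch of the reordering idea, not the
attached patch. It assumes the client already keeps an in-memory snapshot of the
available bookies (so the check on the read path is a plain set lookup, with no
ZK call), and it uses plain strings for bookie addresses. The class and method
names are hypothetical and are not part of the BookKeeper client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not an existing BookKeeper class.
public final class ReadOrder {
    private ReadOrder() {}

    /**
     * Returns the bookies of a write set with the ones currently listed as
     * available first and the remaining ones last. The relative order inside
     * each group is preserved, so no replica is skipped, only deferred.
     */
    public static List<String> reorder(List<String> writeSet, Set<String> available) {
        List<String> ordered = new ArrayList<>(writeSet.size());
        List<String> unavailable = new ArrayList<>();
        for (String bookie : writeSet) {
            if (available.contains(bookie)) {
                ordered.add(bookie);      // try these replicas first
            } else {
                unavailable.add(bookie);  // keep as a last resort
            }
        }
        ordered.addAll(unavailable);
        return ordered;
    }
}
```

With a write set of b1, b2, b3 and only b1 and b3 listed as available, the read
would be attempted on b1, then b3, and on b2 only if both of those fail.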
{quote}
I don't quite get this. if this bookie is a failed one, should automatic
replication replicate entries from other hosts rather than the failed one? if
not, properly a change is need on replication part not on normal part, right?
And for logs problems, shall we review the logging part?
{quote}
Yes, this is probably better kept separate from this jira. Just as a quick
overview, automatic replication currently tries to read a few entries from
all the bookies in the ensemble to verify the replication status of a segment.
That obviously prints many error messages, since it never succeeds in connecting
to the "failed" bookie, whose failure was the reason we initiated the
auto-replication in the first place.
> BK client should not try to read entries from non-available bookies
> -------------------------------------------------------------------
>
> Key: BOOKKEEPER-665
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-665
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Matteo Merli
> Assignee: Matteo Merli
> Priority: Minor
> Attachments: BOOKKEEPER-665.patch
>
>
> If a bookie is not in the available list, we shouldn't try to read from it
> but just treat the read from that replica as failed.
> This could be especially true if the bookie node is partitioned because that
> could mean we need to wait the connection timeout. Also during the
> auto-replication of ledgers most of the logs consist of errors that say it
> was not possible to read from the failed bookie.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira