[
https://issues.apache.org/jira/browse/BOOKKEEPER-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454077#comment-13454077
]
Sijie Guo commented on BOOKKEEPER-400:
--------------------------------------
Besides that bug,
1) I found that many of the ensemble changes were caused by read timeouts
rather than by bookie failures. What kind of hardware are you using for
BookKeeper, and in particular how many disks does each bookie have?
2) (This is unrelated to this issue.) I found a lot of unexpected responses
when adding the same entry. This happens because we identify a request only by
the combination of ledger id and entry id, so a later retry overwrites the
earlier request. When both requests are in flight concurrently, one of the two
responses has no callback left to execute. This is the same problem as
BOOKKEEPER-49. I think we need to introduce some kind of txn-id for bookie
requests (as Hedwig requests do) to distinguish different requests for the
same entry when fixing BOOKKEEPER-49. (I will start the work on BOOKKEEPER-49.)
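
To illustrate point 2, here is a minimal sketch of how a retry can overwrite
a pending callback when requests are keyed only by ledger id and entry id, and
how an extra txn-id keeps the two apart. The class, the Key type and the
callback strings are hypothetical and are not taken from the BookKeeper client
code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; none of these names come from the BookKeeper code base.
class CompletionKeySketch {

    // Key for a pending add request. With txnId fixed to 0 this behaves
    // like a key made of (ledgerId, entryId) only.
    static final class Key {
        final long ledgerId;
        final long entryId;
        final long txnId;

        Key(long ledgerId, long entryId, long txnId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.txnId = txnId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return ledgerId == k.ledgerId && entryId == k.entryId && txnId == k.txnId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(ledgerId, entryId, txnId);
        }
    }

    // Registers a first attempt and a retry for the same entry and returns
    // how many callbacks are still pending afterwards.
    static int pendingAfterRetry(boolean useTxnId) {
        Map<Key, String> pending = new HashMap<>();
        long firstTxn = useTxnId ? 100L : 0L;
        long retryTxn = useTxnId ? 101L : 0L;
        pending.put(new Key(1L, 7L, firstTxn), "callback-first-attempt");
        // Without a txn-id this put replaces the first callback.
        pending.put(new Key(1L, 7L, retryTxn), "callback-retry");
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println("without txn-id: " + pendingAfterRetry(false));
        System.out.println("with txn-id:    " + pendingAfterRetry(true));
    }
}
```

Without the txn-id only one callback survives, so the response to the other
in-flight request looks unexpected. With it, each response can be matched to
the request that produced it.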
> Ledger entry not found in any of the bookies in the ensemble responsible for
> that entry.
> ----------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-400
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-400
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-client
> Reporter: Aniruddha
> Attachments: clean.log.gz
>
>
> Detailed discussion at
> http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201209.mbox/%3cCAOLhyDQzrmeOHmTxzPikeAqJ7pZUn0=vHfd=gc1srmtuye5...@mail.gmail.com%3e
> We had an internal discussion about this. From BOOKKEEPER-337, it seems that
> handleBookieFailure could be invoked in parallel by a thread other than the one
> that calls LedgerHandle#sendAddSuccessCallbacks. The values updated by
> handleBookieFailure might not be visible to the thread running
> sendAddSuccessCallbacks because the fields are not volatile and this might
> have caused our bad state.
> BK-337 synchronizes access to metadata.addEnsemble() and we believe this
> would make this scenario very improbable.
> A long term fix might be to make LedgerMetadata immutable since it is rarely
> updated.
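
The long-term fix suggested in the description could look roughly like the
sketch below: an immutable metadata snapshot published through a volatile
reference, so that a thread sending callbacks always sees a complete, current
snapshot. This is only an illustration. The Metadata class is a simplified,
hypothetical stand-in for LedgerMetadata (a single ensemble list), and the
method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the actual LedgerHandle or LedgerMetadata code.
final class ImmutableMetadataSketch {

    // Simplified stand-in for LedgerMetadata: all fields are final and the
    // list cannot be modified after construction.
    static final class Metadata {
        final List<String> ensemble;

        Metadata(List<String> ensemble) {
            this.ensemble = Collections.unmodifiableList(new ArrayList<>(ensemble));
        }

        // Returns a new snapshot instead of mutating this one.
        Metadata withReplacedBookie(int index, String newBookie) {
            List<String> copy = new ArrayList<>(ensemble);
            copy.set(index, newBookie);
            return new Metadata(copy);
        }
    }

    // volatile: a snapshot written by the failure-handling thread is visible
    // to later reads from the thread that sends add callbacks.
    private volatile Metadata metadata;

    ImmutableMetadataSketch(List<String> initialEnsemble) {
        this.metadata = new Metadata(initialEnsemble);
    }

    // synchronized so that two concurrent failures cannot lose an update.
    synchronized void handleBookieFailure(int index, String replacement) {
        metadata = metadata.withReplacedBookie(index, replacement);
    }

    // Readers take one snapshot and use it for the whole operation.
    Metadata current() {
        return metadata;
    }
}
```

Because metadata is rarely updated, copying it on each ensemble change should
be cheap compared with the cost of reasoning about partially visible mutable
state.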