[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454077#comment-13454077
 ] 

Sijie Guo commented on BOOKKEEPER-400:
--------------------------------------

Besides that bug:

1) I found a lot of ensemble changes that were caused by read timeouts rather 
than bookie failures. I am curious what kind of hardware you used for 
BookKeeper, especially how many disks each bookie has.

2) (This is unrelated to this issue.) I found a lot of unexpected responses 
for adds of the same entry. This happens because we use only the combination 
of ledger id and entry id to identify a request, so a later retry request 
overwrites the previous one. When two such requests are in flight 
concurrently, one of the responses has no callback to execute. This is the 
same as BOOKKEEPER-49. I think we need to introduce some kind of txn-id for 
bookie requests (like Hedwig requests have) to distinguish different requests 
for the same entry when fixing BOOKKEEPER-49 (I will start working on 
BOOKKEEPER-49). 
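
To illustrate point 2, here is a minimal, self-contained sketch of the 
overwrite problem and of the txn-id idea. It is not the actual BookKeeper 
client code: the class names (PendingAddSketch, Key, PendingRequests) and the 
way the txn-id is generated are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class PendingAddSketch {
    /** Hypothetical completion key; txnId is the proposed extra field. */
    static final class Key {
        final long ledgerId;
        final long entryId;
        final long txnId;

        Key(long ledgerId, long entryId, long txnId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.txnId = txnId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return ledgerId == k.ledgerId && entryId == k.entryId && txnId == k.txnId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(ledgerId, entryId, txnId);
        }
    }

    /** Hypothetical table of in-flight add requests and their callbacks. */
    static final class PendingRequests {
        private final Map<Key, Runnable> callbacks = new HashMap<>();
        private final boolean useTxnId;
        private long nextTxnId = 1;

        PendingRequests(boolean useTxnId) {
            this.useTxnId = useTxnId;
        }

        /** Registers a callback and returns the txn-id (0 when txn-ids are off). */
        long register(long ledgerId, long entryId, Runnable cb) {
            long txnId = useTxnId ? nextTxnId++ : 0;
            // Without a txn-id, a retry for the same entry replaces the older callback.
            callbacks.put(new Key(ledgerId, entryId, txnId), cb);
            return txnId;
        }

        /** Handles a response; false means "unexpected response", no callback found. */
        boolean complete(long ledgerId, long entryId, long txnId) {
            Runnable cb = callbacks.remove(new Key(ledgerId, entryId, useTxnId ? txnId : 0));
            if (cb == null) {
                return false;
            }
            cb.run();
            return true;
        }
    }

    /** Sends the same entry twice, answers both, and returns the callbacks that ran. */
    public static List<String> run(boolean useTxnId) {
        List<String> fired = new ArrayList<>();
        PendingRequests pending = new PendingRequests(useTxnId);
        long t1 = pending.register(1L, 0L, () -> fired.add("first"));
        long t2 = pending.register(1L, 0L, () -> fired.add("retry"));
        pending.complete(1L, 0L, t1);
        pending.complete(1L, 0L, t2);
        return fired;
    }

    public static void main(String[] args) {
        System.out.println("keyed by (ledgerId, entryId):        " + run(false));
        System.out.println("keyed by (ledgerId, entryId, txnId): " + run(true));
    }
}
```

With only (ledgerId, entryId) as the key, the run prints [retry]: the first 
callback is lost and the second response finds nothing to execute. With the 
txn-id in the key it prints [first, retry].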
                
> Ledger entry not found in any of the bookies in the ensemble responsible for 
> that entry.
> ----------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-400
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-400
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-client
>            Reporter: Aniruddha
>         Attachments: clean.log.gz
>
>
> Detailed discussion at 
> http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201209.mbox/%3cCAOLhyDQzrmeOHmTxzPikeAqJ7pZUn0=vHfd=gc1srmtuye5...@mail.gmail.com%3e
> We had an internal discussion about this. From BOOKKEEPER-337, it seems that 
> handleBookieFailure could be invoked in parallel by a thread other than the one 
> that calls LedgerHandle#sendAddSuccessCallbacks. The values updated by 
> handleBookieFailure might not be visible to the thread running 
> sendAddSuccessCallbacks because the fields are not volatile and this might 
> have caused our bad state. 
> BK-337 synchronizes access to metadata.addEnsemble() and we believe this 
> would make this scenario very improbable. 
> A long-term fix might be to make LedgerMetadata immutable since it is rarely 
> updated. 
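
The immutable-metadata idea from the description above could look roughly 
like the sketch below. It is only an illustration under assumptions: 
ImmutableMetadataSketch, Metadata, withEnsemble and ensembleFor are 
hypothetical names, bookies are plain strings, and this is not the real 
LedgerMetadata API. The point is that an updater publishes a whole new 
snapshot through an AtomicReference, so a reader thread never sees a 
half-updated ensemble map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

public class ImmutableMetadataSketch {
    /** Hypothetical immutable stand-in for LedgerMetadata. */
    static final class Metadata {
        // Maps the first entry id of each ensemble to its bookies.
        private final SortedMap<Long, List<String>> ensembles;

        Metadata(SortedMap<Long, List<String>> ensembles) {
            TreeMap<Long, List<String>> copy = new TreeMap<>();
            for (Map.Entry<Long, List<String>> e : ensembles.entrySet()) {
                copy.put(e.getKey(), Collections.unmodifiableList(new ArrayList<>(e.getValue())));
            }
            this.ensembles = Collections.unmodifiableSortedMap(copy);
        }

        /** Returns a new snapshot with one more ensemble; this object is unchanged. */
        Metadata withEnsemble(long firstEntryId, List<String> bookies) {
            TreeMap<Long, List<String>> next = new TreeMap<>(ensembles);
            next.put(firstEntryId, bookies);
            return new Metadata(next);
        }

        /** Ensemble responsible for entryId; assumes an ensemble starting at or before it. */
        List<String> ensembleFor(long entryId) {
            return ensembles.get(ensembles.headMap(entryId + 1).lastKey());
        }
    }

    private final AtomicReference<Metadata> current;

    ImmutableMetadataSketch(Metadata initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Would be called by the failure-handling thread. */
    void changeEnsemble(long firstEntryId, List<String> newEnsemble) {
        current.updateAndGet(m -> m.withEnsemble(firstEntryId, newEnsemble));
    }

    /** Would be called by the callback thread; always returns a complete snapshot. */
    Metadata snapshot() {
        return current.get();
    }

    /** Returns the ensemble for entry 7 as seen before and after an ensemble change. */
    public static List<List<String>> demo() {
        SortedMap<Long, List<String>> initial = new TreeMap<>();
        initial.put(0L, Arrays.asList("b1", "b2", "b3"));
        ImmutableMetadataSketch holder = new ImmutableMetadataSketch(new Metadata(initial));

        Metadata before = holder.snapshot();
        holder.changeEnsemble(5L, Arrays.asList("b1", "b4", "b3"));
        Metadata after = holder.snapshot();

        List<List<String>> result = new ArrayList<>();
        result.add(before.ensembleFor(7L));
        result.add(after.ensembleFor(7L));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A snapshot taken before the change keeps returning the old ensemble, while a 
snapshot taken afterwards returns the new one. The AtomicReference gives the 
cross-thread visibility that plain non-volatile fields do not.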

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
