> However, imagine that the fenced message is only in the journal on b2,
> b2 crashes, something wipes the journal directory and then b2 comes
> back up.

What happened in this case?
1. We have WQ (write quorum) = 1.
2. We had data loss (the bookie crashed and came back up clean).

But yes, in addition to the data loss we have a fencing violation too.
The problem is not just the wiped journal dir, but how we recognize the bookie.
A bookie is recognized only by its IP address, not by its incarnation.
Bookie1 at T1 (b1t1) and the same bookie1 at T2 after a bookie format (b1t2)
should be two different bookies, shouldn't they?
This is needed for the replication worker and the auditor too.

Also, the bookie needs to know whether the writer/reader means to address
b1t2 and not b1t1.
Looks like we have a hole here? Or I may not be fully understanding the
cookie verification mechanism.
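
To make the incarnation idea concrete, here is a rough sketch of what I
mean. It is purely illustrative: the class BookieIncarnation, the
incarnationId field and the accepts() check are names I made up for this
mail, not anything that exists in the code today.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical bookie identity: network address plus a per-format incarnation id. */
final class BookieIncarnation {
    final String address;       // what the cluster keys on today, e.g. "10.0.0.1:3181"
    final String incarnationId; // assumption: regenerated every time the bookie is formatted

    BookieIncarnation(String address, String incarnationId) {
        this.address = Objects.requireNonNull(address);
        this.incarnationId = Objects.requireNonNull(incarnationId);
    }

    /** Simulates a bookie format: same address, fresh incarnation. */
    static BookieIncarnation format(String address) {
        return new BookieIncarnation(address, UUID.randomUUID().toString());
    }

    /** A request addressed to b1t1 should be rejected by b1t2. */
    boolean accepts(BookieIncarnation expected) {
        return this.equals(expected);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof BookieIncarnation)) {
            return false;
        }
        BookieIncarnation other = (BookieIncarnation) o;
        return address.equals(other.address) && incarnationId.equals(other.incarnationId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(address, incarnationId);
    }
}
```

With something like this, b1t1 = BookieIncarnation.format("10.0.0.1:3181")
and a later b1t2 = BookieIncarnation.format("10.0.0.1:3181") share an
address but compare as different bookies, so b1t2.accepts(b1t1) is false.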

Also, as Ivan pointed out, we appear to treat the lack of a journal as
implicitly meaning a new bookie, but the cluster as a whole doesn't
differentiate between incarnations.
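
One way to picture a stricter startup rule is below. This is only a
sketch under my own assumptions: CookieStartupCheck, Decision and the
cookieInMetadataStore flag are hypothetical names, and I am assuming the
bookie can ask whether a cookie for its address is already registered
with the cluster. The point is that "no cookie in a directory" alone
would not be enough to declare a new environment.

```java
import java.util.List;

/** Hypothetical startup decision; not the existing Bookie code. */
final class CookieStartupCheck {
    enum Decision { NEW_ENVIRONMENT, NORMAL_START, REFUSE_TO_START }

    /**
     * @param cookiePresentPerDir   one entry per journal/ledger directory,
     *                              true if a cookie file was found there
     * @param cookieInMetadataStore assumption: true if the cluster already
     *                              has a cookie registered for this bookie
     */
    static Decision decide(List<Boolean> cookiePresentPerDir, boolean cookieInMetadataStore) {
        if (cookiePresentPerDir.isEmpty()) {
            throw new IllegalArgumentException("at least one directory is required");
        }
        boolean anyPresent = false;
        boolean anyMissing = false;
        for (boolean present : cookiePresentPerDir) {
            if (present) {
                anyPresent = true;
            } else {
                anyMissing = true;
            }
        }
        // Genuinely new: no cookie on any disk and none registered in the cluster.
        if (!anyPresent && !cookieInMetadataStore) {
            return Decision.NEW_ENVIRONMENT;
        }
        // Healthy restart: every directory has its cookie and the cluster knows us.
        if (!anyMissing && cookieInMetadataStore) {
            return Decision.NORMAL_START;
        }
        // Anything in between (e.g. a wiped journal dir) needs operator action.
        return Decision.REFUSE_TO_START;
    }
}
```

Under this rule the b2 scenario (journal dir wiped, ledger dirs intact,
cookie still registered) ends in REFUSE_TO_START instead of quietly
coming up as a new bookie.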

Thanks,
JV





On Fri, Oct 6, 2017 at 8:46 AM, Ivan Kelly <iv...@apache.org> wrote:

> > The case you described here is "almost correct". But there is a key
> > point here: B2 can't start up by itself if the journal disk is wiped
> > out, because the cookie is missing.
> This is what I expected to see, but it isn't the case.
> <snip>
>             List<Cookie> journalCookies = Lists.newArrayList();
>             // try to read cookie from journal directory.
>             for (File journalDirectory : journalDirectories) {
>                 try {
>                     Cookie journalCookie = Cookie.readFromDirectory(journalDirectory);
>                     journalCookies.add(journalCookie);
>                     if (journalCookie.isBookieHostCreatedFromIp()) {
>                         conf.setUseHostNameAsBookieID(false);
>                     } else {
>                         conf.setUseHostNameAsBookieID(true);
>                     }
>                 } catch (FileNotFoundException fnf) {
>                     newEnv = true;
>                     missedCookieDirs.add(journalDirectory);
>                 }
>             }
> </snip>
>
> So if a journal directory is missing its cookie, newEnv is set to true.
> This disables the later checks.
>
> > However, it can still happen in a different case: a bit flip. In your
> > case, if the fence bit in b2 is already persisted on disk but then gets
> > corrupted, it will cause the issue you described. One problem is that we
> > don't have a checksum on the index file header where it stores those
> > fence bits.
> Yes, this is also an issue.
>
> -Ivan
>
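
On the bit-flip point quoted above, a checksum over the header would at
least turn silent corruption of the fence bit into a detectable error.
A minimal sketch follows, assuming a made-up 12-byte header layout (4
bytes of state bits followed by an 8-byte CRC32 value). ChecksummedHeader
and STATE_FENCED are hypothetical names, and this is not the real index
file format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical header layout: [4-byte state bits][8-byte CRC32 of those bits]. */
final class ChecksummedHeader {
    static final int STATE_FENCED = 0x1; // assumption: fence flag is the lowest bit
    static final int HEADER_SIZE = 12;

    /** Serializes the state bits together with their checksum. */
    static byte[] encode(int stateBits) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE);
        buf.putInt(stateBits);
        buf.putLong(crcOf(stateBits));
        return buf.array();
    }

    /** Returns the state bits, or throws if the stored checksum does not match. */
    static int decode(byte[] header) {
        if (header.length != HEADER_SIZE) {
            throw new IllegalArgumentException("unexpected header size: " + header.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(header);
        int stateBits = buf.getInt();
        long storedCrc = buf.getLong();
        if (storedCrc != crcOf(stateBits)) {
            throw new IllegalStateException("index header checksum mismatch");
        }
        return stateBits;
    }

    /** CRC32 over the 4 big-endian bytes of the state bits. */
    private static long crcOf(int stateBits) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(4).putInt(stateBits).array());
        return crc.getValue();
    }
}
```

If the fence bit flips on disk after encode(STATE_FENCED), decode() throws
instead of reporting the ledger as unfenced. It only detects the
corruption; recovering the right state would still have to come from
another replica.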



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi
