Thanks for the clarification, Jose. That clears up my confusion :)

Guozhang

On Thu, Oct 1, 2020 at 10:51 AM Jose Garcia Sancio <jsan...@confluent.io>
wrote:

> Thanks for the email Guozhang.
>
> > Thanks for the replies and the KIP updates. Just want to clarify one more
> > thing regarding my previous comment 3): I understand that when a snapshot
> > has completed loading, then we can use it in our handling logic of vote
> > request. And I understand that:
> >
> > 1) Before a snapshot has been completely received (e.g. if we've only
> > received a subset of the "chunks"), then we just handle vote requests "as
> > if" there's no snapshot yet.
> >
> > 2) After a snapshot has been completely received and loaded into main
> > memory, we can handle vote requests "as of" the received snapshot.
> >
> > What I'm wondering is: in between these two synchronization barriers,
> > after all the snapshot chunks have been received but before the snapshot
> > has been completely parsed and loaded into the in-memory metadata cache,
> > if we receive a request (note that requests may be handled by different
> > threads, hence concurrently), what should we do? Or are you proposing that
> > the FetchSnapshot request would also be handled in that single-threaded
> > Raft client loop so that it is ordered with all other requests? If that's
> > the case, then we do not have any concurrency issues to worry about, but
> > on the other hand, receiving the last snapshot chunk and loading it into
> > main memory may take a long time, during which the client may not be able
> > to handle any other requests.
>
> Yes. The FetchSnapshot request and response handling will be performed
> by the KafkaRaftClient in a single-threaded fashion. The
> KafkaRaftClient doesn't need to load the snapshot to know what state
> it is in. It only needs to scan the "checkpoints" folder, load the
> quorum state file and know the LEO (log end offset) of the replicated
> log. I would modify 2) above to the following:
>
> 3) After the snapshot has been validated by:
>   a) Fetching all of the chunks
>   b) Verifying the CRC of the records in the snapshot
>   c) Atomically moving the temporary snapshot to the permanent location
>
> After 3.c), the KafkaRaftClient only needs to scan and parse the
> filenames in the directory called "checkpoints" to find the
> largest/latest permanent snapshot.
>
> As you point out in 1), before 3.c) the KafkaRaftClient, with regard
> to leader election, will behave as if the temporary snapshot didn't
> exist.
>
> The loading of the snapshot will be done by the state machine (Kafka
> Controller or Metadata Cache) and it can perform this on a different
> thread. The KafkaRaftClient will provide an API for finding and
> reading the latest valid snapshot stored locally.
>
> Are you also concerned that the snapshot could have been corrupted after
> 3.c)?
>
> I also updated the "Changes to leader Election" section to make this a
> bit clearer.
>
> Thanks,
> Jose
>
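
A minimal Java sketch of steps 3.a) to 3.c) above, for illustration only; it is not KafkaRaftClient code. The class name SnapshotFinalizer, the ".part" suffix marking a temporary snapshot, and the use of a single whole-file CRC32C (real snapshots carry a CRC per record batch) are all assumptions. What it shows is the ordering: the snapshot becomes visible under its permanent name only after the checksum passes, and only through an atomic rename, so a reader scanning the directory never sees a half-written snapshot.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32C;

// Hypothetical sketch of steps 3.a) to 3.c); names and file layout are assumptions.
final class SnapshotFinalizer {
    // Assumed convention: a temporary (incomplete) snapshot ends in ".part".
    static final String TEMP_SUFFIX = ".part";

    // Simplification: one CRC32C over the whole file instead of per-batch CRCs.
    static long crc32c(Path file) throws IOException {
        CRC32C crc = new CRC32C();
        crc.update(Files.readAllBytes(file));
        return crc.getValue();
    }

    /**
     * Called once every chunk has been written to tempFile (step 3.a).
     * Returns the permanent path, or throws if validation fails; on failure
     * the temporary file is left in place for the caller to clean up.
     */
    static Path finalizeSnapshot(Path tempFile, long expectedCrc) throws IOException {
        String name = tempFile.getFileName().toString();
        if (!name.endsWith(TEMP_SUFFIX)) {
            throw new IllegalArgumentException("not a temporary snapshot: " + name);
        }
        // Step 3.b: verify the checksum before the snapshot becomes visible.
        if (crc32c(tempFile) != expectedCrc) {
            throw new IOException("CRC mismatch for " + tempFile);
        }
        Path target = tempFile.resolveSibling(
                name.substring(0, name.length() - TEMP_SUFFIX.length()));
        // Step 3.c: atomic rename, so readers see either no snapshot or a complete one.
        return Files.move(tempFile, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With ATOMIC_MOVE, Files.move throws AtomicMoveNotSupportedException when the file system cannot rename atomically, which is the failure mode you would want here rather than a silent copy.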

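Jose's point that, after 3.c), the KafkaRaftClient only has to parse file names in "checkpoints" (without loading any snapshot) could look roughly like the Java sketch below. The "<endOffset>-<epoch>.checkpoint" file-name format, the class LatestSnapshotFinder and its SnapshotId type are hypothetical stand-ins, not the names defined in the KIP. Temporary ".part" files do not match the pattern, so they are ignored, which matches the "behave as if the temporary snapshot didn't exist" rule.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: find the latest permanent snapshot by file name only.
final class LatestSnapshotFinder {
    // Assumed file-name format "<endOffset>-<epoch>.checkpoint".
    private static final Pattern NAME = Pattern.compile("(\\d+)-(\\d+)\\.checkpoint");

    static final class SnapshotId {
        final long endOffset;
        final int epoch;

        SnapshotId(long endOffset, int epoch) {
            this.endOffset = endOffset;
            this.epoch = epoch;
        }
    }

    static Optional<SnapshotId> latest(Path checkpointsDir) throws IOException {
        SnapshotId best = null;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(checkpointsDir)) {
            for (Path file : files) {
                // matches() requires the whole name to match, so temporary
                // "*.checkpoint.part" files and unrelated files are skipped.
                Matcher m = NAME.matcher(file.getFileName().toString());
                if (!m.matches()) {
                    continue;
                }
                SnapshotId id = new SnapshotId(Long.parseLong(m.group(1)), Integer.parseInt(m.group(2)));
                // Order by end offset first, then by epoch.
                if (best == null || id.endOffset > best.endOffset
                        || (id.endOffset == best.endOffset && id.epoch > best.epoch)) {
                    best = id;
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Ordering by end offset and then epoch is one plausible choice for "largest/latest"; the KIP text is the authority on the real naming and ordering.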

-- 
-- Guozhang
