Thank you both. The key point I was missing was that position files track
the offsets of the *source* topics, whereas the checkpoint file tracks the
offset(s) of the changelog topics.
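
To check my understanding, here is a toy model of the distinction. It is
plain Java, not the Streams internals: the class, field, and topic names are
all made up for illustration, and it ignores when the real files are
actually written to disk.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckpointVsPosition {
    // Hypothetical name for the store's single changelog partition.
    static final String CHANGELOG_PARTITION = "my-store-changelog-0";

    // Checkpoint-like view: changelog partition -> last changelog offset
    // that is reflected in the local store.
    final Map<String, Long> checkpoint = new HashMap<>();

    // Position-like view: input topic -> partition -> latest input offset
    // whose effects are reflected in the local store.
    final Map<String, Map<Integer, Long>> position = new HashMap<>();

    private long nextChangelogOffset = 0L;

    // Simulates one store write triggered by a record from an input topic.
    void put(String inputTopic, int inputPartition, long inputOffset) {
        // The write is appended to the changelog and gets its own offset there.
        checkpoint.put(CHANGELOG_PARTITION, nextChangelogOffset++);
        // Independently, remember which input record produced the write.
        position.computeIfAbsent(inputTopic, t -> new HashMap<>())
                .merge(inputPartition, inputOffset, Math::max);
    }

    public static void main(String[] args) {
        CheckpointVsPosition store = new CheckpointVsPosition();
        store.put("orders", 0, 41L);
        store.put("orders", 0, 42L);
        store.put("payments", 1, 7L);
        System.out.println("checkpoint: " + store.checkpoint);
        System.out.println("position:   " + store.position);
    }
}
```

The two maps have a similar shape but different key spaces and unrelated
offsets, which is why merging them would not remove any duplication.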

Do you know if I need to include interface/API changes to internal
classes/interfaces (those under the
org.apache.kafka.streams.processor.internals package) in a KIP, or are they
considered implementation details?

Cheers,
Nick

On Sat, 12 Nov 2022 at 03:59, John Roesler <vvcep...@apache.org> wrote:

> Hi all,
>
> Just to clarify: there actually is a position file. It was a small detail
> of the IQv2 implementation to add it; otherwise a persistent store's
> position would be lost after a restart.
>
> Otherwise, Sophie is right on the money. The checkpoint refers to an
> offset in the changelog, while the position refers to offsets in the task's
> input topics. So they are similar in function and structure, but
> they refer to two different things.
>
> I agree that, given this, it doesn't seem like consolidating them (for
> example, into one file) would be worth it. It would make the code more
> complicated without deduping any information.
>
> I hope this helps, and look forward to what you're cooking up, Nick!
> -John
>
> On 2022/11/12 00:50:27 Sophie Blee-Goldman wrote:
> > Hey Nick,
> >
> > I haven't been following the new IQv2 work very closely so take this with a
> > grain of salt, but as far as I'm aware there's no such thing as "position
> > files" -- the Position is just an in-memory object and is related to a
> > user's query against the state store, whereas a checkpoint file reflects
> > the current state of the store, i.e. how much of the changelog it contains.
> >
> > In other words, while these might look like they do similar things, the
> > actual usage and implementation of Positions vs checkpoint files is pretty
> > much unrelated. So I don't think it would make sense for Streams to try and
> > consolidate these or replace one with another.
> >
> > Hope this answers your question, and I'll ping John to make sure I'm not
> > misleading you regarding the usage/intention of Positions.
> >
> > Sophie
> >
> > On Fri, Nov 11, 2022 at 6:48 AM Nick Telford <nick.telf...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I'm trying to understand how StateStores work internally for some changes
> > > that I plan to propose, and I'd like some clarification around checkpoint
> > > files and position files.
> > >
> > > It appears as though position files are relatively new, and were created
> > > as part of the IQv2 initiative, as a means to track the position of the
> > > local state store so that reads could be bound by particular positions?
> > >
> > > Checkpoint files look much older, and are managed by the Task itself
> > > (actually, ProcessorStateManager). It looks like this is used exclusively
> > > for determining a) whether to restore a store, and b) which offsets to
> > > restore from?
> > >
> > > If I've understood the above correctly, is there any scope to potentially
> > > replace checkpoint files with StateStore#position()?
> > >
> > > Regards,
> > >
> > > Nick
> > >
> >
>
