Thank you both, the key point I was missing was that position files track the offsets of the *source* topics, whereas the checkpoint file tracks the offset(s) of the changelog topics.

Do you know if I need to include interface/API changes to internal classes/interfaces (those under the org.apache.kafka.streams.processor.internals package) in a KIP, or are they considered implementation details?

Cheers,
Nick

On Sat, 12 Nov 2022 at 03:59, John Roesler <vvcep...@apache.org> wrote:

> Hi all,
>
> Just to clarify: there actually is a position file. It was a small detail of the IQv2 implementation to add it, otherwise a persistent store's position would be lost after a restart.
>
> Otherwise, Sophie is right on the money. The checkpoint refers to an offset in the changelog, while the position refers to offsets in the task's input topics. So they are similar in function and structure, but they refer to two different things.
>
> I agree that, given this, it doesn't seem like consolidating them (for example, into one file) would be worth it. It would make the code more complicated without deduping any information.
>
> I hope this helps, and look forward to what you're cooking up, Nick!
> -John
>
> On 2022/11/12 00:50:27 Sophie Blee-Goldman wrote:
> > Hey Nick,
> >
> > I haven't been following the new IQv2 work very closely so take this with a grain of salt, but as far as I'm aware there's no such thing as "position files" -- the Position is just an in-memory object and is related to a user's query against the state store, whereas a checkpoint file reflects the current state of the store, i.e. how much of the changelog it contains.
> >
> > In other words, while these might look like they do similar things, the actual usage and implementation of Positions vs checkpoint files is pretty much unrelated. So I don't think it would make sense for Streams to try and consolidate these or replace one with another.
> >
> > Hope this answers your question, and I'll ping John to make sure I'm not misleading you regarding the usage/intention of Positions
> >
> > Sophie
> >
> > On Fri, Nov 11, 2022 at 6:48 AM Nick Telford <nick.telf...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I'm trying to understand how StateStores work internally for some changes that I plan to propose, and I'd like some clarification around checkpoint files and position files.
> > >
> > > It appears as though position files are relatively new, and were created as part of the IQv2 initiative, as a means to track the position of the local state store so that reads could be bound by particular positions?
> > >
> > > Checkpoint files look much older, and are managed by the Task itself (actually, ProcessorStateManager). It looks like this is used exclusively for determining a) whether to restore a store, and b) which offsets to restore from?
> > >
> > > If I've understood the above correctly, is there any scope to potentially replace checkpoint files with StateStore#position()?
> > >
> > > Regards,
> > >
> > > Nick