Well, it's unclear what the remote lookup does... As Kafka Streams does not implement this part, my best guess at the moment is to blame it on a bug in the remote request implementation.

Are you using an off-the-shelf implementation for the remote lookup part, or something built in-house?


-Matthias



On 1/21/22 13:13, Jiří Syrový wrote:
I agree it sounds a bit off, but it seems that even a host that is not
marked as active allows me to query its store and returns a non-null
result.

This application has an API that queries either the local store or a remote
store (basically via the HTTP API of the active host), but the weird part is
that I get a local response from both instances instead of the expected one
remote response (on the non-active, non-standby host) and one local response.
In principle, the code to query stores looks like this:
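A minimal sketch of the local-versus-remote decision described above, in Scala to match the query snippet in this message. It assumes each instance compares the active host from the key's metadata with the HostInfo it advertises via `application.server`. `HostInfo` here is a hypothetical stand-in for Kafka's class so the example runs without the Kafka Streams dependency, and `QueryRouting.route`, `ServeLocally`, and `ForwardTo` are illustrative names, not part of any library.

```scala
// Hypothetical stand-in for org.apache.kafka.streams.state.HostInfo,
// so this sketch runs without the Kafka Streams dependency.
final case class HostInfo(host: String, port: Int)

// Where a key lookup should be answered.
sealed trait Route
case object ServeLocally extends Route
final case class ForwardTo(url: String) extends Route

object QueryRouting {
  /** Decide how to answer a key lookup (illustrative, not a Kafka API).
    *
    * @param self       the HostInfo this instance advertises (assumed to equal application.server)
    * @param activeHost the active host reported in the key's query metadata
    * @param path       hypothetical REST path of the remote endpoint
    */
  def route(self: HostInfo, activeHost: HostInfo, path: String): Route =
    if (activeHost == self) ServeLocally // this instance owns the partition
    else ForwardTo(s"http://${activeHost.host}:${activeHost.port}$path") // ask the owner
}
```

With this shape, instance B (ip-9-8-7-6) would forward a lookup whose active host is ip-1-2-3-4 instead of answering it locally. If `self` is derived differently from the value configured as `application.server` (for example, a resolved IP address versus a hostname), the equality check can misfire.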

     streams
       .store(
         StoreQueryParameters
           .fromNameAndType(
             storeName,
             QueryableStoreTypes.keyValueStore[K, V]()
           )
       )
       .get(key)


And responses look like this:
     $ curl <my_api>/<entity>/123    (response from instance A)
     {"meta":"KeyQueryMetadata {activeHost=HostInfo{host='ip-1-2-3-4.eu-west-1.compute.internal', port=8080}, standbyHosts=[], partition=0}","value":{"timestamp":"2022-01-21T00:00:02.433Z","enabled":true},"hostname":"ip-1-2-3-4.eu-west-1.compute.internal"}

     $ curl <my_api>/<entity>/123    (response from instance B)
     {"meta":"KeyQueryMetadata {activeHost=HostInfo{host='ip-1-2-3-4.eu-west-1.compute.internal', port=8080}, standbyHosts=[], partition=0}","value":{"timestamp":"2022-01-21T15:55:27.807Z","enabled":false},"hostname":"ip-9-8-7-6.eu-west-1.compute.internal"}

This behaviour is not random and is 100% reproducible. I can try to create
a minimal code example that will demonstrate it.

On Fri, 21 Jan 2022 at 18:20, Matthias J. Sax <mj...@apache.org> wrote:

> but instance A returns
> result X for a partition I and instance B returns result Y for the same
> partition I.

This sounds a little off. As you stated, if both instances agree on the
active host, the active host must be either instance A or instance B,
and thus you can query partition I only on that one instance. The
non-active instance should not return any data for a partition it does
not host.
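To illustrate the invariant described above, here is a toy model (not the Kafka Streams API) in which each instance keeps a store only for the partitions it hosts as active, which is the situation with 0 standby replicas. `Instance`, `put`, and `get` are hypothetical names chosen for the sketch.

```scala
import scala.collection.mutable

// Toy model, not the Kafka Streams API: each instance holds a local store only for
// the partitions it hosts as active; with 0 standby replicas no other copy exists.
final class Instance(val name: String, activePartitions: Set[Int]) {
  private val stores: Map[Int, mutable.Map[String, String]] =
    activePartitions.iterator.map(p => p -> mutable.Map.empty[String, String]).toMap

  // Writes are only possible on the instance that owns the partition.
  def put(partition: Int, key: String, value: String): Unit =
    stores.get(partition) match {
      case Some(store) => store.update(key, value)
      case None        => throw new IllegalStateException(s"$name does not host partition $partition")
    }

  // A non-active instance has nothing to return for a partition it does not host.
  def get(partition: Int, key: String): Option[String] =
    stores.get(partition).flatMap(_.get(key))
}
```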

Can you elaborate?

-Matthias

On 1/21/22 4:47 AM, Jiří Syrový wrote:
Hi everyone,

For a while now I've been trying to answer a question for myself: what
are the actual consistency guarantees for state stores that are connected
to transformers?

I have an application where a single (persistent, RocksDB-backed) state
store is connected to multiple transformers. Each transformer may both
read (get) and write (put) data in the state store. All transformers
receive data from multiple input topics in the same way (same key, same
number of partitions); the topics are merged together before the records
are sent to the transformers.
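Because all input topics share the same key and the same number of partitions, a given key should land in the same partition number in each of them, assuming the same key serializer and no custom partitioner. The sketch below is a Scala port of the murmur2-based computation that Kafka's default partitioning applies to serialized keys; `DefaultPartitioning` and its methods are illustrative names, and `partitionForString` assumes keys serialized as UTF-8 (as `StringSerializer` does).

```scala
import java.nio.charset.StandardCharsets

object DefaultPartitioning {
  // Port of the murmur2 hash Kafka uses for keyed records
  // (org.apache.kafka.common.utils.Utils.murmur2).
  def murmur2(data: Array[Byte]): Int = {
    val length = data.length
    val seed = 0x9747b28cL.toInt
    val m = 0x5bd1e995
    val r = 24
    var h = seed ^ length
    // Mix the input four bytes at a time (little-endian).
    for (i <- 0 until length / 4) {
      val i4 = i * 4
      var k = (data(i4) & 0xff) + ((data(i4 + 1) & 0xff) << 8) +
        ((data(i4 + 2) & 0xff) << 16) + ((data(i4 + 3) & 0xff) << 24)
      k *= m
      k ^= k >>> r
      k *= m
      h *= m
      h ^= k
    }
    // Mix the remaining 1 to 3 bytes (mirrors the Java switch fall-through).
    val tail = length & ~3
    val rem = length % 4
    if (rem == 3) h ^= (data(tail + 2) & 0xff) << 16
    if (rem >= 2) h ^= (data(tail + 1) & 0xff) << 8
    if (rem >= 1) {
      h ^= data(tail) & 0xff
      h *= m
    }
    h ^= h >>> 13
    h *= m
    h ^= h >>> 15
    h
  }

  // Mask the sign bit (like Utils.toPositive), then take the modulus.
  def partitionFor(keyBytes: Array[Byte], numPartitions: Int): Int =
    (murmur2(keyBytes) & 0x7fffffff) % numPartitions

  // Assumes keys are serialized as UTF-8 strings.
  def partitionForString(key: String, numPartitions: Int): Int =
    partitionFor(key.getBytes(StandardCharsets.UTF_8), numPartitions)
}
```

The result depends only on the key bytes and the partition count, which is why co-partitioned topics route a key to the same partition and therefore to the same task.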

All transformers are located in the same sub-topology.
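For intuition on why transformers in one sub-topology should agree, here is another toy model (not the Kafka Streams API). It mimics the fact that processors in the same sub-topology connected to the same store name share a single store instance per task, and that a task processes one record at a time, so a read in one transformer sees the writes of another. `ToyTask` and both transformer functions are hypothetical.

```scala
import scala.collection.mutable

// Toy model of one stream task (one input partition), not the Kafka Streams API:
// both "transformers" use the same store instance and run one record at a time.
final class ToyTask {
  val store: mutable.Map[String, Long] = mutable.Map.empty

  // Hypothetical transformer 1: counts records per key (read, then write).
  private def countingTransformer(key: String): Unit =
    store.update(key, store.getOrElse(key, 0L) + 1)

  // Hypothetical transformer 2: reads what transformer 1 just wrote.
  private def readingTransformer(key: String): Option[Long] =
    store.get(key)

  // Each record passes through both transformers before the next one is processed.
  def process(key: String): Option[Long] = {
    countingTransformer(key)
    readingTransformer(key)
  }
}
```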

What I observed is that even with 0 standby replicas I can get
inconsistent results when querying this state store connected to multiple
transformers. I have 2 instances, and the metadata on both instances agrees
on the active host for this state store and partition, but instance A
returns result X for partition I and instance B returns result Y for the
same partition I.

Any suggestions on whether this is a bug, or is my assumption incorrect
that the same state store should give the same result for the same key
(same partition) in 2 distinct transformers fed from the same input?

Thanks,
Jiri


