I made a modification to Confluent's latest UserRegionLambdaExample. The relevant code is at the end of this email.
Am I correct in understanding that KTable semantics should be similar to a store-backed cache of a view (per Wikipedia's article on materialized views), or to Oracle's materialized views and indexed views? More specifically, I am looking at when a (key, null) pair can make it into a KTable that is generated from a valid KStream with an always-false filter.

Here is the relevant code, modified from the example, for which I observed that every key in userRegions is sent to the topic LargeRegions with a null value. I would have thought that both the regionCounts KTable and the LargeRegions topic should be empty, so that the cached view agrees with the intended query (a query with an intentionally empty result set, since the filter 1 >= 2 is always false). I am new to this and may not understand the implications properly, but it seems possible that a highly selective filter on a large incoming stream would result in high memory usage for regionCounts, and hence for the streams application.

    KTable<String, String> regionCounts = userRegions
        // Count by region.
        // We do not need to specify any explicit serdes because the key
        // and value types do not change.
        .groupBy((userId, region) -> KeyValue.pair(region, region))
        .count("CountsByRegion")
        // Discard all regions via an always-false filter, FOR THE SAKE OF THE EXAMPLE.
        .filter((regionName, count) -> 1 >= 2)
        .mapValues(count -> count.toString());

    KStream<String, String> regionCountsForConsole = regionCounts.toStream();
    regionCountsForConsole.to(stringSerde, stringSerde, "LargeRegions");
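For what it's worth, my current working guess at why the nulls appear is that a filter on a KTable (a changelog) may need to forward a delete, i.e. a (key, null) tombstone, for every record that fails the predicate, so that any downstream materialization can remove a row that previously matched. The sketch below is my own plain-Java model of that idea, not Kafka Streams code; the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch (NOT Kafka Streams API): models a table-style filter
// that forwards (key, null) tombstones for non-matching records so a
// downstream materialized view stays consistent with the filtered table.
public class FilterTombstoneSketch {
    // Downstream materialized view, updated from the filtered changelog.
    static final Map<String, Long> downstreamView = new HashMap<>();

    // Table-style filter: pass matching records through as upserts,
    // and turn non-matching records into tombstones (null values).
    static void filterAndForward(String key, Long value,
                                 BiPredicate<String, Long> pred) {
        Long out = (value != null && pred.test(key, value)) ? value : null;
        if (out == null) {
            downstreamView.remove(key);   // tombstone: delete the row
        } else {
            downstreamView.put(key, out); // upsert the row
        }
    }

    public static void main(String[] args) {
        // Always-false predicate, like the 1 >= 2 filter in the example.
        BiPredicate<String, Long> alwaysFalse = (k, v) -> 1 >= 2;
        filterAndForward("europe", 3L, alwaysFalse);
        filterAndForward("asia", 5L, alwaysFalse);
        // The view itself stays empty, but a (key, null) pair was still
        // forwarded for each incoming key.
        System.out.println(downstreamView.isEmpty());
    }
}
```

Under this model the downstream store holds no rows at all (so the memory concern may not apply to the store contents), yet every key still produces a null record on the output topic, which matches what I observed on LargeRegions.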