Peter Davis commented on KAFKA-5285:

Noting that after upgrading to 1.0.1 today, I'm seeing severely 
degraded performance of `(ReadOnly)SessionStore.fetch(key)` as well.  Before we 
were only seeing the problem with `fetch(from,to)`.  Browsed the source code 
and I didn't immediately see what changed between 0.11 and 1.0 there.  (Another 
guess is it's a subtle side effect of some other change like perhaps 
https://issues.apache.org/jira/browse/KAFKA-4868 resulting in different 
compacted DB levels somehow?)

Anyway, workaround for me is to use `findSessions(key, 0, 
System.currentTimeMillis() + <some reasonable time in the future>)`, since the 
0x00 bytes in a timestamp < Long.MAX_VALUE yield a few extra usable bytes of 
maxKey prefix.
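
To illustrate why the smaller timestamp helps, here is a minimal sketch. It is a simplified model, not the actual Kafka Streams code: it assumes the upper bound keeps a leading key byte only while that byte is at least as large as the first byte of the big-endian encoded timestamp, and that keys are at least 1 byte long (as in the quoted explanation further down). The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of the upper-bound prefix for a session-store range scan.
final class UpperBoundSketch {
    // Assumption from the quoted comment: keys are at least 1 byte long.
    static final int MIN_KEY_LENGTH = 1;

    // Returns how many leading key bytes survive in the upper bound
    // when the scan's latest timestamp is encoded as a big-endian long.
    static int usablePrefixLength(byte[] key, long latestTimestamp) {
        byte[] maxSuffix = ByteBuffer.allocate(Long.BYTES).putLong(latestTimestamp).array();
        int firstSuffixByte = maxSuffix[0] & 0xFF;
        int i = 0;
        // Keep a key byte only if it is within the assumed minimum key length,
        // or if it is >= the first suffix byte (unsigned comparison).
        while (i < key.length && (i < MIN_KEY_LENGTH || (key[i] & 0xFF) >= firstSuffixByte)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] key = "file-000123".getBytes(StandardCharsets.UTF_8);
        long oneDayAhead = System.currentTimeMillis() + 24L * 60 * 60 * 1000;
        // Long.MAX_VALUE encodes with a leading 0x7F byte; a realistic timestamp with 0x00.
        System.out.println("Long.MAX_VALUE: " + usablePrefixLength(key, Long.MAX_VALUE));
        System.out.println("now + 1 day:    " + usablePrefixLength(key, oneDayAhead));
    }
}
```

Under these assumptions, an ASCII key keeps only its first byte when the timestamp is Long.MAX_VALUE, while a realistic timestamp (leading 0x00 byte) lets more of the key survive in the bound.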

Both `ReadOnlySessionStore.fetch(...)` variants are entirely unusable for me at 
this time.

> Without any additional information about the key length or the lower 
> bound, we can only assume that keys are at least 1 byte, and that byte has to 
> be smaller or equal to the first byte of keyTo (i.e. our upper bound has to 
> start with the first byte of keyTo), so our best guess for an upper bound in 
> that case is ADFFF.

Doing a range query with *one byte* of prefix will never give acceptable 
performance for any database with more than 8 keys(!), or in use cases where 
key prefixes are not randomly distributed (common in business applications).

May I suggest a few options, not mutually exclusive, but in order of preference:

1. Optimize where fromKey and toKey are the same or have a common prefix.  
(Isn't that your minimum key length right there?  I'm not really sure I 
understand why it's not just this simple.  Note, this is the only case I 
personally care about.)

2. Deprecate the `fetch` variants in favor of `findSessions`, and document that 
using max=Long.MAX_VALUE is not recommended.  Promote findSessions to 
ReadOnlySessionStore.  (This at least gives a few more bytes of usable key 
prefix.)

3. Configuration for default timeStartLatest = currentTimeMillis() + 
<reasonable offset like 1 day>.  (Same benefit as #2)

4. Configure minimum key length.  I don't like this because if natural keys are 
used (user names, human-readable business object references like "file number", 
etc.) then there isn't necessarily a good minimum key length that can be 
enforced by the application.  And if there were, it'd likely vary by store, 
raising the question of how do you easily configure per-store configs.
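
As a rough sketch of option 1: every key that sorts between fromKey and toKey in unsigned byte order must start with their common prefix, so that prefix could serve as the fixed part of both scan bounds. The helper below is hypothetical and not an existing Kafka Streams API; it only shows the prefix computation.

```java
import java.util.Arrays;

// Hypothetical helper illustrating option 1; not part of Kafka Streams.
final class CommonPrefixSketch {
    // Returns the longest byte prefix shared by fromKey and toKey.
    // When fromKey equals toKey, the whole key is returned.
    static byte[] commonPrefix(byte[] fromKey, byte[] toKey) {
        int limit = Math.min(fromKey.length, toKey.length);
        int i = 0;
        while (i < limit && fromKey[i] == toKey[i]) {
            i++;
        }
        return Arrays.copyOf(fromKey, i);
    }
}
```

For a single-key fetch (fromKey equal to toKey), the common prefix is the entire key, which is the case described as the only one that matters here.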

> Optimize upper / lower byte range for key range scan on windowed stores
> -----------------------------------------------------------------------
>                 Key: KAFKA-5285
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5285
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Xavier Léauté
>            Assignee: Guozhang Wang
>            Priority: Major
>              Labels: performance
> The current implementation of {{WindowKeySchema}} / {{SessionKeySchema}} 
> {{upperRange}} and {{lowerRange}} does not make any assumptions with respect 
> to the other key bound (e.g. the upper byte bound does not depend on the lower 
> key bound).
> It should be possible to optimize the byte range somewhat further using the 
> information provided by the lower bound.
> More specifically, by incorporating that information, we should be able to 
> eliminate the corresponding {{upperRangeFixedSize}} and 
> {{lowerRangeFixedSize}}, since the result should be the same if we implement 
> that optimization.

This message was sent by Atlassian JIRA
