[ https://issues.apache.org/jira/browse/KAFKA-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369654#comment-16369654 ]
Peter Davis commented on KAFKA-5285:
------------------------------------

Noting that after upgrading from 0.11.0.1 to 1.0.1 today, I'm seeing severely degraded performance of `(ReadOnly)SessionStore.fetch(key)` as well. Before, we were only seeing the problem with `fetch(from,to)`. I browsed the source code and didn't immediately see what changed between 0.11 and 1.0 there. (Another guess is that it's a subtle side effect of some other change, perhaps https://issues.apache.org/jira/browse/KAFKA-4868, somehow resulting in different compacted DB levels?)

Anyway, the workaround for me is to use `findSessions(key, 0, System.currentTimeMillis() + <some reasonable time in the future>)`, since the 0x00 bytes in a timestamp < Long.MAX_VALUE yield a few extra usable bytes of maxKey prefix. Both `ReadOnlySessionStore.fetch(...)` variants are entirely unusable for me at this time.

> Without any additional information about the key length or the lower
> bound, we can only assume that keys are at least 1 byte, and that byte has to
> be smaller than or equal to the first byte of keyTo (i.e. our upper bound has to
> start with the first byte of keyTo), so our best guess for an upper bound in
> that case is ADFFF.

Doing a range query with *one byte* of prefix will never give acceptable performance for any database with more than 8 keys(!), or in use cases where key prefixes are not randomly distributed (common in business applications).

May I suggest a few options, not mutually exclusive, but in order of preference:

1. Optimize the case where fromKey and toKey are the same or have a common prefix. (Isn't that your minimum key length right there? I'm not really sure I understand why it's not just this simple. Note, this is the only case I personally care about.)
2. Deprecate the `fetch` variants in favor of `findSessions`, and document that using max=Long.MAX_VALUE is not recommended. Promote findSessions to ReadOnlySessionStore. (This at least gives a few more bytes of usable key prefix.)
3. Add a configuration for a default timeStartLatest = currentTimeMillis() + <reasonable offset like 1 day>. (Same benefit as #2.)
4. Configure a minimum key length. I don't like this because if natural keys are used (user names, human-readable business object references like "file number", etc.) then there isn't necessarily a good minimum key length that can be enforced by the application. And if there were, it would likely vary by store, raising the question of how to easily configure per-store settings.

> Optimize upper / lower byte range for key range scan on windowed stores
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-5285
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5285
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Xavier Léauté
>            Assignee: Guozhang Wang
>            Priority: Major
>              Labels: performance
>
> The current implementation of {{WindowKeySchema}} / {{SessionKeySchema}}
> {{upperRange}} and {{lowerRange}} does not make any assumptions with respect
> to the other key bound (e.g. the upper byte bound does not depend on the lower
> key bound).
> It should be possible to optimize the byte range somewhat further using the
> information provided by the lower bound.
> More specifically, by incorporating that information, we should be able to
> eliminate the corresponding {{upperRangeFixedSize}} and
> {{lowerRangeFixedSize}}, since the result should be the same if we implement
> that optimization.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
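
For illustration only, here is a minimal Java sketch of two ideas from the comment above: the common prefix of fromKey and toKey (option 1), and the number of leading 0x00 bytes in a timestamp bound (the `findSessions` workaround). The class `KeyPrefixSketch` and both method names are hypothetical and are not part of Kafka Streams. The sketch assumes timestamps are encoded as 8-byte big-endian longs, and it does not reproduce how `SessionKeySchema` actually builds its ranges.

```java
import java.util.Arrays;

// Hypothetical helper, not Kafka code: illustrates how much usable
// key prefix a range scan could have under the stated assumptions.
final class KeyPrefixSketch {

    // Longest byte prefix shared by both key bounds. When fromKey equals
    // toKey, the whole key is the prefix.
    static byte[] commonPrefix(byte[] fromKey, byte[] toKey) {
        int limit = Math.min(fromKey.length, toKey.length);
        int i = 0;
        while (i < limit && fromKey[i] == toKey[i]) {
            i++;
        }
        return Arrays.copyOf(fromKey, i);
    }

    // Number of leading 0x00 bytes in a non-negative timestamp, assuming
    // an 8-byte big-endian encoding. Long.MAX_VALUE (0x7FFF...) has none;
    // a present-day epoch-millis value has two.
    static int leadingZeroBytes(long timestamp) {
        return Long.numberOfLeadingZeros(timestamp) / Byte.SIZE;
    }
}
```

With identical bounds, `commonPrefix` returns the full key, which is the single-key `fetch(key)` case. `leadingZeroBytes(System.currentTimeMillis())` shows why a realistic upper timestamp leaves a couple more fixed bytes than `Long.MAX_VALUE` does.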