DonalEvans commented on a change in pull request #6831:
URL: https://github.com/apache/geode/pull/6831#discussion_r702030694
##########
File path: geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisSortedSet.java
##########
@@ -377,6 +382,25 @@ long zrevrank(byte[] member) {
return scoreSet.size() - scoreSet.indexOf(orderedSetEntry) - 1;
}
+  ImmutablePair<Integer, List<byte[]>> zscan(Pattern matchPattern, int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the sorted set. We
+    // need to add 1 to zcard() to ensure that if count > members.size(), we return a cursor of 0
+    long maximumCapacity = 2L * Math.min(count, zcard() + 1);
+    if (maximumCapacity > Integer.MAX_VALUE) {
+      LogService.getLogger().info(
+          "The size of the data to be returned by zscan, {}, exceeds the maximum capacity of an array. A value for the ZSCAN COUNT argument less than {} should be used",
+          maximumCapacity, Integer.MAX_VALUE / 2);
+      throw new IllegalArgumentException("Requested array size exceeds VM limit");
+    }
+    List<byte[]> resultList = new ArrayList<>((int) maximumCapacity);
+    do {
+      cursor = members.scan(cursor, 1,
Review comment:
The difficulty here is that Redis behaves differently depending on the size
of the sorted set/hash. Below a certain size, Redis internally uses a
different data structure, which results in different scan behaviour:
> "When iterating Sets encoded as intsets (small sets composed of just
> integers), or Hashes and Sorted Sets encoded as ziplists (small hashes and sets
> composed of small individual values), usually all the elements are returned in
> the first SCAN call regardless of the COUNT value."
(from https://redis.io/commands/scan#the-count-option)
But for larger sizes, a scan command may return no results (except a cursor
value) when used with MATCH and COUNT. I created a hash with ~500 entries,
three of which matched the pattern `a*`, and called HSCAN with `MATCH a*` and
various COUNT values. Redis returned an empty array for COUNT values up to 106,
and only one result for COUNT values up to 243, so it's not accurate to say
that the COUNT argument indicates the number of matching entries that should be
returned.
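One way to picture the observation above is a scan in which COUNT limits how many entries are examined per call and MATCH is applied only afterwards. The following is a minimal sketch under that assumption; it is neither Redis's nor Geode's actual implementation, and the names `ScanSketch`, `ScanResult` and `scan` are hypothetical. It uses a plain list index as the cursor, which real hash-table scans do not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model: COUNT bounds the entries examined, not the matches returned.
public class ScanSketch {

  /** Result of one call: the cursor for the next call (0 means the scan is complete). */
  static final class ScanResult {
    final int nextCursor;
    final List<String> matches;

    ScanResult(int nextCursor, List<String> matches) {
      this.nextCursor = nextCursor;
      this.matches = matches;
    }
  }

  /** Examines at most count entries starting at cursor, then keeps only those matching. */
  static ScanResult scan(List<String> entries, int cursor, int count, Pattern match) {
    List<String> matches = new ArrayList<>();
    // Assumes 0 <= cursor <= entries.size() and count >= 0.
    int end = cursor + Math.min(count, entries.size() - cursor);
    for (int i = cursor; i < end; i++) {
      String entry = entries.get(i);
      if (match.matcher(entry).matches()) {
        matches.add(entry);
      }
    }
    int nextCursor = end >= entries.size() ? 0 : end;
    return new ScanResult(nextCursor, matches);
  }

  public static void main(String[] args) {
    List<String> entries = new ArrayList<>();
    for (int i = 0; i < 500; i++) {
      entries.add("key" + i);
    }
    entries.set(450, "apple"); // the only entry matching a*
    Pattern pattern = Pattern.compile("a.*");

    int cursor = 0;
    do {
      ScanResult result = scan(entries, cursor, 100, pattern);
      System.out.println("cursor=" + result.nextCursor + " matches=" + result.matches);
      cursor = result.nextCursor;
    } while (cursor != 0);
  }
}
```

In this model the first four calls return a non-zero cursor and an empty result, and only the last call returns the match, which is consistent with a scan returning nothing but a cursor for smaller COUNT values.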
Given the inconsistent behaviour of Redis' scan commands, I'm not sure that
we want to try to exactly match behaviour for situations where a full iteration
isn't guaranteed, because our internal implementation will not behave the same
for small set/hash sizes. This can be tested by creating a hash with 20
entries, then calling HSCAN with a COUNT value of 5. Redis will return all
entries, but our implementation will return 5.