jdeppe-pivotal commented on a change in pull request #6701:
URL: https://github.com/apache/geode/pull/6701#discussion_r672392625
##########
File path:
geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisHash.java
##########
@@ -237,35 +237,32 @@ public int hstrlen(byte[] field) {
     return new ArrayList<>(hash.keySet());
   }
 
-  public ImmutablePair<Integer, List<byte[]>> hscan(Pattern matchPattern,
-      int count,
-      int cursor) {
-
-    ArrayList<byte[]> resultList = new ArrayList<>(count + 2);
+  public ImmutablePair<Integer, List<ImmutablePair<byte[], byte[]>>> hscan(Pattern matchPattern,
+      int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the hash
+    int initialCapacity = Math.min(count, hash.size());
+    List<ImmutablePair<byte[], byte[]>> resultList = new ArrayList<>(initialCapacity);
     do {
       cursor = hash.scan(cursor, 1,
           (list, key, value) -> addIfMatching(matchPattern, list, key, value),
           resultList);
-    } while (cursor != 0 && resultList.size() < (count * 2));
+    } while (cursor != 0 && resultList.size() < count);
     return new ImmutablePair<>(cursor, resultList);
   }
 
-  private void addIfMatching(Pattern matchPattern, List<byte[]> resultList, byte[] key,
-      byte[] value) {
+  private void addIfMatching(Pattern matchPattern, List<ImmutablePair<byte[], byte[]>> resultList,
+      byte[] key, byte[] value) {
     if (matchPattern != null) {
       if (matchPattern.matcher(bytesToString(key)).matches()) {
-        resultList.add(key);
-        resultList.add(value);
+        resultList.add(new ImmutablePair<>(key, value));
Review comment:
Ah, OK, I see. Out of curiosity I did some back-of-the-napkin estimates, and it
seems we'd need a JVM with at least 80-100GB of heap just to hold enough data to
hit this maximum (in the 'incorrect' implementation), and twice that for the
'correct' implementation. I'm really torn, but I do feel it's more pragmatic to
err on the side of performance than to cater to a use case that seems highly
unlikely (famous last words :) ). To look at it another way, we'd always be
paying the cost of the extra allocation just to avoid an incredibly unlikely
error situation, for which there is a workaround anyway: just request fewer than
2 billion keys at a time.
Not to overly complicate this, but technically it shouldn't be difficult to
optimise for the most frequent use case while keeping a special case for when
the requested count does exceed `Integer.MAX_VALUE / 2`. I'm not suggesting you
do that now, though.
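Purely to illustrate what I mean, a hypothetical helper (my names, not part of
this PR; `Integer.MAX_VALUE - 8` is the de facto maximum array size the JDK
collections use) might look like:

```java
// Hypothetical sketch, not a request for this PR: size the flat result list
// optimally for realistic counts, but do the math in long so a count above
// Integer.MAX_VALUE / 2 can't overflow, then clamp to a legal array capacity.
static int safeInitialCapacity(int count, int hashSize) {
  long desired = Math.min((long) count * 2, (long) hashSize * 2);
  return (int) Math.min(desired, Integer.MAX_VALUE - 8);
}
```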
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]