jdeppe-pivotal commented on a change in pull request #6701:
URL: https://github.com/apache/geode/pull/6701#discussion_r672392625
##########
File path:
geode-apis-compatible-with-redis/src/main/java/org/apache/geode/redis/internal/data/RedisHash.java
##########
@@ -237,35 +237,32 @@ public int hstrlen(byte[] field) {
     return new ArrayList<>(hash.keySet());
   }
 
-  public ImmutablePair<Integer, List<byte[]>> hscan(Pattern matchPattern,
-      int count,
-      int cursor) {
-
-    ArrayList<byte[]> resultList = new ArrayList<>(count + 2);
+  public ImmutablePair<Integer, List<ImmutablePair<byte[], byte[]>>> hscan(Pattern matchPattern,
+      int count, int cursor) {
+    // No need to allocate more space than it's possible to use given the size of the hash
+    int initialCapacity = Math.min(count, hash.size());
+    List<ImmutablePair<byte[], byte[]>> resultList = new ArrayList<>(initialCapacity);
     do {
       cursor = hash.scan(cursor, 1,
           (list, key, value) -> addIfMatching(matchPattern, list, key, value),
           resultList);
-    } while (cursor != 0 && resultList.size() < (count * 2));
+    } while (cursor != 0 && resultList.size() < count);
     return new ImmutablePair<>(cursor, resultList);
   }
 
-  private void addIfMatching(Pattern matchPattern, List<byte[]> resultList, byte[] key,
-      byte[] value) {
+  private void addIfMatching(Pattern matchPattern, List<ImmutablePair<byte[], byte[]>> resultList,
+      byte[] key, byte[] value) {
     if (matchPattern != null) {
       if (matchPattern.matcher(bytesToString(key)).matches()) {
-        resultList.add(key);
-        resultList.add(value);
+        resultList.add(new ImmutablePair<>(key, value));
Review comment:
Ah, OK, I see. Out of curiosity I did some back-of-the-napkin estimates, and it
seems we'd need a JVM with at least 80-100GB of heap just to hold enough data to
hit this maximum (in the 'incorrect' implementation), and twice that for the
'correct' implementation. I'm really torn, but I do feel it's more pragmatic to
err on the side of performance than to cater to a use case that seems highly
unlikely (famous last words :) ). To look at it another way, we'd always be
paying the cost of the extra allocation just to avoid an incredibly unlikely
error situation, for which there is a workaround anyway: just request fewer than
2 billion keys at a time.
Not to overly complicate this, but technically it shouldn't be difficult to
optimise for the most frequent use case while keeping a special case for when
the requested count does exceed `Integer.MAX_VALUE / 2`. I'm not suggesting you
do that now, though.
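Purely to illustrate what I mean, a hypothetical helper (my names, not part of
this PR; `Integer.MAX_VALUE - 8` is the de facto maximum array size the JDK
collections use) might look like:

```java
// Hypothetical sketch, not a request for this PR: size the flat result list
// optimally for realistic counts, but do the math in long so a count above
// Integer.MAX_VALUE / 2 can't overflow, then clamp to a legal array capacity.
static int safeInitialCapacity(int count, int hashSize) {
  long desired = Math.min((long) count * 2, (long) hashSize * 2);
  return (int) Math.min(desired, Integer.MAX_VALUE - 8);
}
```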
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]