jon-wei commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r740468584



##########
File path: docs/development/extensions-core/lookups-cached-global.md
##########
@@ -204,7 +204,8 @@ The remapping values for each globally cached lookup can be 
specified by a JSON
         "\"value\"]"
       ]
   },
-  "pollPeriod":"PT5M"
+  "pollPeriod":"PT5M",
+  "maxSize": 5242880

Review comment:
       I think it'd make more sense to express this as a percentage of 
available heap instead of a fixed byte limit, since the amount of available 
memory can differ across the various nodes where lookups might be loaded

##########
File path: 
extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated 
over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes 
read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value 
types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning 
should be shown in the log
+   * @param name      The name of the map that is being populated. Used to 
identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       Are there cases where the value is not a string? If so, suggest adding 
support for those cases, this log seems likely to result in excessive logging 
since it's per entry




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to