jon-wei commented on a change in pull request #11863:
URL: https://github.com/apache/druid/pull/11863#discussion_r742421264



##########
File path: 
extensions-core/lookups-cached-global/src/main/java/org/apache/druid/data/input/MapPopulator.java
##########
@@ -101,9 +152,65 @@ public boolean processLine(String line)
           @Override
           public PopulateResult getResult()
           {
-            return new PopulateResult(lines, entries);
+            return new PopulateResult(lines, entries, bytes);
           }
         }
     );
   }
+
+  /**
+   * Read through the `iterator` and populate `map` with the data iterated 
over. Warning
+   * messages will be logged if the `byteLimit` > 0, and the number of bytes 
read into the map exceed the byte limit.
+   * Note: in order to compute the byte length properly, the key and value 
types of map must both be instances of
+   * String, otherwise no byte length is computed.
+   *
+   * @param iterator  The iterator to iterate over
+   * @param map       The map to populate
+   * @param byteLimit The limit of number of bytes after which a warning 
should be shown in the log
+   * @param name      The name of the map that is being populated. Used to 
identify the map in log messages written.
+   *
+   * @return number of entries parsed and bytes stored in the map.
+   */
+  public PopulateResult populateAndWarnAtByteLimit(
+      final Iterator<Pair<K, V>> iterator,
+      final Map<K, V> map,
+      final long byteLimit,
+      final String name)
+  {
+    int lines = 0;
+    int entries = 0;
+    long bytes = 0L;
+    long byteLimitMultiple = 1L;
+    while (iterator.hasNext()) {
+      Pair<K, V> pair = iterator.next();
+      if (0 < byteLimit) {
+        bytes += getByteLengthOfKeyAndValue(pair.lhs, pair.rhs);
+        if (bytes != 0 && (bytes > byteLimit * byteLimitMultiple)) {
+          LOG.warn("[%s] exceeded the byteLimit of [%,d]. Current bytes [%,d]",
+                   name,
+                   byteLimit,
+                   bytes
+          );
+          byteLimitMultiple++;
+        }
+      }
+      map.put(pair.lhs, pair.rhs);
+      entries++;
+    }
+    return new PopulateResult(lines, entries, bytes);
+  }
+
+  private long getByteLengthOfKeyAndValue(K key, V value)
+  {
+    if ((key instanceof String) && (value instanceof String)) {
+      return ((String) (key)).length() + ((String) (value)).length();
+    } else {
+      LOG.warn(

Review comment:
       can you adjust so that it doesn't potentially log on every value




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to