pvillard31 commented on code in PR #11355:
URL: https://github.com/apache/nifi/pull/11355#discussion_r3455362696


##########
nifi-extension-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-restapi-processors/src/main/java/org/apache/nifi/processors/elasticsearch/PutElasticsearchJson.java:
##########
@@ -267,6 +273,11 @@ public class PutElasticsearchJson extends 
AbstractPutElasticsearch {
             .name("Index Field")
             .description("""
                     The name of the field within each document to use as the 
Elasticsearch index name. \
+                    A nested field can be referenced with a "/"-delimited 
path: for a document \

Review Comment:
   The escape explanation is repeated in all three property descriptions. Could 
we move the full explanation into an additionalDetails.md for the processor and 
have each property description point to the processor documentation for the 
details instead?



##########
nifi-extension-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-restapi-processors/src/main/java/org/apache/nifi/processors/elasticsearch/PutElasticsearchJson.java:
##########
@@ -241,6 +241,11 @@ public class PutElasticsearchJson extends 
AbstractPutElasticsearch {
             .name("Identifier Field")
             .description("""
                     The name of the field within each document to use as the 
Elasticsearch document ID. \
+                    A nested field can be referenced with a "/"-delimited 
path: for a document \

Review Comment:
   These three properties shipped in released 2.10.0 with literal field name 
semantics, so an existing flow that has a slash in the configured value will 
now be read as a nested path after an upgrade. I am not an experienced 
Elasticsearch user and I do not know how common a slash is in these field 
names, and I agree the likelihood of a problematic value is low, but I think we 
should still handle this case safely with a property migration that escapes any 
existing slash or backslash, and add test coverage for that migration. Could we 
add that here?



##########
nifi-extension-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-restapi-processors/src/main/java/org/apache/nifi/processors/elasticsearch/PutElasticsearchJson.java:
##########
@@ -1143,30 +1177,226 @@ private String resolveId(final Map<String, Object> 
contentMap, final String idAt
     /**
      * Extracts the index name from a pre-parsed {@link JsonNode}.
      * Used for JSON Array Index/Create operations where the node is already 
available.
+     * The field may be a {@code /}-delimited path into nested objects.
      * Falls back to {@code fallbackIndex} when the field is absent or blank.
      */
     private String extractIndex(final JsonNode node, final String indexField, 
final String fallbackIndex) {
         if (StringUtils.isBlank(indexField)) {
             return fallbackIndex;
         }
-        final String value = fieldNodeToString(node.get(indexField));
+        final String value = fieldNodeToString(nodeAtPath(node, indexField));
         return StringUtils.isNotBlank(value) ? value : fallbackIndex;
     }
 
     /**
      * Resolves the index name from an already-parsed content Map.
      * Used for Update/Delete/Upsert operations and suppression-enabled 
Index/Create paths
-     * where the Map is already available. Falls back to {@code fallbackIndex} 
when the
-     * field is absent or blank.
+     * where the Map is already available. The field may be a {@code 
/}-delimited path into
+     * nested objects. Falls back to {@code fallbackIndex} when the field is 
absent or blank.
      */
     private String resolveIndex(final Map<String, Object> contentMap, final 
String indexField, final String fallbackIndex) {
         if (StringUtils.isBlank(indexField)) {
             return fallbackIndex;
         }
-        final String value = fieldValueToString(contentMap.get(indexField));
+        final String value = fieldValueToString(valueAtPath(contentMap, 
indexField));
         return StringUtils.isNotBlank(value) ? value : fallbackIndex;
     }
 
+    /**
+     * Whether {@code field} is a {@code /}-delimited nested path rather than 
a flat field name.
+     * A {@code /} preceded by a backslash is an escaped literal slash within 
a single field name
+     * (see {@link #splitPath(String)}) and does not make the field a nested 
path.
+     */
+    private static boolean isNestedPath(final String field) {
+        if (field == null) {
+            return false;
+        }
+        for (int i = 0; i < field.length(); i++) {
+            final char c = field.charAt(i);
+            if (c == '\\' && i + 1 < field.length()) {
+                final char next = field.charAt(i + 1);
+                if (next == '/' || next == '\\') {
+                    i++; // skip the escaped character
+                    continue;
+                }
+            } else if (c == '/') {
+                return true;
+            }
+        }
+        return false;
+    }
+
+    /**
+     * Splits a {@code /}-delimited field path into its decoded segments, 
ignoring empty segments
+     * produced by leading, trailing, or repeated slashes. A field name 
containing no unescaped
+     * slash yields a single segment equal to the (decoded) name itself.
+     * <p>
+     * A backslash escapes the following character so it is taken literally: 
{@code \/} is a slash
+     * that is part of a field name rather than a separator, and {@code \\} is 
a literal backslash.
+     * A backslash before any other character (or at the end of the string) is 
kept as a literal
+     * backslash, so field names containing stray backslashes are unaffected.
+     */
+    private static String[] splitPath(final String path) {
+        final List<String> segments = new ArrayList<>();
+        final StringBuilder current = new StringBuilder(path.length());
+        for (int i = 0; i < path.length(); i++) {
+            final char c = path.charAt(i);
+            if (c == '\\' && i + 1 < path.length()) {
+                final char next = path.charAt(i + 1);
+                if (next == '/' || next == '\\') {
+                    current.append(next);
+                    i++; // consume the escaped character
+                    continue;
+                }
+                current.append(c); // lone backslash, kept literal
+            } else if (c == '/') {
+                if (current.length() > 0) {
+                    segments.add(current.toString());
+                    current.setLength(0);
+                }
+            } else {
+                current.append(c);
+            }
+        }
+        if (current.length() > 0) {
+            segments.add(current.toString());
+        }
+        return segments.toArray(new String[0]);
+    }
+
+    /**
+     * Decodes a single flat field name, resolving the same backslash escapes 
as
+     * {@link #splitPath(String)} ({@code \/} to {@code /}, {@code \\} to 
{@code \}) without treating
+     * any slash as a separator. Returns the input unchanged when it contains 
no backslash.
+     */
+    private static String decodeSegment(final String field) {
+        if (field.indexOf('\\') < 0) {
+            return field;
+        }
+        final StringBuilder sb = new StringBuilder(field.length());
+        for (int i = 0; i < field.length(); i++) {
+            final char c = field.charAt(i);
+            if (c == '\\' && i + 1 < field.length()) {
+                final char next = field.charAt(i + 1);
+                if (next == '/' || next == '\\') {
+                    sb.append(next);
+                    i++;
+                    continue;
+                }
+            }
+            sb.append(c);
+        }
+        return sb.toString();
+    }
+
+    /**
+     * Returns the value at the {@code /}-delimited {@code path} within {@code 
map}, or {@code null}
+     * when any segment is missing or an intermediate segment is not a JSON 
object. A path with no
+     * slash is a direct top-level lookup (no allocation).
+     */
+    private static Object valueAtPath(final Map<String, Object> map, final 
String path) {
+        if (!isNestedPath(path)) {
+            return map.get(decodeSegment(path));
+        }
+        Object current = map;
+        for (final String segment : splitPath(path)) {
+            if (!(current instanceof Map)) {
+                return null;
+            }
+            current = ((Map<?, ?>) current).get(segment);
+            if (current == null) {
+                return null;
+            }
+        }
+        return current;
+    }
+
+    /**
+     * Returns the {@link JsonNode} at the {@code /}-delimited {@code path} 
within {@code node}, or
+     * {@code null} when any segment is missing or an intermediate segment is 
not a JSON object.
+     * A path with no slash is a direct field lookup.
+     */
+    private static JsonNode nodeAtPath(final JsonNode node, final String path) 
{
+        if (!isNestedPath(path)) {
+            return node.get(decodeSegment(path));
+        }
+        JsonNode current = node;
+        for (final String segment : splitPath(path)) {
+            if (current == null || !current.isObject()) {
+                return null;
+            }
+            current = current.get(segment);
+        }
+        return current;
+    }
+
+    /**
+     * Removes the leaf at the {@code /}-delimited {@code path} within {@code 
root}, then prunes any
+     * ancestor objects that become empty as a result (the transient envelope 
often used for routing
+     * metadata, e.g. {@code @metadata}, should not be indexed once its fields 
are extracted).
+     * Does nothing when the path is absent. A path with no slash is a direct 
top-level removal.
+     */
+    private static void removeAtPath(final Map<String, Object> root, final 
String path) {

Review Comment:
   The Map and ObjectNode versions of removeAtPath duplicate the same descend 
and prune loop. Could we factor the shared walk into one helper so the two 
cannot drift over time?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to