gianm commented on code in PR #19072:
URL: https://github.com/apache/druid/pull/19072#discussion_r2876447492


##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -268,4 +312,87 @@ public <T> T as(Class<T> clazz)
     }
     return null;
   }
+
+  @VisibleForTesting
+  public static class FieldsFixupIndexed implements Indexed<ByteBuffer>
+  {
+    private final Indexed<ByteBuffer> delegate;
+    private final Int2ObjectMap<ByteBuffer> fixup;
+
+    private FieldsFixupIndexed(Indexed<ByteBuffer> delegate, 
Int2ObjectMap<ByteBuffer> fixup)
+    {
+      this.delegate = delegate;
+      this.fixup = fixup;
+    }
+
+    @Override
+    public int size()
+    {
+      return delegate.size();
+    }
+
+    @Nullable
+    @Override
+    public ByteBuffer get(int index)
+    {
+      if (fixup.containsKey(index)) {
+        return fixup.get(index).asReadOnlyBuffer();
+      }
+      return delegate.get(index);
+    }
+
+    @Override
+    public int indexOf(@Nullable ByteBuffer value)
+    {
+      for (Int2ObjectMap.Entry<ByteBuffer> entry : fixup.int2ObjectEntrySet()) 
{
+        if (entry.getValue().equals(value)) {
+          return entry.getIntKey();
+        }
+      }
+      return delegate.indexOf(value);
+    }
+
+    @Nonnull
+    @Override
+    public Iterator<ByteBuffer> iterator()
+    {
+      return new Iterator<>()
+      {
+        int pos = 0;
+        final int size = delegate.size();
+        final Iterator<ByteBuffer> delegateIterator = delegate.iterator();
+
+        @Override
+        public boolean hasNext()
+        {
+          return pos < size;
+        }
+
+        @Override
+        public ByteBuffer next()
+        {
+          if (fixup.containsKey(pos)) {
+            // move delegate iterator forward, but we're going to return our 
own value
+            delegateIterator.next();
+            // this is sad, but downstream stuff wants ByteBuffer, and is less 
sad than the original bug
+            return fixup.get(pos++).asReadOnlyBuffer();
+          }
+          pos++;
+          return delegateIterator.next();
+        }
+      };
+    }
+
+    @Override
+    public boolean isSorted()
+    {
+      return delegate.isSorted();

Review Comment:
   Is this always going to be true (will the fixed-up keys sort the same way as 
the bad keys)? Also, does it matter (does anything care if this indexed is 
sorted)?



##########
processing/src/main/java/org/apache/druid/segment/serde/NestedCommonFormatColumnPartSerde.java:
##########
@@ -87,7 +88,8 @@ public static NestedCommonFormatColumnPartSerde 
createDeserializer(
       @JsonProperty("enforceLogicalType") boolean enforceLogicalType,
       @JsonProperty("byteOrder") ByteOrder byteOrder,
       @JsonProperty("bitmapSerdeFactory") BitmapSerdeFactory 
bitmapSerdeFactory,
-      @JsonProperty("columnFormatSpec") @Nullable FormatSpec columnFormatSpec
+      @JsonProperty("columnFormatSpec") @Nullable FormatSpec columnFormatSpec,
+      @JsonProperty("pathParserVersion") @Nullable Byte pathParserVersion

Review Comment:
   I don't see a `@JsonProperty` getter for this. Is it going to make it into 
the serialized form?



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -268,4 +312,87 @@ public <T> T as(Class<T> clazz)
     }
     return null;
   }
+
+  @VisibleForTesting
+  public static class FieldsFixupIndexed implements Indexed<ByteBuffer>
+  {
+    private final Indexed<ByteBuffer> delegate;
+    private final Int2ObjectMap<ByteBuffer> fixup;
+
+    private FieldsFixupIndexed(Indexed<ByteBuffer> delegate, 
Int2ObjectMap<ByteBuffer> fixup)
+    {
+      this.delegate = delegate;
+      this.fixup = fixup;
+    }
+
+    @Override
+    public int size()
+    {
+      return delegate.size();
+    }
+
+    @Nullable
+    @Override
+    public ByteBuffer get(int index)
+    {
+      if (fixup.containsKey(index)) {
+        return fixup.get(index).asReadOnlyBuffer();
+      }
+      return delegate.get(index);
+    }
+
+    @Override
+    public int indexOf(@Nullable ByteBuffer value)
+    {
+      for (Int2ObjectMap.Entry<ByteBuffer> entry : fixup.int2ObjectEntrySet()) 
{
+        if (entry.getValue().equals(value)) {
+          return entry.getIntKey();
+        }
+      }
+      return delegate.indexOf(value);

Review Comment:
   Should return this only if the key is not present in `fixup`. Otherwise it 
will return a nonnegative index for the bad keys.



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedPathFinder.java:
##########
@@ -173,7 +177,105 @@ public static List<NestedPathPart> 
parseJsonPath(@Nullable String path)
           badFormatJsonPath(path, "closing single-quote (') must immediately 
precede ']'");
         }
 
-        parts.add(new NestedPathField(getPathSubstring(path, partMark, i)));
+        parts.add(new NestedPathField(path.substring(partMark, i)));
+        dotMark = -1;
+        quoteMark = -1;
+        // chomp to next char to eat closing array
+        if (++i == path.length()) {
+          break;
+        }
+        partMark = i + 1;
+        arrayMark = -1;
+      }
+    }
+    // add the last element, this should never be an array because they close 
themselves
+    if (partMark < path.length()) {
+      if (quoteMark != -1) {
+        badFormatJsonPath(path, "unterminated single-quote (')");
+      }
+      if (arrayMark != -1) {
+        badFormatJsonPath(path, "unterminated '['");
+      }
+      parts.add(new NestedPathField(path.substring(partMark)));
+    }
+
+    return parts;
+  }
+
+  /**
+   * split a JSONPath path into a series of extractors to find things in 
stuff. This method is mostly a duplicate of

Review Comment:
   Surely something can make more sense then "find things in stuff".



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -182,6 +194,38 @@ public static NestedDataColumnSupplier read(
     }
   }
 
+
+  @VisibleForTesting
+  static Supplier<? extends Indexed<ByteBuffer>> getAndFixFieldsSupplier(

Review Comment:
   Should have some javadoc explaining what "fix" means. Having a short 
description and then linking to a GitHub issue or PR for more details is a 
useful way to do it.



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -268,4 +312,87 @@ public <T> T as(Class<T> clazz)
     }
     return null;
   }
+
+  @VisibleForTesting
+  public static class FieldsFixupIndexed implements Indexed<ByteBuffer>

Review Comment:
   Similar to the static creator method, this should have some javadoc 
explaining what "fix" means. Having a short description and then linking to a 
GitHub issue or PR for more details is a useful way to do it.



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedPathFinder.java:
##########
@@ -173,7 +177,105 @@ public static List<NestedPathPart> 
parseJsonPath(@Nullable String path)
           badFormatJsonPath(path, "closing single-quote (') must immediately 
precede ']'");
         }
 
-        parts.add(new NestedPathField(getPathSubstring(path, partMark, i)));
+        parts.add(new NestedPathField(path.substring(partMark, i)));
+        dotMark = -1;
+        quoteMark = -1;
+        // chomp to next char to eat closing array
+        if (++i == path.length()) {
+          break;
+        }
+        partMark = i + 1;
+        arrayMark = -1;
+      }
+    }
+    // add the last element, this should never be an array because they close 
themselves
+    if (partMark < path.length()) {
+      if (quoteMark != -1) {
+        badFormatJsonPath(path, "unterminated single-quote (')");
+      }
+      if (arrayMark != -1) {
+        badFormatJsonPath(path, "unterminated '['");
+      }
+      parts.add(new NestedPathField(path.substring(partMark)));
+    }
+
+    return parts;
+  }
+
+  /**
+   * split a JSONPath path into a series of extractors to find things in 
stuff. This method is mostly a duplicate of
+   * {@link #parseJsonPath(String)} fixing up any bugs encountered from bad 
paths like '$..a' or '$.[0].a' from
+   * previously bugged versions of {@link #toNormalizedJsonPath(List)} and 
should be kept in sync.

Review Comment:
   Can these be made to share a common parse function that is parameterized 
with some boolean like `allowBadPaths`? If not, at least add a comment in 
`parseJsonPath` too, so someone modifying that can remember to update this one 
too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to