gianm commented on code in PR #12646:
URL: https://github.com/apache/druid/pull/12646#discussion_r897204563


##########
.idea/misc.xml:
##########
@@ -84,7 +87,7 @@
     <resource url="http://maven.apache.org/ASSEMBLY/2.0.0"; 
location="$PROJECT_DIR$/.idea/xml-schemas/assembly-2.0.0.xsd" />
     <resource url="http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"; 
location="$PROJECT_DIR$/.idea/xml-schemas/svg11.dtd" />
   </component>
-  <component name="ProjectRootManager" version="2" languageLevel="JDK_1_8" 
default="false" project-jdk-name="1.8" project-jdk-type="JavaSDK">
+  <component name="ProjectRootManager" version="2" languageLevel="JDK_1_8" 
project-jdk-name="corretto-15" project-jdk-type="JavaSDK">

Review Comment:
   This change doesn't look intentional. When I check out the branch, it blows 
up my IDE because I don't have Corretto installed.



##########
core/src/main/java/org/apache/druid/collections/StupidPool.java:
##########
@@ -39,6 +39,18 @@
 public class StupidPool<T> implements NonBlockingPool<T>
 {
   private static final Logger log = new Logger(StupidPool.class);
+  private static final AtomicBoolean POISONED = new AtomicBoolean(false);

Review Comment:
   Could use some javadocs here explaining how poisoning works.



##########
processing/src/main/java/org/apache/druid/segment/QueryableIndexStorageAdapter.java:
##########
@@ -227,21 +227,15 @@ public VectorCursor makeVectorCursor(
     if (actualInterval == null) {
       return null;
     }
-
-    final ColumnSelectorColumnIndexSelector bitmapIndexSelector = 
makeBitmapIndexSelector(virtualColumns);
-
-    final FilterAnalysis filterAnalysis = analyzeFilter(filter, 
bitmapIndexSelector, queryMetrics);

Review Comment:
   I think this means the `analyzeFilter` method can be deleted from this file.



##########
processing/src/main/java/org/apache/druid/segment/ColumnSelector.java:
##########
@@ -29,6 +29,10 @@
  */
 public interface ColumnSelector extends ColumnInspector
 {
+  /**
+   * This method is apparently no longer used anymore, so deprecating it.
+   */
+  @Deprecated

Review Comment:
   This isn't a `@PublicApi` so why not delete it?



##########
processing/src/main/java/org/apache/druid/segment/QueryableIndexCursorSequenceBuilder.java:
##########
@@ -50,51 +56,66 @@
 
 import javax.annotation.Nullable;
 import java.io.IOException;
-import java.util.HashMap;
-import java.util.Map;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
 
 public class QueryableIndexCursorSequenceBuilder
 {
   private final QueryableIndex index;
   private final Interval interval;
   private final VirtualColumns virtualColumns;
-  @Nullable
-  private final ImmutableBitmap filterBitmap;
+  private final Filter filter;
+  private final QueryMetrics<? extends Query> metrics;
   private final long minDataTimestamp;
   private final long maxDataTimestamp;
   private final boolean descending;
-  @Nullable
-  private final Filter postFilter;
-  @Nullable
-  private final ColumnSelectorColumnIndexSelector bitmapIndexSelector;
 
   public QueryableIndexCursorSequenceBuilder(
       QueryableIndex index,
       Interval interval,
       VirtualColumns virtualColumns,
-      @Nullable ImmutableBitmap filterBitmap,
+      @Nullable Filter filter,
+      @Nullable QueryMetrics<? extends Query> metrics,
       long minDataTimestamp,
       long maxDataTimestamp,
-      boolean descending,
-      @Nullable Filter postFilter,
-      @Nullable ColumnSelectorColumnIndexSelector bitmapIndexSelector
+      boolean descending
   )
   {
     this.index = index;
     this.interval = interval;
     this.virtualColumns = virtualColumns;
-    this.filterBitmap = filterBitmap;
+    this.filter = filter;
+    this.metrics = metrics;
     this.minDataTimestamp = minDataTimestamp;
     this.maxDataTimestamp = maxDataTimestamp;
     this.descending = descending;
-    this.postFilter = postFilter;
-    this.bitmapIndexSelector = bitmapIndexSelector;
   }
 
   public Sequence<Cursor> build(final Granularity gran)
   {
+    final Closer closer = Closer.create();
+
+    // Column caches shared amongst all cursors in this sequence.
+    final ColumnCache columnCache = new ColumnCache(index, closer);
+
     final Offset baseOffset;
 
+    final ColumnSelectorColumnIndexSelector bitmapIndexSelector = new 
ColumnSelectorColumnIndexSelector(
+        index.getBitmapFactoryForDimensions(),
+        virtualColumns,
+        columnCache

Review Comment:
   There's a double-close that can happen as a result of the caching here: 
`ColumnSelectorColumnIndexSelector.getNumRows` gets the `__time` column and 
then immediately closes it. So calls to `bitmapIndexSelector.getNumRows()` 
subsequent to the first one will get an already-closed column. `not` filters 
call it once, so it'll happen on queries that have multiple `not` filters. And 
maybe other cases too?
   
   Seems like there's two ways we can fix it:
   
   1. Modify ColumnSelectorColumnIndexSelector to not close the time column, 
and instead assume it will get closed some other way. Modify places that 
construct ColumnSelectorColumnIndexSelector to ensure that this happens.
   
   2. Modify places that construct ColumnSelectorColumnIndexSelector to *not* 
pass in a ColumnCache, but to instead pass in something that doesn't cache 
columns. Keep the closing behavior of ColumnSelectorColumnIndexSelector the way 
it is.



##########
processing/src/main/java/org/apache/druid/segment/DeprecatedQueryableIndexColumnSelector.java:
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment;
+
+import org.apache.druid.segment.column.ColumnHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+/**
+ * It likely looks weird that we are creating a new instance of ColumnSelector 
here that begins its life deprecated
+ * and only delegates methods to the Queryable Index. This is done 
intentionally so that the QueryableIndex doesn't
+ * accidentally get used as a ColumnSelector.
+ *
+ * The lifecycle of the QueryableIndex is over the lifetime of the segment on 
a specific process, while
+ * the ColumnSelector's lifecycle is for a given query.  When we don't use the 
same ColumnSelector for an
+ * entire query, we defeat caching and use a lot more resources than necessary 
for queries.
+ *
+ * Places that use this class are intentionally circumventing column caching 
and column lifecycle management,
+ * ostensibly because those code locations know that they are only looking at 
metadata.  If a code path uses this
+ * and actually accesses a column instead of just looking at metadata, it will 
leak any resources that said column
+ * requires.
+ *
+ * The ColumnCache is the preferred implementation of a ColumnSelector, it 
takes a Closer and that closer can be used
+ * to ensure that resources are cleaned up.
+ */
+@Deprecated
+public class DeprecatedQueryableIndexColumnSelector implements ColumnSelector

Review Comment:
   I'm wondering we can get away without having this class, especially after 
fixing the double-free issue with ColumnSelectorColumnIndexSelector. All but 
one of the usages of DeprecatedQueryableIndexColumnSelector are as an input to 
ColumnSelectorColumnIndexSelector. Once that's adjusted, it may be that we 
don't actually need this adapter anymore.
   
   The other usage is in a call to `Filter.supportsSelectivityEstimation` in 
AutoStrategy, where it would be fine (and preferable?) to pass in a ColumnCache 
instead of a DeprecatedQueryableIndexColumnSelector.



##########
pom.xml:
##########
@@ -1500,8 +1500,9 @@
                         <!-- set default options -->
                         <argLine>
                             @{jacocoArgLine}
+                            ${jdk.surefire.argLine}
                             -Xmx1500m
-                            -XX:MaxDirectMemorySize=512m
+                            -XX:MaxDirectMemorySize=2500m

Review Comment:
   Why?



##########
processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryEngine.java:
##########
@@ -161,7 +161,7 @@ private Sequence<Result<TimeseriesResultValue>> 
processVectorized(
       );
 
       if (granularizer == null) {
-        return Sequences.empty();
+        return Sequences.withBaggage(Sequences.empty(), closer);

Review Comment:
   Might as well close the closer immediately and then return 
`Sequences.empty()`? I guess either way is fine.



##########
processing/src/main/java/org/apache/druid/segment/DeprecatedQueryableIndexColumnSelector.java:
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment;
+
+import org.apache.druid.segment.column.ColumnHolder;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+/**
+ * It likely looks weird that we are creating a new instance of ColumnSelector 
here that begins its life deprecated
+ * and only delegates methods to the Queryable Index. This is done 
intentionally so that the QueryableIndex doesn't
+ * accidentally get used as a ColumnSelector.
+ *
+ * The lifecycle of the QueryableIndex is over the lifetime of the segment on 
a specific process, while
+ * the ColumnSelector's lifecycle is for a given query.  When we don't use the 
same ColumnSelector for an
+ * entire query, we defeat caching and use a lot more resources than necessary 
for queries.
+ *
+ * Places that use this class are intentionally circumventing column caching 
and column lifecycle management,
+ * ostensibly because those code locations know that they are only looking at 
metadata.  If a code path uses this

Review Comment:
   It won't necessarily leak, will it? Callers can close the BaseColumn they 
get from the ColumnHolder. For example, 
`ColumnSelectorColumnIndexSelector.getNumRows` does this, which is called by 
AutoStrategy in a way that seems OK.



##########
processing/src/main/java/org/apache/druid/segment/ColumnCache.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment;
+
+import org.apache.druid.java.util.common.io.Closer;
+import org.apache.druid.segment.column.BaseColumn;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnHolder;
+import org.apache.druid.segment.column.ColumnIndexSupplier;
+import org.apache.druid.segment.selector.settable.SettableColumnValueSelector;
+
+import javax.annotation.Nullable;
+import java.util.HashMap;
+import java.util.List;
+
+public class ColumnCache implements ColumnSelector
+{
+  private final HashMap<String, ColumnHolder> holderCache;
+  private final QueryableIndex index;
+  private final Closer closer;
+
+  public ColumnCache(QueryableIndex index, Closer closer)
+  {
+    this.index = index;
+    this.closer = closer;
+
+    this.holderCache = new HashMap<>();
+  }
+
+
+  @Override
+  public List<String> getColumnNames()
+  {
+    return index.getColumnNames();
+  }
+
+  @Nullable
+  @Override
+  public ColumnHolder getColumnHolder(String columnName)
+  {
+    return holderCache.computeIfAbsent(columnName, dimension -> {
+      // Here we do a funny little dance to memoize the BaseColumn and 
register it with the closer.
+      // It would probably be cleaner if the ColumnHolder itself was 
`Closeable` and did its own memoization,
+      // but that change is much wider and runs the risk of even more things 
that need to close the thing
+      // not actually closing it.  So, maybe this is a hack, maybe it's a wise 
decision, who knows, but at
+      // least for now, we grab the holder, grab the column, register the 
column with the closer and then return
+      // a new holder that always returns the same reference for the column.
+
+      final ColumnHolder holder = index.getColumnHolder(columnName);
+      if (holder == null) {
+        return null;
+      }
+
+      return new ColumnHolder()
+      {
+        @Nullable
+        private BaseColumn theColumn = null;
+
+        @Override
+        public ColumnCapabilities getCapabilities()
+        {
+          return holder.getCapabilities();
+        }
+
+        @Override
+        public int getLength()
+        {
+          return holder.getLength();
+        }
+
+        @Override
+        public BaseColumn getColumn()
+        {
+          if (theColumn == null) {
+            theColumn = closer.register(holder.getColumn());
+          }
+          return theColumn;
+        }
+
+        @Nullable
+        @Override
+        public ColumnIndexSupplier getIndexSupplier()
+        {
+          return holder.getIndexSupplier();

Review Comment:
   BaseColumns need to be closed but ColumnIndexSuppliers don't?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to