clintropolis commented on code in PR #19535:
URL: https://github.com/apache/druid/pull/19535#discussion_r3415453574


##########
processing/src/main/java/org/apache/druid/segment/PartialQueryableIndexCursorFactory.java:
##########
@@ -0,0 +1,279 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment;
+
+import com.google.common.util.concurrent.FutureCallback;
+import com.google.common.util.concurrent.Futures;
+import com.google.common.util.concurrent.ListenableFuture;
+import com.google.common.util.concurrent.ListeningExecutorService;
+import com.google.common.util.concurrent.MoreExecutors;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.java.util.common.io.Closer;
+import org.apache.druid.query.Order;
+import org.apache.druid.query.OrderBy;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnHolder;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.projections.Projections;
+import org.apache.druid.segment.projections.QueryableProjection;
+import org.apache.druid.segment.vector.VectorCursor;
+import org.apache.druid.utils.CloseableUtils;
+
+import javax.annotation.Nullable;
+import java.io.Closeable;
+import java.util.ArrayList;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Partial-aware {@link CursorFactory} for {@link PartialQueryableIndex}.
+ * <p>
+ * <b>Sync vs async contract.</b> {@link #makeCursorHolder} requires the segment to already be fully downloaded,
+ * intended for callers that acquired the segment via the eager (download-everything-up-front) path, so by the time
+ * they ask for a cursor every internal file is on disk. If anything is missing it throws
+ * {@link DruidException#defensive} so that we never trigger downloads on the sync path, since processing threads must
+ * not block on deep-storage I/O. {@link #makeCursorHolderAsync} is the only path that performs downloads on demand;
+ * callers acknowledge that by opting into the async variant when they acquire a partial segment.
+ * <p>
+ * <b>Async download granularity.</b> Pre-fetch is column-level. {@link #makeCursorHolderAsync} calls
+ * {@link QueryableIndex#getColumnHolder} on each required column; the memoized supplier on the underlying
+ * {@link PartialQueryableIndex} eagerly invokes
+ * {@link org.apache.druid.segment.file.PartialSegmentFileMapperV10#mapFile} inside that call, which is what triggers
+ * the deep-storage range read. The cursor holder constructed afterward sees the already-materialized holders via the
+ * same memoized suppliers, so no further downloads happen at cursor-read time.
+ * <p>
+ * If a projection matches, the required columns are looked up against the projection's row selector and its rewritten
+ * {@link CursorBuildSpec} (which carries physical columns in the projection's namespace). When
+ * {@link CursorBuildSpec#getPhysicalColumns()} is {@code null}, every column on the chosen row selector is pre-fetched
+ * as required by the contract of {@link CursorBuildSpec}.
+ * <p>
+ * <b>Parallelism.</b> Each column's materialization is submitted as a separate task to the supplied download executor.
+ * The cursor holder is constructed once every column task has completed.
+ */
+public class PartialQueryableIndexCursorFactory implements CursorFactory
+{
+  private final PartialQueryableIndex index;
+  private final QueryableIndexCursorFactory delegate;
+  private final PartialBundleAcquirer bundleAcquirer;
+
+  public PartialQueryableIndexCursorFactory(
+      PartialQueryableIndex index,
+      TimeBoundaryInspector timeBoundaryInspector,
+      PartialBundleAcquirer bundleAcquirer
+  )
+  {
+    this.index = index;
+    this.delegate = new QueryableIndexCursorFactory(index, timeBoundaryInspector);
+    this.bundleAcquirer = bundleAcquirer;
+  }
+
+  @Override
+  public CursorHolder makeCursorHolder(CursorBuildSpec spec)

Review Comment:
   I have left this as is for now because this path is actually used when a
partial segment is used in 'full load' mode (e.g. by the native engine instead
of Dart/MSQ). These guards mainly ensure that the caller is using the correct
`AcquireMode` to hold the segment during processing.
   
   The default implementation is probably not very useful, though, and should
maybe throw an exception instead. I'll look into changing this, either in this
PR or in a follow-up: I still need to wire up the unnest and join cursor
factories with async implementations, so I could do it then too.
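
For readers following the thread, the sync guard and the per-column async pre-fetch described in the Javadoc above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PR's code: `PartialCursorSketch`, `materialize`, the in-memory map, and the `List<String>` "holder" are hypothetical stand-ins, and it uses the JDK's `CompletableFuture` rather than the Guava `ListenableFuture` types the PR imports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PartialCursorSketch
{
  // Hypothetical stand-in for a partially downloaded segment: column name -> bytes on disk.
  private final Map<String, byte[]> loaded = new ConcurrentHashMap<>();

  // Hypothetical stand-in for the memoized supplier that triggers a deep-storage range read.
  public byte[] materialize(String column)
  {
    return loaded.computeIfAbsent(column, c -> new byte[]{1});
  }

  // Sync path: never downloads; fails fast if any required column is missing.
  public List<String> makeCursorHolder(Set<String> requiredColumns)
  {
    for (String column : requiredColumns) {
      if (!loaded.containsKey(column)) {
        throw new IllegalStateException("Column [" + column + "] is not loaded; use the async path");
      }
    }
    // The "holder" here is just the sorted column names (illustrative only).
    return new ArrayList<>(new TreeSet<>(requiredColumns));
  }

  // Async path: one download task per column; the holder is built once every task has completed.
  public CompletableFuture<List<String>> makeCursorHolderAsync(
      Set<String> requiredColumns,
      ExecutorService downloadExec
  )
  {
    List<CompletableFuture<Void>> tasks = new ArrayList<>();
    for (String column : requiredColumns) {
      tasks.add(CompletableFuture.runAsync(() -> materialize(column), downloadExec));
    }
    return CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0]))
                            .thenApply(ignored -> makeCursorHolder(requiredColumns));
  }

  public static void main(String[] args)
  {
    ExecutorService exec = Executors.newFixedThreadPool(2);
    try {
      PartialCursorSketch sketch = new PartialCursorSketch();
      List<String> holder = sketch.makeCursorHolderAsync(Set.of("__time", "page"), exec).join();
      System.out.println(holder.size());
    }
    finally {
      exec.shutdown();
    }
  }
}
```

In this sketch the sync method doubles as the guard the comment mentions: calling it before the async pre-fetch has run throws instead of blocking on a download.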



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

