clintropolis commented on code in PR #18892:
URL: https://github.com/apache/druid/pull/18892#discussion_r2684533007
##########
server/src/main/java/org/apache/druid/segment/realtime/appenderator/BatchAppenderator.java:
##########
@@ -835,6 +836,14 @@ private DataSegmentWithMetadata mergeAndPush(
metrics.setMergeTime(mergeTimeMillis);
log.debug("Segment[%s] built in %,dms.", identifier, mergeTimeMillis);
+ QueryableIndex index = indexIO.loadIndex(mergedFile);
+ closer.register(index);
Review Comment:
can this just use `sink.getNumRows()` or is that not cool for some reason?
##########
processing/src/main/java/org/apache/druid/query/metadata/SegmentMetadataQueryRunnerFactory.java:
##########
@@ -67,6 +68,59 @@ public SegmentMetadataQueryRunnerFactory(
this.queryWatcher = queryWatcher;
}
+ public QueryRunner<SegmentAnalysis> createRunner(SegmentReference segmentRef)
+ {
+ if (segmentRef.getDataSegment() == null) {
+ throw DruidException.defensive("Missing Segment[%s]",
segmentRef.getSegmentDescriptor());
+ } else if (segmentRef.getSegmentReference().isEmpty()) {
+ throw DruidException.defensive("Missing Segment[%s]",
segmentRef.getDataSegment().getId());
+ }
+
+ Segment segment = segmentRef.getSegmentReference().get();
+ if (!Objects.equals(segmentRef.getDataSegment().getId(), segment.getId()))
{
+ throw DruidException.defensive(
+ "SegmentId mismatch in DataSegment[%s] and Segment[%s]",
+ segmentRef.getDataSegment().getId(),
+ segment.getId()
+ );
+ }
+
+ return new QueryRunner<>()
+ {
+ @Override
+ public Sequence<SegmentAnalysis> run(QueryPlus<SegmentAnalysis> inQ,
ResponseContext responseContext)
+ {
+ SegmentMetadataQuery updatedQuery = ((SegmentMetadataQuery)
inQ.getQuery()).withFinalizedAnalysisTypes(toolChest.getConfig());
+ final SegmentAnalyzer analyzer = new
SegmentAnalyzer(updatedQuery.getAnalysisTypes());
+
+ Integer numRows = segmentRef.getDataSegment().getTotalRows();
+ if (numRows == null) {
+ numRows = analyzer.numRows(segment);
+ }
+ long totalSize = analyzer.analyzingSize() ?
segmentRef.getDataSegment().getSize() : 0L;
Review Comment:
this is is a breaking change to size analysis. While the current size
analysis is arguably nonsense, I don't think we want to do an incompatible
change like this, see discussion in
https://github.com/apache/druid/issues/7124, and so this change should be
reverted.
I'm also not terribly sure the total size isn't particularly useful, I think
the size analysis we want to replace this with would be a breakdown of column
parts since that is actionable (e.g. can decide to not have indexes or whatever
else). In other words, I don't think `DataSegment` is particularly useful to
use here since we already have the `Segment`, so it doesn't save us anything,
and costs us dragging around the overhead of an extra memory reference linking
back to the `DataSegment` for anything a query engine processes
##########
processing/src/main/java/org/apache/druid/segment/SegmentReference.java:
##########
@@ -49,23 +51,51 @@ public static SegmentReference missing(SegmentDescriptor
segmentDescriptor)
return new SegmentReference(segmentDescriptor, Optional.empty(), null);
}
+ @Nullable
+ private final DataSegment dataSegment;
private final SegmentDescriptor segmentDescriptor;
private final Optional<Segment> segmentReference;
@Nullable
private final Closeable cleanupHold;
private final AtomicBoolean closed = new AtomicBoolean(false);
+ public SegmentReference(
+ DataSegmentAndDescriptor segment,
+ Optional<Segment> segmentReference,
+ @Nullable Closeable cleanupHold
+ )
+ {
+ this(segment.getDataSegment(), segment.getDescriptor(), segmentReference,
cleanupHold);
+ }
+
public SegmentReference(
SegmentDescriptor segmentDescriptor,
Optional<Segment> segmentReference,
@Nullable Closeable cleanupHold
)
{
+ this(null, segmentDescriptor, segmentReference, cleanupHold);
+ }
+
+ private SegmentReference(
+ @Nullable DataSegment segment,
Review Comment:
this change feels controversial to me, I don't love the idea of dragging
around a reference to `DataSegment` everywhere we are using segments, what is
this for. just for the segment metadata query changes?
##########
server/src/main/java/org/apache/druid/segment/realtime/appenderator/StreamAppenderator.java:
##########
@@ -970,6 +971,14 @@ private DataSegmentWithMetadata mergeAndPush(
metrics.setMergeTime(mergeTimeMillis);
log.debug("Segment[%s] built in %,dms.", identifier, mergeTimeMillis);
+ QueryableIndex index = indexIO.loadIndex(mergedFile);
+ closer.register(index);
Review Comment:
same comment about `sink.getNumRows()`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]