clintropolis commented on code in PR #14985:
URL: https://github.com/apache/druid/pull/14985#discussion_r1372515182


##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/BrokerSegmentMetadataCacheConfig.java:
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.schema;
+
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.segment.metadata.SegmentMetadataCacheConfig;
+import org.joda.time.Period;
+
+/**
+ * Broker-side configuration class for managing segment polling from the Coordinator and
+ * customizing properties related to the SegmentMetadata cache which is used to infer datasources for SQL.
+ *
+ * <p>See {@link BrokerSegmentMetadataCache}, {@link MetadataSegmentView}.</p>
+ *
+ * <p>This class shares the same config root as {@link org.apache.druid.sql.calcite.planner.PlannerConfig}
+ * to maintain backward compatibility from when these properties resided in {@code PlannerConfig}.</p>
+ *
+ * <p>The property {@link #awaitInitializationOnStart} is overridden in this class with a default value
+ * of {@code true}, which differs from the parent class. This ensures that the SegmentMetadata cache is
+ * fully initialized before other startup processes proceed.</p>
+ */
+public class BrokerSegmentMetadataCacheConfig extends SegmentMetadataCacheConfig
+{
+  // A flag indicating whether to cache polled segments from the Coordinator.
+  @JsonProperty
+  private boolean metadataSegmentCacheEnable = false;
+
+  // Interval (in milliseconds) for polling segments from the Coordinator.
+  @JsonProperty
+  private long metadataSegmentPollPeriod = 60000;
+
+  // A flag indicating whether to wait for cache initialization during startup.
+  @JsonProperty
+  private boolean awaitInitializationOnStart = true;
+
+  @JsonProperty
+  private boolean disableSegmentMetadataQueries = false;

Review Comment:
   afaict, this is not documented, should it be?



##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java:
##########
@@ -302,13 +303,30 @@ public Enumerable<Object[]> scan(DataContext root)
             final DataSegment segment = val.getDataSegment();
             segmentsAlreadySeen.add(segment.getId());
            final PartialSegmentData partialSegmentData = partialSegmentDataMap.get(segment.getId());
-            long numReplicas = 0L, numRows = 0L, isRealtime = 0L, isAvailable = 0L;
+            long numReplicas = 0L, numRows = 0L, isRealtime, isAvailable = 0L;
+
             if (partialSegmentData != null) {
               numReplicas = partialSegmentData.getNumReplicas();
-              numRows = partialSegmentData.getNumRows();
               isAvailable = partialSegmentData.isAvailable();
-              isRealtime = partialSegmentData.isRealtime();
+              numRows = partialSegmentData.getNumRows();
+            }
+
+            // If table schema building is enabled on the Coordinator, SegmentMetadataCache on the
+            // broker might have outdated or no information regarding numRows and rowSignature for a segment.
+            // We should use {@code numRows} from the segment polled from the Coordinator.
+            if (null != val.getNumRows()) {
+              numRows = val.getNumRows();
             }
+
+            isRealtime = Boolean.TRUE.equals(val.isRealtime()) ? 1 : 0;
+
+            // The set of segments returned from the Coordinator includes published and
+            // realtime segments, so realtime segments are not published and vice versa.
+            boolean isPublished = !val.isRealtime();

Review Comment:
   nit: why does `isRealtime` do this `Boolean.TRUE.equals` thing for `val.isRealtime()`, while `isPublished` uses `!val.isRealtime()`, and similarly `isActive` uses it directly?
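To make the inconsistency concrete, here is a minimal standalone sketch (class and method names are illustrative, not Druid code) of why the null-safe form matters when the flag is a nullable `Boolean`:

```java
public class NullableBooleanSketch
{
  // Null-safe conversion: Boolean.TRUE.equals(null) is simply false,
  // whereas auto-unboxing null via `!flag` or `flag ? 1 : 0` throws NullPointerException.
  static long toLong(Boolean flag)
  {
    return Boolean.TRUE.equals(flag) ? 1L : 0L;
  }

  public static void main(String[] args)
  {
    System.out.println(toLong(Boolean.TRUE));
    System.out.println(toLong(Boolean.FALSE));
    System.out.println(toLong(null));
  }
}
```

By contrast, a bare `!val.isRealtime()` auto-unboxes the `Boolean` and would throw if the value were ever null, which may be why the call sites look inconsistent.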



##########
server/src/main/java/org/apache/druid/client/CoordinatorServerView.java:
##########
@@ -162,7 +180,28 @@ private void serverAddedSegment(final DruidServerMetadata server, final DataSegm
         );
         segmentLoadInfos.put(segmentId, segmentLoadInfo);
       }
+
+      if (null != druidClientFactory) {
+        QueryRunner queryRunner = serverQueryRunners.get(server.getName());
+        if (null == queryRunner) {
+          DruidServer inventoryValue = baseView.getInventoryValue(server.getName());
+          if (inventoryValue == null) {

Review Comment:
   nit: can you please either put `null` consistently on the left or right of these checks? My preference would be on the right since my gut says that is what the majority of the codebase is doing, but at least we should be consistent in the same function...
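A tiny self-contained sketch of the requested convention (names are illustrative, not the actual `CoordinatorServerView` code), with `null` consistently on the right:

```java
public class NullCheckStyleSketch
{
  // All checks use the `x == null` form rather than mixing in `null == x`.
  static String describe(Object queryRunner, Object inventoryValue)
  {
    if (queryRunner == null) {
      return "no runner";
    }
    if (inventoryValue == null) {
      return "no inventory";
    }
    return "ok";
  }

  public static void main(String[] args)
  {
    System.out.println(describe(null, new Object()));
  }
}
```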



##########
server/src/test/java/org/apache/druid/segment/metadata/SegmentMetadataQuerySegmentWalkerTest.java:
##########
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.metadata;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import org.apache.druid.client.CachingClusteredClientTest.ServerExpectation;
+import org.apache.druid.client.CachingClusteredClientTest.ServerExpectations;
+import org.apache.druid.client.CoordinatorSegmentWatcherConfig;
+import org.apache.druid.client.CoordinatorServerView;
+import org.apache.druid.client.DirectDruidClientFactory;
+import org.apache.druid.client.DruidServer;
+import org.apache.druid.client.SegmentLoadInfo;
+import org.apache.druid.client.ServerInventoryView;
+import org.apache.druid.guice.http.DruidHttpClientConfig;
+import org.apache.druid.java.util.common.Intervals;
+import org.apache.druid.java.util.common.Pair;
+import org.apache.druid.java.util.common.guava.Sequence;
+import org.apache.druid.java.util.common.guava.Sequences;
+import org.apache.druid.java.util.common.guava.Yielder;
+import org.apache.druid.java.util.common.guava.Yielders;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.query.DataSource;
+import org.apache.druid.query.MapQueryToolChestWarehouse;
+import org.apache.druid.query.Query;
+import org.apache.druid.query.QueryContexts;
+import org.apache.druid.query.QueryPlus;
+import org.apache.druid.query.QueryRunner;
+import org.apache.druid.query.QueryToolChest;
+import org.apache.druid.query.QueryToolChestWarehouse;
+import org.apache.druid.query.SegmentDescriptor;
+import org.apache.druid.query.TableDataSource;
+import org.apache.druid.query.context.ResponseContext;
+import org.apache.druid.query.metadata.SegmentMetadataQueryConfig;
+import org.apache.druid.query.metadata.SegmentMetadataQueryQueryToolChest;
+import org.apache.druid.query.metadata.metadata.AllColumnIncluderator;
+import org.apache.druid.query.metadata.metadata.SegmentAnalysis;
+import org.apache.druid.query.metadata.metadata.SegmentMetadataQuery;
+import org.apache.druid.query.spec.MultipleSpecificSegmentSpec;
+import org.apache.druid.server.coordination.ServerType;
+import org.apache.druid.server.initialization.ServerConfig;
+import org.apache.druid.server.metrics.NoopServiceEmitter;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.SegmentId;
+import org.apache.druid.timeline.VersionedIntervalTimeline;
+import org.apache.druid.timeline.partition.ShardSpec;
+import org.apache.druid.timeline.partition.SingleDimensionShardSpec;
+import org.easymock.EasyMock;
+import org.easymock.IAnswer;
+import org.joda.time.Interval;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.mockito.Mockito;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+public class SegmentMetadataQuerySegmentWalkerTest
+{
+  private final String DATASOURCE = "testDatasource";
+  private QueryToolChestWarehouse warehouse;
+  private DruidHttpClientConfig httpClientConfig;
+  private DruidServer[] servers;
+  private Random random;
+
+  @Before
+  public void setUp()
+  {
+    warehouse = new MapQueryToolChestWarehouse(
+        ImmutableMap.<Class<? extends Query>, QueryToolChest>builder()
+                    .put(
+                        SegmentMetadataQuery.class,
+                        new SegmentMetadataQueryQueryToolChest(
+                            new SegmentMetadataQueryConfig("P1W")
+
+                        )
+                    ).build());
+
+    httpClientConfig = new DruidHttpClientConfig()
+    {
+      @Override
+      public long getMaxQueuedBytes()
+      {
+        return 0L;
+      }
+    };
+
+    servers =
+        new DruidServer[]{
+            new DruidServer("test1", "test1", null, 10, ServerType.HISTORICAL, "bye", 0),
+            new DruidServer("test2", "test2", null, 10, ServerType.HISTORICAL, "bye", 0),
+            new DruidServer("test3", "test2", null, 10, ServerType.REALTIME, "bye", 0)

Review Comment:
   I know this is just a test, but afaik nothing uses `REALTIME` anymore, so probably should use `INDEXER_EXECUTOR`...
   
   `REALTIME` refers to the old realtime node type that was removed quite a while ago https://github.com/apache/druid/pull/7915
   
   We should probably remove this `ServerType` completely now since it seems unlikely anyone will be doing a rolling upgrade from such an old version at this point 😅 



##########
server/src/main/java/org/apache/druid/client/CachingClusteredClient.java:
##########
@@ -844,23 +845,26 @@ private byte[] computeQueryCacheKeyWithJoin()
     }
   }
 
-  private static class TimelineConverter implements UnaryOperator<TimelineLookup<String, ServerSelector>>
+  /**
+   * Helper class to build a new timeline of filtered segments.
+   */
+  public static class TimelineConverter<ObjectType extends Overshadowable<ObjectType>> implements UnaryOperator<TimelineLookup<String, ObjectType>>

Review Comment:
   nit: is there really nothing more descriptive than `ObjectType`? Might as well just use `T` if not 😅  
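As a hedged illustration of the nit (the `Overshadowable` interface below is a simplified stand-in for the Druid one, and the converter is reduced to an identity operation), a single-letter bounded type parameter carries the same information as `ObjectType`:

```java
import java.util.function.UnaryOperator;

// Stand-in for org.apache.druid.timeline.Overshadowable, just so the bound compiles.
interface Overshadowable<T extends Overshadowable<T>> {}

// `T` is as descriptive as `ObjectType` here: the bound already says everything.
class TimelineConverterSketch<T extends Overshadowable<T>> implements UnaryOperator<T>
{
  @Override
  public T apply(T input)
  {
    return input; // identity; the real class would filter segments
  }
}

public class GenericNamingSketch
{
  static final class Seg implements Overshadowable<Seg> {}

  public static void main(String[] args)
  {
    Seg seg = new Seg();
    System.out.println(new TimelineConverterSketch<Seg>().apply(seg) == seg);
  }
}
```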



##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/BrokerSegmentMetadataCache.java:
##########
@@ -0,0 +1,257 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.schema;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Sets;
+import com.google.inject.Inject;
+import org.apache.druid.client.InternalQueryConfig;
+import org.apache.druid.client.ServerView;
+import org.apache.druid.client.TimelineServerView;
+import org.apache.druid.client.coordinator.CoordinatorClient;
+import org.apache.druid.common.guava.FutureUtils;
+import org.apache.druid.guice.ManageLifecycle;
+import org.apache.druid.java.util.emitter.EmittingLogger;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.metadata.AbstractSegmentMetadataCache;
+import org.apache.druid.segment.metadata.DataSourceInformation;
+import org.apache.druid.server.QueryLifecycleFactory;
+import org.apache.druid.server.coordination.DruidServerMetadata;
+import org.apache.druid.server.security.Escalator;
+import org.apache.druid.sql.calcite.table.DatasourceTable.PhysicalDatasourceMetadata;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.SegmentId;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Broker-side cache of segment metadata that combines segments to build
+ * datasources which become "tables" in Calcite. This cache provides the "physical"
+ * metadata about a dataSource which is blended with catalog "logical" metadata
+ * to provide the final user-view of each dataSource.
+ * <p>
+ * This class extends {@link AbstractSegmentMetadataCache} and introduces the following changes:
+ * <ul>
+ *   <li>The refresh mechanism includes polling the coordinator for datasource schema,
+ *       and falling back to running {@link org.apache.druid.query.metadata.metadata.SegmentMetadataQuery}.</li>
+ *   <li>It builds and caches a {@link PhysicalDatasourceMetadata} object for the table schema.</li>
+ * </ul>
+ */
+@ManageLifecycle
+public class BrokerSegmentMetadataCache extends AbstractSegmentMetadataCache<PhysicalDatasourceMetadata>
+{
+  private static final EmittingLogger log = new EmittingLogger(BrokerSegmentMetadataCache.class);
+
+  private final PhysicalDatasourceMetadataFactory dataSourceMetadataFactory;
+  private final CoordinatorClient coordinatorClient;
+
+  private final BrokerSegmentMetadataCacheConfig config;
+
+  @Inject
+  public BrokerSegmentMetadataCache(
+      final QueryLifecycleFactory queryLifecycleFactory,
+      final TimelineServerView serverView,
+      final BrokerSegmentMetadataCacheConfig config,
+      final Escalator escalator,
+      final InternalQueryConfig internalQueryConfig,
+      final ServiceEmitter emitter,
+      final PhysicalDatasourceMetadataFactory dataSourceMetadataFactory,
+      final CoordinatorClient coordinatorClient
+  )
+  {
+    super(
+        queryLifecycleFactory,
+        config,
+        escalator,
+        internalQueryConfig,
+        emitter
+    );
+    this.dataSourceMetadataFactory = dataSourceMetadataFactory;
+    this.coordinatorClient = coordinatorClient;
+    this.config = config;
+    initServerViewTimelineCallback(serverView);
+  }
+
+  private void initServerViewTimelineCallback(final TimelineServerView serverView)
+  {
+    serverView.registerTimelineCallback(
+        callbackExec,
+        new TimelineServerView.TimelineCallback()
+        {
+          @Override
+          public ServerView.CallbackAction timelineInitialized()
+          {
+            synchronized (lock) {
+              isServerViewInitialized = true;
+              lock.notifyAll();
+            }
+
+            return ServerView.CallbackAction.CONTINUE;
+          }
+
+          @Override
+          public ServerView.CallbackAction segmentAdded(final DruidServerMetadata server, final DataSegment segment)
+          {
+            addSegment(server, segment);
+            return ServerView.CallbackAction.CONTINUE;
+          }
+
+          @Override
+          public ServerView.CallbackAction segmentRemoved(final DataSegment segment)
+          {
+            removeSegment(segment);
+            return ServerView.CallbackAction.CONTINUE;
+          }
+
+          @Override
+          public ServerView.CallbackAction serverSegmentRemoved(
+              final DruidServerMetadata server,
+              final DataSegment segment
+          )
+          {
+            removeServerSegment(server, segment);
+            return ServerView.CallbackAction.CONTINUE;
+          }
+        }
+    );
+  }
+
+  /**
+   * Refreshes the set of segments in two steps:
+   * <ul>
+   *  <li>Polls the coordinator for the datasource schema.</li>
+   *  <li>Refreshes the remaining set of segments by executing a SegmentMetadataQuery and
+   *      builds datasource schema by combining segment schema.</li>
+   * </ul>
+   *
+   * @param segmentsToRefresh    segments for which the schema might have changed
+   * @param dataSourcesToRebuild datasources for which the schema might have changed
+   * @throws IOException         when querying segment schema from data nodes and tasks
+   */
+  @Override
+  public void refresh(final Set<SegmentId> segmentsToRefresh, final Set<String> dataSourcesToRebuild) throws IOException
+  {
+    // query schema for all datasources, which includes,
+    // datasources explicitly marked for rebuilding
+    // datasources for the segments to be refreshed
+    // prebuilt datasources
+    final Set<String> dataSourcesToQuery = new HashSet<>(dataSourcesToRebuild);
+
+    segmentsToRefresh.forEach(segment -> dataSourcesToQuery.add(segment.getDataSource()));
+
+    dataSourcesToQuery.addAll(tables.keySet());
+
+    // Fetch datasource information from the Coordinator
+    Map<String, PhysicalDatasourceMetadata> polledDataSourceMetadata = queryDataSourceInformation(dataSourcesToQuery);

Review Comment:
   since this is not on by default in the coordinator, shouldn't we have a broker side config to indicate if we query the coordinator at all? Like it seems like there is no point in doing this if the coordinator cache is not running (the default behavior)
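The suggestion could be sketched as a hypothetical `runtime.properties` entry on the broker; the property name below is purely illustrative (no such config exists in this PR), but it shows the kind of gate being requested:

```properties
# Hypothetical broker-side gate (name illustrative): skip the Coordinator
# datasource-schema poll entirely when the Coordinator-side cache is disabled,
# since the poll returns nothing useful in that (default) configuration.
druid.sql.planner.fetchSchemaFromCoordinator=false
```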



##########
server/src/test/java/org/apache/druid/segment/metadata/SegmentMetadataCacheCommon.java:
##########
@@ -0,0 +1,333 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.metadata;
+
+import com.google.common.base.Suppliers;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.data.input.InputRowSchema;
+import org.apache.druid.data.input.impl.DimensionsSpec;
+import org.apache.druid.data.input.impl.MapInputRowParser;
+import org.apache.druid.data.input.impl.TimestampSpec;
+import org.apache.druid.java.util.common.Intervals;
+import org.apache.druid.java.util.common.io.Closer;
+import org.apache.druid.query.DefaultGenericQueryMetricsFactory;
+import org.apache.druid.query.DefaultQueryConfig;
+import org.apache.druid.query.Query;
+import org.apache.druid.query.QueryRunnerFactoryConglomerate;
+import org.apache.druid.query.QuerySegmentWalker;
+import org.apache.druid.query.QueryToolChest;
+import org.apache.druid.query.QueryToolChestWarehouse;
+import org.apache.druid.query.aggregation.CountAggregatorFactory;
+import org.apache.druid.query.aggregation.DoubleSumAggregatorFactory;
+import org.apache.druid.query.aggregation.LongSumAggregatorFactory;
+import org.apache.druid.query.aggregation.hyperloglog.HyperUniquesAggregatorFactory;
+import org.apache.druid.segment.IndexBuilder;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.incremental.IncrementalIndexSchema;
+import org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory;
+import org.apache.druid.server.QueryLifecycleFactory;
+import org.apache.druid.server.QueryStackTests;
+import org.apache.druid.server.log.TestRequestLogger;
+import org.apache.druid.server.metrics.NoopServiceEmitter;
+import org.apache.druid.server.security.AuthConfig;
+import org.apache.druid.server.security.AuthTestUtils;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.apache.druid.timeline.partition.NumberedShardSpec;
+import org.junit.Rule;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.util.List;
+import java.util.Map;
+
+public abstract class SegmentMetadataCacheCommon
+{
+  public static final String DATASOURCE1 = "foo";
+  public static final String DATASOURCE2 = "foo2";
+  public static final String DATASOURCE3 = "foo3";
+  public static final String SOME_DATASOURCE = "some_datasource";
+  public static final String TIMESTAMP_COLUMN = "t";
+  private static final InputRowSchema FOO_SCHEMA = new InputRowSchema(
+      new TimestampSpec(TIMESTAMP_COLUMN, "iso", null),
+      new DimensionsSpec(
+          DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim1", "dim2", "dim3"))
+      ),
+      null
+  );
+
+  public final List<InputRow> ROWS1 = ImmutableList.of(
+      createRow(ImmutableMap.of("t", "2000-01-01", "m1", "1.0", "dim1", "")),
+      createRow(ImmutableMap.of("t", "2000-01-02", "m1", "2.0", "dim1", "10.1")),
+      createRow(ImmutableMap.of("t", "2000-01-03", "m1", "3.0", "dim1", "2"))
+  );
+
+  public final List<InputRow> ROWS2 = ImmutableList.of(
+      createRow(ImmutableMap.of("t", "2001-01-01", "m1", "4.0", "dim2", ImmutableList.of("a"))),
+      createRow(ImmutableMap.of("t", "2001-01-02", "m1", "5.0", "dim2", ImmutableList.of("abc"))),
+      createRow(ImmutableMap.of("t", "2001-01-03", "m1", "6.0"))
+  );
+
+  public QueryRunnerFactoryConglomerate conglomerate;
+  public Closer resourceCloser;
+  public QueryToolChestWarehouse queryToolChestWarehouse;
+
+  @Rule
+  public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  public QueryableIndex index1;
+  public QueryableIndex index2;
+
+  public QueryableIndex indexAuto1;
+  public QueryableIndex indexAuto2;
+
+  public DataSegment realtimeSegment1;
+  public DataSegment segment1;
+  public DataSegment segment2;
+  public DataSegment segment3;
+  public DataSegment segment4;
+  public DataSegment segment5;
+
+  static {
+    NullHandling.initializeForTests();
+  }

Review Comment:
   nit: recommend extending `InitializedNullHandlingTest` since it also initializes `ExpressionProcessing` (the other static config), though since this probably isn't running any actual queries it probably isn't necessary..



##########
server/src/main/java/org/apache/druid/client/SegmentLoadInfo.java:
##########
@@ -59,6 +61,15 @@ public ImmutableSegmentLoadInfo toImmutableSegmentLoadInfo()
     return new ImmutableSegmentLoadInfo(segment, servers);
   }
 
+  /**
+   * Randomly returns one server from the set of {@code servers}.
+   */
+  public DruidServerMetadata pickOne()
+  {
+    synchronized (this) {

Review Comment:
   if all of the methods are synchronized, then we don't need a concurrent set backing the servers list, suggest either dropping the concurrent backing set or making random pick not require locks like this, i don't have a strong opinion either way
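A minimal sketch of the second option, under the assumption that a copy-on-write set backs the servers (illustrative names, not the actual `SegmentLoadInfo` class): reads work on a snapshot, so no method-level lock is needed for the random pick.

```java
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.concurrent.ThreadLocalRandom;

public class LockFreePickSketch
{
  // Copy-on-write gives toArray() a consistent snapshot without locking.
  private final CopyOnWriteArraySet<String> servers = new CopyOnWriteArraySet<>();

  public void addServer(String server)
  {
    servers.add(server);
  }

  public String pickOne()
  {
    // Snapshot the set, then index into it; safe under concurrent mutation.
    Object[] snapshot = servers.toArray();
    if (snapshot.length == 0) {
      return null;
    }
    return (String) snapshot[ThreadLocalRandom.current().nextInt(snapshot.length)];
  }

  public static void main(String[] args)
  {
    LockFreePickSketch sketch = new LockFreePickSketch();
    sketch.addServer("server1");
    System.out.println(sketch.pickOne());
  }
}
```

The tradeoff is write cost: copy-on-write structures copy the backing array on every mutation, which suits the read-heavy pattern here (servers change rarely, picks happen per query).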



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

