suneet-s commented on a change in pull request #10359:
URL: https://github.com/apache/druid/pull/10359#discussion_r492382977



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/worker/shuffle/ShuffleMonitor.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.worker.shuffle;
+
+import com.google.inject.Inject;
+import org.apache.druid.indexing.worker.shuffle.ShuffleMetrics.PerDatasourceShuffleMetrics;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent.Builder;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+import java.util.Map;
+
+public class ShuffleMonitor extends AbstractMonitor
+{
+  private static final String SUPERVISOR_TASK_ID_DIMENSION = "supervisorTaskId";
+  private static final String SHUFFLE_BYTES_KEY = "shuffle/bytes";
+  private static final String SHUFFLE_REQUESTS_KEY = "shuffle/requests";

Review comment:
   Other ingestion-related metrics start with "ingest/". Do these metrics fall under the ingestion metrics category?

   I ask because I was thinking about where these metrics would live in the docs. They might belong here: https://druid.apache.org/docs/latest/operations/metrics.html#ingestion-metrics-realtime-process

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/worker/shuffle/ShuffleMetrics.java
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.worker.shuffle;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.errorprone.annotations.concurrent.GuardedBy;
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Shuffle metrics for middleManagers and indexers. This class is thread-safe because shuffle can be performed by
+ * multiple HTTP threads while a monitoring thread periodically emits the snapshot of metrics.
+ *
+ * @see ShuffleResource
+ * @see org.apache.druid.java.util.metrics.MonitorScheduler
+ */
+public class ShuffleMetrics
+{
+  /**
+   * This lock is used to synchronize accesses to the reference to {@link #datasourceMetrics} and the
+   * {@link PerDatasourceShuffleMetrics} values of the map. This means,
+   *
+   * - Any updates on PerDatasourceShuffleMetrics in the map (and thus its key as well) should be synchronized
+   * under this lock.
+   * - Any updates on the reference to datasourceMetrics should be synchronized under this lock.
+   */
+  private final Object lock = new Object();
+
+  /**
+   * A map of (datasource name) -> {@link PerDatasourceShuffleMetrics}. This map is replaced with an empty map
+   * whenever a snapshot is taken since the map can keep growing over time otherwise. For concurrent access pattern,
+   * see {@link #shuffleRequested} and {@link #snapshotAndReset()}.
+   */
+  @GuardedBy("lock")
+  private Map<String, PerDatasourceShuffleMetrics> datasourceMetrics = new HashMap<>();
+
+  /**
+   * This method is called whenever a new shuffle is requested. Multiple tasks can request shuffle at the same time,
+   * while the monitoring thread takes a snapshot of the metrics. There is a happens-before relationship between
+   * shuffleRequested and {@link #snapshotAndReset()}.
+   */
+  public void shuffleRequested(String supervisorTaskId, long fileLength)
+  {
+    synchronized (lock) {

Review comment:
   Locking here risks a slowdown from contention. Can we update this to include a feature flag check?

   That way, if the locking causes unforeseen issues, we can disable metric computation and reporting. I think a static feature flag, such as a system property, would be good enough for this use case.
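
A minimal sketch of what such a flag could look like, assuming a hypothetical system property named `druid.shuffle.metrics.enabled` (not an existing Druid setting) and a simplified `long[]` counter in place of `PerDatasourceShuffleMetrics`. The class name `FlaggedShuffleMetrics` is also made up for illustration. The flag is read once, so a disabled instance never touches the lock.

```java
import java.util.HashMap;
import java.util.Map;

public class FlaggedShuffleMetrics
{
  // Hypothetical property name, used only for this sketch.
  public static final String ENABLED_PROPERTY = "druid.shuffle.metrics.enabled";

  // Read once at construction time; the hot path only checks a final boolean.
  private final boolean enabled;
  private final Object lock = new Object();

  // Value layout (an assumption of this sketch): [0] = total bytes, [1] = request count.
  private Map<String, long[]> datasourceMetrics = new HashMap<>();

  public FlaggedShuffleMetrics()
  {
    // Metrics stay enabled unless the property is explicitly set to something other than "true".
    this(Boolean.parseBoolean(System.getProperty(ENABLED_PROPERTY, "true")));
  }

  public FlaggedShuffleMetrics(boolean enabled)
  {
    this.enabled = enabled;
  }

  public void shuffleRequested(String supervisorTaskId, long fileLength)
  {
    if (!enabled) {
      return; // flag is off: skip the lock and the bookkeeping entirely
    }
    synchronized (lock) {
      long[] counts = datasourceMetrics.computeIfAbsent(supervisorTaskId, k -> new long[2]);
      counts[0] += fileLength;
      counts[1]++;
    }
  }

  public Map<String, long[]> snapshotAndReset()
  {
    synchronized (lock) {
      // Hand the current map to the caller and start over with an empty one.
      Map<String, long[]> snapshot = datasourceMetrics;
      datasourceMetrics = new HashMap<>();
      return snapshot;
    }
  }
}
```

With this shape, starting the process with `-Ddruid.shuffle.metrics.enabled=false` would turn the bookkeeping off without a code change.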

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/worker/shuffle/ShuffleMetrics.java
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.worker.shuffle;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.errorprone.annotations.concurrent.GuardedBy;
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Shuffle metrics for middleManagers and indexers. This class is thread-safe because shuffle can be performed by
+ * multiple HTTP threads while a monitoring thread periodically emits the snapshot of metrics.
+ *
+ * @see ShuffleResource
+ * @see org.apache.druid.java.util.metrics.MonitorScheduler
+ */
+public class ShuffleMetrics
+{
+  /**
+   * This lock is used to synchronize accesses to the reference to {@link #datasourceMetrics} and the
+   * {@link PerDatasourceShuffleMetrics} values of the map. This means,
+   *
+   * - Any updates on PerDatasourceShuffleMetrics in the map (and thus its key as well) should be synchronized
+   * under this lock.
+   * - Any updates on the reference to datasourceMetrics should be synchronized under this lock.
+   */
+  private final Object lock = new Object();
+
+  /**
+   * A map of (datasource name) -> {@link PerDatasourceShuffleMetrics}. This map is replaced with an empty map
+   * whenever a snapshot is taken since the map can keep growing over time otherwise. For concurrent access pattern,
+   * see {@link #shuffleRequested} and {@link #snapshotAndReset()}.
+   */
+  @GuardedBy("lock")

Review comment:
   Just curious: why did you choose the guarded-by pattern instead of a ConcurrentMap?
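
For comparison, here is a rough sketch of what a ConcurrentMap-based variant might look like. The names `ConcurrentShuffleMetrics` and `Counts` are made up for illustration and are not part of the PR. It relies on `ConcurrentHashMap.merge` being atomic per key and on immutable values, so an entry removed by the snapshot cannot be mutated afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentShuffleMetrics
{
  /** Immutable pair of counters; every update creates a new instance. */
  public static final class Counts
  {
    private final long bytes;
    private final long requests;

    Counts(long bytes, long requests)
    {
      this.bytes = bytes;
      this.requests = requests;
    }

    public long getBytes()
    {
      return bytes;
    }

    public long getRequests()
    {
      return requests;
    }
  }

  private final ConcurrentHashMap<String, Counts> metrics = new ConcurrentHashMap<>();

  public void shuffleRequested(String supervisorTaskId, long fileLength)
  {
    // merge() runs atomically for the given key, so concurrent updates are not lost.
    metrics.merge(
        supervisorTaskId,
        new Counts(fileLength, 1),
        (a, b) -> new Counts(a.bytes + b.bytes, a.requests + b.requests)
    );
  }

  public Map<String, Counts> snapshotAndReset()
  {
    Map<String, Counts> snapshot = new HashMap<>();
    // Keys are drained one at a time; an update that arrives after a key is
    // removed creates a fresh entry and is reported in the next snapshot.
    for (String key : metrics.keySet()) {
      Counts removed = metrics.remove(key);
      if (removed != null) {
        snapshot.put(key, removed);
      }
    }
    return snapshot;
  }
}
```

The trade-off in this sketch is that the snapshot is not a single point in time across all keys, whereas swapping the whole map under one lock is. Whether that matters for these metrics is a judgment call.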



