mayankshriv commented on a change in pull request #6165:
URL: https://github.com/apache/incubator-pinot/pull/6165#discussion_r509705605



##########
File path: 
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
##########
@@ -1641,23 +1643,30 @@ public void addNewSegment(String tableName, 
SegmentMetadata segmentMetadata, Str
       Map<InstancePartitionsType, InstancePartitions> instancePartitionsMap = 
Collections
           .singletonMap(InstancePartitionsType.OFFLINE, InstancePartitionsUtils
               .fetchOrComputeInstancePartitions(_helixZkManager, 
offlineTableConfig, InstancePartitionsType.OFFLINE));
-      HelixHelper.updateIdealState(_helixZkManager, offlineTableName, 
idealState -> {
-        assert idealState != null;
-        Map<String, Map<String, String>> currentAssignment = 
idealState.getRecord().getMapFields();
-        if (currentAssignment.containsKey(segmentName)) {
-          LOGGER.warn("Segment: {} already exists in the IdealState for table: 
{}, do not update", segmentName,
-              offlineTableName);
-        } else {
-          List<String> assignedInstances =
-              segmentAssignment.assignSegment(segmentName, currentAssignment, 
instancePartitionsMap);
-          LOGGER.info("Assigning segment: {} to instances: {} for table: {}", 
segmentName, assignedInstances,
-              offlineTableName);
-          currentAssignment.put(segmentName,
-              SegmentAssignmentUtils.getInstanceStateMap(assignedInstances, 
SegmentStateModel.ONLINE));
+      if (_tableUpdaterLockMap.get(offlineTableName) == null) {
+        synchronized (_tableUpdaterLockMap) {
+          if (_tableUpdaterLockMap.get(offlineTableName) == null) {
+            _tableUpdaterLockMap.put(offlineTableName, new Object());

Review comment:
       Discussing offline, I gather this:
   
   - This helps in case of thousands of segments being pushed. For metadata 
push, we still need to generate the metadata from segment tar file, which takes 
time. In order to improve that, if we increase parallelism, then metadata 
generation improves. However, parallel push in segments start taking longer due 
to retries. If we reduce parallelism, the metadata generation slows down as 
well.
   
   - With this PR, we can still keep high parallelism for metadata generation, 
but also reduce longer update time (or even failures), by reducing the 
probability of retries.
   
   - The Map size should be small, and the synchronization is only for IS 
update. So this can be thought of as a bug fix, which is reducing number of 
retries, without penalizing on push time with parallelism.
   
   We also discussed the following:
   - We can also reduce number of IS updates (and hence retries) by having bulk 
updates to IS. For example, metadata push can pack multiple segment metadatas 
in one POST call. 
   
   @siddharthteotia please add anything I missed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to