Re: [PR] (perf) server: batch SQL Metadata deleteSegments and updateSegmentMetadata (druid)

via GitHub Fri, 21 Jul 2023 20:07:37 -0700


kfaraz commented on code in PR #14639:
URL: https://github.com/apache/druid/pull/14639#discussion_r1271234076



##########
server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java:
##########
@@ -1780,11 +1795,16 @@ public void deleteSegments(final Set<DataSegment> segments)
           public Void inTransaction(Handle handle, TransactionStatus transactionStatus)
           {
             int segmentSize = segments.size();
-            String dataSource = "";
+            String dataSource = segments.stream().findAny().map(DataSegment::getDataSource).orElse("?");
+
+            final String deleteFrom = StringUtils.format("DELETE from %s WHERE id = :id", dbTables.getSegmentsTable());

Review Comment:
   ```suggestion
               final String deleteSql = StringUtils.format("DELETE from %s WHERE id = :id", dbTables.getSegmentsTable());
   ```
   
   **Side comment:**
   Since `id` is the primary key, it is already indexed. I wonder if deleting 
with an `IN` clause of, say, 50 ids per statement would boost performance 
further. That would need some validation, though. I recently did a similar 
validation for marking segments as unused, and performance seemed to plateau at 
around 50 ids per statement in my case. The only con is that building a bindable 
`IN` clause is not too neat: you would need to build a SQL statement with 50 
bindable params and then bind them one by one to the respective segment IDs.
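   
   To make the idea concrete, here is a rough, untested sketch of how the 
bindable `IN` clause could be built. The class and method names 
(`InClauseSketch`, `buildDeleteSql`, `partition`) and the `:id0`, `:id1`, ... 
parameter naming are hypothetical and not existing Druid code, the table name 
is only illustrative, and the batch size of 50 is just the number from my 
earlier experiment:
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.StringJoiner;

   class InClauseSketch
   {
     // Hypothetical helper: builds "DELETE from <table> WHERE id IN (:id0, :id1, ...)"
     // with one named, bindable parameter per segment ID in the batch.
     static String buildDeleteSql(String table, int numIds)
     {
       if (numIds <= 0) {
         // "IN ()" is not valid SQL, so callers should skip empty batches.
         throw new IllegalArgumentException("numIds must be positive");
       }
       StringJoiner params = new StringJoiner(", ", "(", ")");
       for (int i = 0; i < numIds; i++) {
         params.add(":id" + i);
       }
       return "DELETE from " + table + " WHERE id IN " + params;
     }

     // Hypothetical helper: splits the IDs into batches of at most batchSize.
     static <T> List<List<T>> partition(List<T> items, int batchSize)
     {
       List<List<T>> batches = new ArrayList<>();
       for (int i = 0; i < items.size(); i += batchSize) {
         int end = Math.min(i + batchSize, items.size());
         batches.add(new ArrayList<>(items.subList(i, end)));
       }
       return batches;
     }

     public static void main(String[] args)
     {
       List<String> segmentIds = new ArrayList<>();
       for (int i = 0; i < 120; i++) {
         segmentIds.add("segment_" + i);
       }
       // Illustrative table name; the real one comes from dbTables.getSegmentsTable().
       for (List<String> batch : partition(segmentIds, 50)) {
         String sql = buildDeleteSql("druid_segments", batch.size());
         // In the coordinator, each ":id" + i would be bound to batch.get(i)
         // on the statement before executing it.
         System.out.println(batch.size() + " ids -> " + sql);
       }
     }
   }
   ```
   
   Note that the last batch is usually smaller than the rest, so it needs its 
own SQL string with fewer params.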
   
   I was also thinking we could add a `WHERE dataSource = :dataSource` 
condition, but I am not sure it would really help since we are already using 
the primary key in the SQL.
   
   @jasonk000 , @abhishekrb19 , what do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
