cshannon commented on code in PR #4568:
URL: https://github.com/apache/accumulo/pull/4568#discussion_r1604939498


##########
core/src/main/java/org/apache/accumulo/core/metadata/schema/TabletMetadata.java:
##########
@@ -156,8 +156,13 @@ private TabletMetadata(Builder tmBuilder) {
     this.compacted = tmBuilder.compacted.build();
     this.userCompactionsRequested = tmBuilder.userCompactionsRequested.build();
     this.unSplittableMetadata = tmBuilder.unSplittableMetadata;
-    this.fileSize =
-        Suppliers.memoize(() -> files.values().stream().mapToLong(DataFileValue::getSize).sum());
+    this.fileSize = Suppliers.memoize(() -> {

Review Comment:
   Is the performance difference in the sum calculation because the stream iterates over the values twice (once to map the sizes and then again to sum them), or just the overhead of the stream itself? I think it would be good to add a comment here explaining why this change was made, so that no one changes it back.
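
   For context, here is a minimal, self-contained sketch of the two ways to compute such a sum (the `FileValue` class is a hypothetical stand-in for `DataFileValue`, not the Accumulo type). Note that `mapToLong(...).sum()` makes a single pass over the values, so any measured difference would come from the stream machinery itself rather than from double iteration:

```java
import java.util.List;

public class SumSketch {
  // Hypothetical stand-in for DataFileValue; only the size matters here.
  static final class FileValue {
    private final long size;

    FileValue(long size) {
      this.size = size;
    }

    long getSize() {
      return size;
    }
  }

  // Stream version, as in the old code: a single pass, but with the
  // overhead of allocating and driving the stream pipeline.
  static long sumWithStream(List<FileValue> files) {
    return files.stream().mapToLong(FileValue::getSize).sum();
  }

  // Plain-loop version: the same single pass, no stream allocation.
  static long sumWithLoop(List<FileValue> files) {
    long sum = 0;
    for (FileValue f : files) {
      sum += f.getSize();
    }
    return sum;
  }

  public static void main(String[] args) {
    List<FileValue> files = List.of(new FileValue(10), new FileValue(32));
    System.out.println(sumWithStream(files)); // 42
    System.out.println(sumWithLoop(files));   // 42
  }
}
```

   Both produce the same result; whichever the PR settles on, a short comment in `TabletMetadata` stating the reason would guard against the change being reverted.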



##########
server/base/src/main/java/org/apache/accumulo/server/manager/state/TabletManagementIterator.java:
##########
@@ -66,35 +68,53 @@
  * TabletManagement objects for each Tablet that needs some type of action 
performed on it by the
  * Manager.
  */
-public class TabletManagementIterator extends SkippingIterator {
+public class TabletManagementIterator extends WholeRowIterator {
  private static final Logger LOG = LoggerFactory.getLogger(TabletManagementIterator.class);
   public static final String TABLET_GOAL_STATE_PARAMS_OPTION = "tgsParams";
   private CompactionJobGenerator compactionGenerator;
   private TabletBalancer balancer;
+  private final SplitConfig splitConfig = new SplitConfig();
+
+  private static class SplitConfig {
+    TableId tableId;
+    long splitThreshold;
+    long maxEndRowSize;
+    int maxFilesToOpen;
+
+    void update(TableId tableId, Configuration tableConfig) {
+      if (!tableId.equals(this.tableId)) {
+        this.tableId = tableId;
+        splitThreshold = ConfigurationTypeHelper
+            .getFixedMemoryAsBytes(tableConfig.get(Property.TABLE_SPLIT_THRESHOLD.getKey()));
+        maxEndRowSize = ConfigurationTypeHelper
+            .getFixedMemoryAsBytes(tableConfig.get(Property.TABLE_MAX_END_ROW_SIZE.getKey()));
+        maxFilesToOpen = (int) ConfigurationTypeHelper
+            .getFixedMemoryAsBytes(tableConfig.get(Property.SPLIT_MAXOPEN.getKey()));
+      }
+    }
+  }
 
   private static boolean shouldReturnDueToSplit(final TabletMetadata tm,
-      final Configuration tableConfig) {
+      final Configuration tableConfig, SplitConfig splitConfig) {
 
-    final long splitThreshold = ConfigurationTypeHelper
-        .getFixedMemoryAsBytes(tableConfig.get(Property.TABLE_SPLIT_THRESHOLD.getKey()));
-    final long maxEndRowSize = ConfigurationTypeHelper
-        .getFixedMemoryAsBytes(tableConfig.get(Property.TABLE_MAX_END_ROW_SIZE.getKey()));
-    final int maxFilesToOpen = (int) ConfigurationTypeHelper
-        .getFixedMemoryAsBytes(tableConfig.get(Property.SPLIT_MAXOPEN.getKey()));
+    // should see the same table many times in a row, so this should only read config the first time
+    // seen
+    splitConfig.update(tm.getTableId(), tableConfig);

Review Comment:
   This is a nice way to save time by not converting the properties over and over. Another nice benefit (and I'm not sure whether this was a problem before, or whether it mattered) is that this essentially grabs a snapshot of the properties per table for the iteration. That means you would be guaranteed the properties can't change midway through iterating over the same table if there were concurrent updates.
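
   To illustrate that caching-plus-snapshot behavior in isolation, here is a hedged sketch (the `CachedSplitConfig` class and the `Map`-backed config are hypothetical stand-ins, not the Accumulo API). The parsed values are recomputed only when the table id changes, so a run of tablets from the same table reuses one snapshot of the config:

```java
import java.util.Map;

public class ConfigCacheSketch {
  // Hypothetical cached config, mirroring the SplitConfig pattern in the diff.
  static final class CachedSplitConfig {
    String tableId;      // last table id seen
    long splitThreshold; // parsed value, reused while the table id is unchanged
    int updates;         // counts how many times the config was actually re-read

    void update(String tableId, Map<String, Long> tableConfig) {
      // Only re-read the config when the table changes; consecutive tablets
      // from the same table see one consistent snapshot of the values.
      if (!tableId.equals(this.tableId)) {
        this.tableId = tableId;
        this.splitThreshold = tableConfig.get("table.split.threshold");
        this.updates++;
      }
    }
  }

  public static void main(String[] args) {
    CachedSplitConfig cfg = new CachedSplitConfig();
    Map<String, Long> conf = Map.of("table.split.threshold", 1_000_000L);
    // Five tablets of the same table in a row: the config is parsed once.
    for (int i = 0; i < 5; i++) {
      cfg.update("table1", conf);
    }
    System.out.println(cfg.updates);        // 1
    System.out.println(cfg.splitThreshold); // 1000000
  }
}
```

   A concurrent change to the live table config would only be picked up once the iterator moves on to a different table and back, which is the snapshot guarantee described above.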



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to