keith-turner commented on code in PR #5479:
URL: https://github.com/apache/accumulo/pull/5479#discussion_r2047657209
##########
server/base/src/main/java/org/apache/accumulo/server/util/TableDiskUsage.java:
##########
@@ -203,47 +195,44 @@ public static Map<SortedSet<String>,Long> getDiskUsage(Set<TableId> tableIds,
     // For each table ID
     for (TableId tableId : tableIds) {
-      // if the table to compute usage is for the metadata table itself then we need to scan the
-      // root table, else we scan the metadata table
-      try (Scanner mdScanner = tableId.equals(MetadataTable.ID)
-          ? client.createScanner(RootTable.NAME, Authorizations.EMPTY)
-          : client.createScanner(MetadataTable.NAME, Authorizations.EMPTY)) {
-        mdScanner.fetchColumnFamily(MetadataSchema.TabletsSection.DataFileColumnFamily.NAME);
-        mdScanner.setRange(new KeyExtent(tableId, null, null).toMetaRange());
-
-        final Set<TabletFile> files = new HashSet<>();
-
-        // Read each file referenced by that table
-        for (Map.Entry<Key,Value> entry : mdScanner) {
-          final TabletFile file =
-              new TabletFile(new Path(entry.getKey().getColumnQualifier().toString()));
-
-          // get the table referenced by the file which may not be the same as the current
-          // table we are scanning if the file is shared between multiple tables
-          final TableId fileTableRef = file.getTableId();
-
-          // if this is a ref to a different table than the one we are scanning then we need
-          // to make sure the table is also linked for this shared file if the table is
-          // part of the set of tables we are running du on so we can track shared usages
-          if (!fileTableRef.equals(tableId) && tableIds.contains(fileTableRef)) {
-            // link the table and the shared file for computing shared sizes
-            tdu.linkFileAndTable(fileTableRef, file.getFileName());
-          }
-
-          // link the file to the table we are scanning for
-          tdu.linkFileAndTable(tableId, file.getFileName());
-
-          // add the file size for the table if not already seen for this scan
-          if (files.add(file)) {
-            // This tracks the file size for individual files for computing shared file statistics
-            // later
-            tdu.addFileSize(file.getFileName(),
-                new DataFileValue(entry.getValue().get()).getSize());
+      // read the metadata
+      try (var tabletsMetadata =
+          ((ClientContext) client).getAmple().readTablets().forTable(tableId).build()) {

Review Comment:
This was only reading the file column previously; it can keep doing that to cut down on how much data is read.
```suggestion
          ((ClientContext) client).getAmple().readTablets().forTable(tableId).fetch(FILES).build()) {
```

##########
server/base/src/test/java/org/apache/accumulo/server/util/TableDiskUsageTest.java:
##########
@@ -75,17 +72,20 @@ public static void beforeClass() {
   @Test
   public void testSingleTableMultipleTablets() throws Exception {
-    final ServerContext client = EasyMock.createMock(ServerContext.class);
-    final Scanner scanner = EasyMock.createMock(Scanner.class);
-    mockScan(client, scanner, 1);
+    final ClientContext client = EasyMock.createMock(ClientContext.class);

Review Comment:
The more levels of mocking there are in test code, the harder it is to refactor the code under test. Sometimes it's better to change the code to make it easy to test and to minimize the need for mocking.

For example, maybe this could have been changed from:
```java
public static Map<SortedSet<String>,Long> getDiskUsage(Set<TableId> tableIds,
    AccumuloClient client)
```
to:
```java
public static Map<SortedSet<String>,Long> getDiskUsage(Set<TableId> tableIds,
    Function<TableId, Iterable<Map<StoredTabletFile,DataFileValue>>> tableFilesProvider)
```
Maybe this would make the code easier to test.
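A minimal sketch of what a test could look like with that provider-style signature, using simplified stand-in types (String file names and Long sizes in place of StoredTabletFile/DataFileValue) and a hypothetical `totalDiskUsage` method, so it illustrates the idea rather than the actual TableDiskUsage code:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import java.util.Map;
import java.util.function.Function;

import org.junit.jupiter.api.Test;

// Hypothetical illustration: String/Long stand in for StoredTabletFile/DataFileValue
// so the sketch is self-contained.
public class ProviderStyleUsageTest {

  // Sketch of a provider-style method: the caller supplies the per-tablet file data,
  // so there is no AccumuloClient, Ample, or Scanner dependency to mock.
  static long totalDiskUsage(String tableId,
      Function<String,Iterable<Map<String,Long>>> tableFilesProvider) {
    long total = 0;
    for (Map<String,Long> tabletFiles : tableFilesProvider.apply(tableId)) {
      for (long size : tabletFiles.values()) {
        total += size;
      }
    }
    return total;
  }

  @Test
  public void testSingleTableMultipleTablets() {
    // No EasyMock needed: the "metadata read" is just a lambda returning canned data.
    Function<String,Iterable<Map<String,Long>>> provider =
        tid -> List.of(Map.of("A0001.rf", 1024L), Map.of("A0002.rf", 2048L));

    assertEquals(3072L, totalDiskUsage("1", provider));
  }
}
```

With this shape, the test data lives right in the test, and refactoring how the metadata is read does not require touching any mock expectations.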
Not a change for this PR, as the changes with EasyMock were already made; just something to consider for future changes.
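On the first comment above (fetching only the FILES column), a rough sketch of how a files-only read via Ample might be consumed; the class and method names here are hypothetical, and the accessor names (getFilesMap, getSize) are recalled from the Accumulo code base and should be verified against the version targeted by the PR:

```java
import static org.apache.accumulo.core.metadata.schema.TabletMetadata.ColumnType.FILES;

import java.util.Map;

import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.clientImpl.ClientContext;
import org.apache.accumulo.core.data.TableId;
import org.apache.accumulo.core.metadata.StoredTabletFile;
import org.apache.accumulo.core.metadata.schema.DataFileValue;
import org.apache.accumulo.core.metadata.schema.TabletMetadata;

// Hypothetical sketch, not code from the PR.
class FilesOnlyReadSketch {

  // Sums the sizes of all files referenced by a table, reading only the FILES column
  // of the tablet metadata instead of every column.
  static long sumFileSizes(AccumuloClient client, TableId tableId) {
    long total = 0;
    // fetch(FILES) narrows the read to file references and their sizes
    try (var tabletsMetadata = ((ClientContext) client).getAmple().readTablets()
        .forTable(tableId).fetch(FILES).build()) {
      for (TabletMetadata tablet : tabletsMetadata) {
        for (Map.Entry<StoredTabletFile,DataFileValue> entry : tablet.getFilesMap().entrySet()) {
          total += entry.getValue().getSize();
        }
      }
    }
    return total;
  }
}
```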