abhishekrb19 commented on code in PR #15415:
URL: https://github.com/apache/druid/pull/15415#discussion_r1416231345


##########
server/src/main/java/org/apache/druid/metadata/SqlSegmentsMetadataQuery.java:
##########
@@ -127,15 +127,21 @@ public CloseableIterator<DataSegment> 
retrieveUsedSegments(
    *
    * This call does not return any information about realtime segments.
    *
+   * @param dataSource The name of the datasource
+   * @param intervals  The intervals to search over
+   * @param limit      The limit of segments to return
+   * @param offset     The offset to use when retreiving matching segments. 
Note: This is only applied if the size of
+   *                   intervals is less than {@link #MAX_INTERVALS_PER_BATCH}

Review Comment:
   ```suggestion
      * @param offset     The offset to use when retrieving matching segments. 
Note that offset is only applied to a 
      *                   single batch - i.e., when the number of intervals is 
less than {@link #MAX_INTERVALS_PER_BATCH}.
      *                   For multiple batches, the offset parameter is ignored.
   
   ```



##########
server/src/main/java/org/apache/druid/metadata/SQLMetadataConnector.java:
##########
@@ -133,6 +133,14 @@ public String getCollation()
    */
   public abstract String getQuoteString();
 
+  /**
+   * @return the string offset clause
+   */
+  public String getOffsetClause(int offset)
+  {
+    return StringUtils.format(" OFFSET %s", offset);

Review Comment:
   ```suggestion
       return StringUtils.format(" OFFSET %d", offset);
   ```



##########
server/src/test/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinatorTest.java:
##########
@@ -3050,19 +3106,20 @@ private List<DataSegment> 
createAndGetUsedYearSegments(final int startYear, fina
 
   private ImmutableList<DataSegment> 
retrieveUnusedSegmentsUsingMultipleIntervalsAndLimit(

Review Comment:
   Should this method be renamed to include offset in its name, or perhaps 
could use a simpler name?



##########
server/src/test/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinatorTest.java:
##########
@@ -1208,12 +1208,64 @@ public void 
testRetrieveUnusedSegmentsUsingMultipleIntervalsAndNoLimit() throws
 
     final ImmutableList<DataSegment> actualUnusedSegments = 
retrieveUnusedSegmentsUsingMultipleIntervalsAndLimit(
         
segments.stream().map(DataSegment::getInterval).collect(Collectors.toList()),
+        null,
         null
     );
     Assert.assertEquals(segments.size(), actualUnusedSegments.size());
     Assert.assertTrue(segments.containsAll(actualUnusedSegments));
   }
 
+  @Test
+  public void testRetrieveUnusedSegmentsUsingNoIntervalsAndNoLimit() throws 
IOException
+  {
+    final List<DataSegment> segments = createAndGetUsedYearSegments(1900, 
2133);
+    markAllSegmentsUnused(new HashSet<>(segments));
+
+    final ImmutableList<DataSegment> actualUnusedSegments = 
retrieveUnusedSegmentsUsingMultipleIntervalsAndLimit(
+        ImmutableList.of(),
+        null,
+        null
+    );
+    Assert.assertEquals(segments.size(), actualUnusedSegments.size());
+    Assert.assertTrue(segments.containsAll(actualUnusedSegments));
+  }
+
+  @Test
+  public void testRetrieveUnusedSegmentsUsingNoIntervalsAndNoLimitAndOffset() 
throws IOException
+  {
+    final List<DataSegment> segments = createAndGetUsedYearSegments(2033, 
2133);
+    markAllSegmentsUnused(new HashSet<>(segments));
+
+    int offset = 10;
+    List<DataSegment> expectedSegments = 
segments.stream().sorted((Comparator<DataSegment>) (o1, o2) -> {
+      long o1Start = o1.getInterval().getStartMillis();
+      long o2Start = o2.getInterval().getStartMillis();
+      if (o1Start < o2Start) {
+        return -1;
+      } else if (o1Start > o2Start) {
+        return 1;
+      } else {
+        long o1End = o1.getInterval().getEndMillis();
+        long o2End = o2.getInterval().getEndMillis();
+        if (o1End < o2End) {
+          return -1;
+        }
+        return o1End > o2End
+            ? 1
+            : 0;
+      }
+    })
+        .skip(offset)
+        .collect(Collectors.toList());
+    final ImmutableList<DataSegment> actualUnusedSegments = 
retrieveUnusedSegmentsUsingMultipleIntervalsAndLimit(
+        ImmutableList.of(),
+        null,
+        offset
+    );
+    Assert.assertEquals(expectedSegments.size(), actualUnusedSegments.size());
+    Assert.assertTrue(expectedSegments.containsAll(actualUnusedSegments));
+  }
+

Review Comment:
   For test coverage, can we also please include new or extend existing tests 
for the following scenarios:
   1. non-empty interval, non-null limit and non-null offset
   2. multiple intervals, non-null limit and non-null offset
   3. multiple intervals exceeding `MAX_INTERVALS_PER_BATCH`, non-null limit 
and non-null offset (the offset in this case would be ignored)



##########
server/src/main/java/org/apache/druid/metadata/storage/derby/DerbyConnector.java:
##########
@@ -100,6 +100,15 @@ public String getQuoteString()
     return QUOTE_STRING;
   }
 
+  /**
+   * @return the string offset clause
+   */
+  @Override
+  public String getOffsetClause(int offset)
+  {
+    return StringUtils.format(" OFFSET %s ROWS", offset);

Review Comment:
   ```suggestion
       return StringUtils.format(" OFFSET %d ROWS", offset);
   ```



##########
server/src/test/java/org/apache/druid/server/http/MetadataResourceTest.java:
##########
@@ -236,6 +246,117 @@ public void testGetAllSegmentsIncludingRealtime()
     Assert.assertEquals(new SegmentStatusInCluster(realTimeSegments[1], false, 
null, 40L, true), resultList.get(5));
   }
 
+  @Test
+  public void testGetUnusedSegmentsInDataSource()
+  {
+    Mockito.doAnswer(mockIterateAllUnusedSegmentsForDatasource())
+        .when(segmentsMetadataManager)
+        .iterateAllUnusedSegmentsForDatasource(
+            ArgumentMatchers.any(),
+            ArgumentMatchers.any(),
+            ArgumentMatchers.any(),
+            ArgumentMatchers.any());
+
+    // test with null datasource name - fails with expected bad datasource 
name error
+    DruidException e = Assert.assertThrows(
+        DruidException.class,
+        () -> metadataResource.getUnusedSegmentsInDataSource(request, null, 
null, -1, null)
+    );
+    Assert.assertEquals(DruidException.Category.INVALID_INPUT, 
e.getCategory());
+    Assert.assertEquals("dataSource name must be non-empty", e.getMessage());
+
+    // test with empty datasource name - fails with expected bad datasource 
name error
+    e = Assert.assertThrows(

Review Comment:
   Ditto here and everywhere below



##########
server/src/test/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinatorTest.java:
##########
@@ -1208,12 +1208,64 @@ public void 
testRetrieveUnusedSegmentsUsingMultipleIntervalsAndNoLimit() throws
 
     final ImmutableList<DataSegment> actualUnusedSegments = 
retrieveUnusedSegmentsUsingMultipleIntervalsAndLimit(
         
segments.stream().map(DataSegment::getInterval).collect(Collectors.toList()),
+        null,
         null
     );
     Assert.assertEquals(segments.size(), actualUnusedSegments.size());
     Assert.assertTrue(segments.containsAll(actualUnusedSegments));
   }
 
+  @Test
+  public void testRetrieveUnusedSegmentsUsingNoIntervalsAndNoLimit() throws 
IOException
+  {
+    final List<DataSegment> segments = createAndGetUsedYearSegments(1900, 
2133);
+    markAllSegmentsUnused(new HashSet<>(segments));
+
+    final ImmutableList<DataSegment> actualUnusedSegments = 
retrieveUnusedSegmentsUsingMultipleIntervalsAndLimit(
+        ImmutableList.of(),
+        null,
+        null
+    );
+    Assert.assertEquals(segments.size(), actualUnusedSegments.size());
+    Assert.assertTrue(segments.containsAll(actualUnusedSegments));
+  }
+
+  @Test
+  public void testRetrieveUnusedSegmentsUsingNoIntervalsAndNoLimitAndOffset() 
throws IOException
+  {
+    final List<DataSegment> segments = createAndGetUsedYearSegments(2033, 
2133);
+    markAllSegmentsUnused(new HashSet<>(segments));
+
+    int offset = 10;
+    List<DataSegment> expectedSegments = 
segments.stream().sorted((Comparator<DataSegment>) (o1, o2) -> {
+      long o1Start = o1.getInterval().getStartMillis();
+      long o2Start = o2.getInterval().getStartMillis();

Review Comment:
   Do we actually need this custom comparator?  
`createAndGetUsedYearSegments()` already returns a list of segments sorted by 
start date. So I think this should suffice: 
   
   `final List<DataSegment> expectedSegments = 
segments.stream().skip(offset).collect(Collectors.toList());`



##########
server/src/test/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinatorTest.java:
##########
@@ -1208,12 +1208,64 @@ public void 
testRetrieveUnusedSegmentsUsingMultipleIntervalsAndNoLimit() throws
 
     final ImmutableList<DataSegment> actualUnusedSegments = 
retrieveUnusedSegmentsUsingMultipleIntervalsAndLimit(
         
segments.stream().map(DataSegment::getInterval).collect(Collectors.toList()),
+        null,
         null
     );
     Assert.assertEquals(segments.size(), actualUnusedSegments.size());
     Assert.assertTrue(segments.containsAll(actualUnusedSegments));
   }
 
+  @Test
+  public void testRetrieveUnusedSegmentsUsingNoIntervalsAndNoLimit() throws 
IOException
+  {
+    final List<DataSegment> segments = createAndGetUsedYearSegments(1900, 
2133);
+    markAllSegmentsUnused(new HashSet<>(segments));
+
+    final ImmutableList<DataSegment> actualUnusedSegments = 
retrieveUnusedSegmentsUsingMultipleIntervalsAndLimit(
+        ImmutableList.of(),
+        null,
+        null
+    );
+    Assert.assertEquals(segments.size(), actualUnusedSegments.size());
+    Assert.assertTrue(segments.containsAll(actualUnusedSegments));
+  }
+
+  @Test
+  public void testRetrieveUnusedSegmentsUsingNoIntervalsAndNoLimitAndOffset() 
throws IOException
+  {
+    final List<DataSegment> segments = createAndGetUsedYearSegments(2033, 
2133);
+    markAllSegmentsUnused(new HashSet<>(segments));
+
+    int offset = 10;
+    List<DataSegment> expectedSegments = 
segments.stream().sorted((Comparator<DataSegment>) (o1, o2) -> {

Review Comment:
   nit: the cast is redundant
   ```suggestion
       final List<DataSegment> expectedSegments = segments.stream().sorted((o1, 
o2) -> {
   ```



##########
server/src/test/java/org/apache/druid/server/http/MetadataResourceTest.java:
##########
@@ -236,6 +246,117 @@ public void testGetAllSegmentsIncludingRealtime()
     Assert.assertEquals(new SegmentStatusInCluster(realTimeSegments[1], false, 
null, 40L, true), resultList.get(5));
   }
 
+  @Test
+  public void testGetUnusedSegmentsInDataSource()
+  {
+    Mockito.doAnswer(mockIterateAllUnusedSegmentsForDatasource())
+        .when(segmentsMetadataManager)
+        .iterateAllUnusedSegmentsForDatasource(
+            ArgumentMatchers.any(),
+            ArgumentMatchers.any(),
+            ArgumentMatchers.any(),
+            ArgumentMatchers.any());
+
+    // test with null datasource name - fails with expected bad datasource 
name error
+    DruidException e = Assert.assertThrows(
+        DruidException.class,
+        () -> metadataResource.getUnusedSegmentsInDataSource(request, null, 
null, -1, null)
+    );
+    Assert.assertEquals(DruidException.Category.INVALID_INPUT, 
e.getCategory());
+    Assert.assertEquals("dataSource name must be non-empty", e.getMessage());

Review Comment:
   You could replace this with the exception helper matcher:
   
   ```suggestion
       DruidExceptionMatcher.invalidInput().expectMessageIs(
           "dataSource name must be non-empty"
       ).assertThrowsAndMatches(
           () -> metadataResource.getUnusedSegmentsInDataSource(request, "", 
null, -1, null)
       );
   ```



##########
server/src/test/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinatorTest.java:
##########
@@ -1208,12 +1208,64 @@ public void 
testRetrieveUnusedSegmentsUsingMultipleIntervalsAndNoLimit() throws
 
     final ImmutableList<DataSegment> actualUnusedSegments = 
retrieveUnusedSegmentsUsingMultipleIntervalsAndLimit(
         
segments.stream().map(DataSegment::getInterval).collect(Collectors.toList()),
+        null,
         null
     );
     Assert.assertEquals(segments.size(), actualUnusedSegments.size());
     Assert.assertTrue(segments.containsAll(actualUnusedSegments));
   }
 
+  @Test
+  public void testRetrieveUnusedSegmentsUsingNoIntervalsAndNoLimit() throws 
IOException

Review Comment:
   ```suggestion
     public void testRetrieveUnusedSegmentsUsingNoIntervalsNoLimitNoOffset() 
throws IOException
   ```



##########
server/src/main/java/org/apache/druid/server/http/MetadataResource.java:
##########
@@ -334,6 +336,48 @@ public Response getUsedSegmentsInDataSourceForIntervals(
     return builder.entity(Collections2.transform(segments, 
DataSegment::getId)).build();
   }
 
+  @GET
+  @Path("/datasources/{dataSourceName}/unusedSegments")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getUnusedSegmentsInDataSource(
+      @Context final HttpServletRequest req,
+      @PathParam("dataSourceName") final String dataSource,
+      @QueryParam("interval") @Nullable String interval,
+      @QueryParam("limit") @Nullable Integer limit,
+      @QueryParam("offset") @Nullable Integer offset
+  )
+  {
+    if (dataSource == null || dataSource.isEmpty()) {
+      throw InvalidInput.exception("dataSource name must be non-empty");

Review Comment:
   To match the path param name as-is:
   ```suggestion
         throw InvalidInput.exception("dataSourceName must be non-empty");
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to