JackieTien97 commented on code in PR #744:
URL: https://github.com/apache/tsfile/pull/744#discussion_r2944758562


##########
java/tsfile/src/main/java/org/apache/tsfile/read/common/block/TsBlockUtil.java:
##########
@@ -75,10 +76,32 @@ public static TsBlock applyFilterAndLimitOffsetToTsBlock(
       TsBlockBuilder builder,
       Filter pushDownFilter,
       PaginationController paginationController) {
+    return applyFilterAndLimitOffsetToTsBlock(
+        unFilteredBlock, builder, pushDownFilter, paginationController, null);
+  }
+
+  public static TsBlock applyFilterAndLimitOffsetToTsBlock(
+      TsBlock unFilteredBlock,
+      TsBlockBuilder builder,
+      Filter pushDownFilter,
+      PaginationController paginationController,
+      Consumer<Long> filterRowsRecorder) {

Review Comment:
   ```suggestion
         LongConsumer filterRowsRecorder) {
   ```



##########
java/tsfile/src/main/java/org/apache/tsfile/read/reader/IPageReader.java:
##########
@@ -39,6 +40,8 @@ default BatchData getAllSatisfiedPageData() throws 
IOException {
 
   TsBlock getAllSatisfiedData() throws IOException;
 
+  TsBlock getAllSatisfiedData(Consumer<Long> filterRowsRecorder) throws 
IOException;

Review Comment:
   add some java doc comments about this method, explain its difference between 
original `getAllSatisfiedData()` method.



##########
java/tsfile/src/main/java/org/apache/tsfile/read/filter/basic/Filter.java:
##########
@@ -117,6 +118,44 @@ public abstract class Filter {
    */
   public abstract boolean[] satisfyTsBlock(boolean[] selection, TsBlock 
tsBlock);
 
+  public final boolean[] satisfyTsBlock(
+      boolean[] selection, TsBlock tsBlock, Consumer<Long> filterRowsRecorder) 
{

Review Comment:
   ```suggestion
         boolean[] selection, TsBlock tsBlock, LongConsumer filterRowsRecorder) 
{
   ```



##########
java/tsfile/src/main/java/org/apache/tsfile/read/reader/page/PageReader.java:
##########
@@ -212,6 +213,11 @@ public BatchData getAllSatisfiedPageData(boolean 
ascending) throws IOException {
 
   @Override
   public TsBlock getAllSatisfiedData() throws IOException {
+    return getAllSatisfiedData(null);
+  }
+
+  @Override
+  public TsBlock getAllSatisfiedData(Consumer<Long> filterRowsRecorder) throws 
IOException {

Review Comment:
   ```suggestion
     public TsBlock getAllSatisfiedData(LongConsumer filterRowsRecorder) throws 
IOException {
   ```



##########
java/tsfile/src/main/java/org/apache/tsfile/read/reader/page/AbstractAlignedPageReader.java:
##########
@@ -215,6 +217,55 @@ public TsBlock getAllSatisfiedData() throws IOException {
         unFilteredBlock, builder, pushDownFilter, paginationController);
   }
 
+  /**
+   * get all satisfied data while record the number of filter rows. if one 
tuple is not satisfy by
+   * the filter and is deleted at the same time, the tuple cannot be 
considered as a filtered data.
+   */
+  @Override
+  public TsBlock getAllSatisfiedData(Consumer<Long> filterRowsRecorder) throws 
IOException {

Review Comment:
   ```suggestion
     public TsBlock getAllSatisfiedData(LongConsumer filterRowsRecorder) throws 
IOException {
   ```



##########
java/tsfile/src/main/java/org/apache/tsfile/read/reader/IPageReader.java:
##########
@@ -39,6 +40,8 @@ default BatchData getAllSatisfiedPageData() throws 
IOException {
 
   TsBlock getAllSatisfiedData() throws IOException;
 
+  TsBlock getAllSatisfiedData(Consumer<Long> filterRowsRecorder) throws 
IOException;

Review Comment:
   ```suggestion
     TsBlock getAllSatisfiedData(LongConsumer filterRowsRecorder) throws 
IOException;
   ```



##########
java/tsfile/src/main/java/org/apache/tsfile/read/filter/basic/Filter.java:
##########
@@ -117,6 +118,44 @@ public abstract class Filter {
    */
   public abstract boolean[] satisfyTsBlock(boolean[] selection, TsBlock 
tsBlock);
 
+  public final boolean[] satisfyTsBlock(
+      boolean[] selection, TsBlock tsBlock, Consumer<Long> filterRowsRecorder) 
{
+
+    int inputCount = countSelectedRows(selection);
+    boolean[] result = satisfyTsBlock(selection, tsBlock);
+    int outputCount = countSelectedRows(result);
+    if (inputCount > outputCount) {
+      filterRowsRecorder.accept((long) (inputCount - outputCount));
+    }
+
+    return result;
+  }
+
+  private static int countSelectedRows(boolean[] selection) {
+    if (selection == null) return 0;
+    int count = 0;
+    int length = selection.length;
+    int i = 0;
+
+    // calculate multi times
+    for (; i < length - 7; i += 8) {
+      count +=
+          (selection[i] ? 1 : 0)
+              + (selection[i + 1] ? 1 : 0)
+              + (selection[i + 2] ? 1 : 0)
+              + (selection[i + 3] ? 1 : 0)
+              + (selection[i + 4] ? 1 : 0)
+              + (selection[i + 5] ? 1 : 0)
+              + (selection[i + 6] ? 1 : 0)
+              + (selection[i + 7] ? 1 : 0);
+    }
+    for (; i < length; i++) {
+      count += (selection[i] ? 1 : 0);
+    }

Review Comment:
   On modern JVMs, a simple loop for (boolean b : selection) if (b) count++ 
would likely be auto-vectorized by the JIT compiler just as efficiently. The 
manual unrolling adds code complexity without guaranteed benefit. Consider 
simplifying, or at minimum add a comment explaining the performance rationale.



##########
java/tsfile/src/main/java/org/apache/tsfile/read/filter/basic/Filter.java:
##########
@@ -117,6 +118,44 @@ public abstract class Filter {
    */
   public abstract boolean[] satisfyTsBlock(boolean[] selection, TsBlock 
tsBlock);
 
+  public final boolean[] satisfyTsBlock(
+      boolean[] selection, TsBlock tsBlock, Consumer<Long> filterRowsRecorder) 
{
+
+    int inputCount = countSelectedRows(selection);
+    boolean[] result = satisfyTsBlock(selection, tsBlock);
+    int outputCount = countSelectedRows(result);
+    if (inputCount > outputCount) {
+      filterRowsRecorder.accept((long) (inputCount - outputCount));
+    }
+
+    return result;
+  }
+
+  private static int countSelectedRows(boolean[] selection) {
+    if (selection == null) return 0;
+    int count = 0;
+    int length = selection.length;
+    int i = 0;
+
+    // calculate multi times

Review Comment:
   The comment // calculate multi times (line 140) is unclear. Perhaps // 
process 8 elements at a time for better throughput?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to