Re: [DISCUSS] FIP-10: Support Log RecordBatch Filter Pushdown

Yang Wang Thu, 07 Aug 2025 02:37:44 -0700

Hi yuxia,

Thank you for your review and suggestions. I'll address all of your
questions:

1. I believe we should add a control flag to the server-side options,
disabled by default. This would be useful for maintaining backward
compatibility during the upgrade process, as it would prevent the cluster
from writing the new RecordBatch format to disk until our upgrade process
is successful. Without this, the older version servers wouldn't recognize
the new data format, and any persisted new version data would prevent us
from rolling back (downgrading) again. We can enable this option only when
we're certain that the upgrade has succeeded and we won't need to downgrade
again.  I'll add more details about this to the FIP document.
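
As a rough illustration of the intent (a sketch only; the class, constants,
and method names are hypothetical and come from neither the FIP nor the PoC),
the flag would gate only the write path, while an upgraded server can always
read both formats:

```java
// Hypothetical sketch: class, constant, and method names are illustrative only.
final class BatchFormatGate {
    static final byte V1 = 1; // existing format, no statistics in the header
    static final byte V2 = 2; // new format carrying batch statistics

    /** Version used for newly written batches, driven by the server-side flag. */
    static byte writeVersion(boolean statisticsEnabled) {
        return statisticsEnabled ? V2 : V1;
    }

    /** An upgraded server reads both versions regardless of the flag. */
    static boolean canRead(byte version) {
        return version == V1 || version == V2;
    }
}
```

With the flag off, nothing an older server cannot parse reaches disk, so a
rollback stays possible until the flag is turned on.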

2. We will add an optional field to PbFetchLogReqForTable. Thanks to
protobuf's backward compatibility, this change will be non-breaking:
```
message PbFetchLogReqForTable {
  required int64 table_id = 1;
  required bool projection_pushdown_enabled = 2;
  repeated int32 projected_fields = 3 [packed = true];
  repeated PbFetchLogReqForBucket buckets_req = 4;
  optional PbPredicate recordBatchFilter = 5;
}
```
I'll include this in the interface modification section of the FIP document.
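
To show why an absent filter keeps the old behavior, here is a minimal sketch
of the server-side decision for a single "column > literal" predicate. The
types and names are hypothetical stand-ins, not the FIP's API or the PoC code,
and a real implementation would cover more predicate and data types:

```java
import java.util.Optional;

// Hypothetical sketch: these types are illustrative and not the FIP's API.
final class BatchSkipSketch {
    /** Min/max of one BIGINT column within a single batch. */
    static final class Stats {
        final long min;
        final long max;
        Stats(long min, long max) { this.min = min; this.max = max; }
    }

    /** Decides whether a batch must be returned for the filter "column > literal". */
    static boolean shouldReturn(Optional<Long> greaterThanLiteral, Optional<Stats> stats) {
        // No filter in the request (e.g. an old client) or no statistics on the
        // batch: fall back to returning the batch unfiltered.
        if (!greaterThanLiteral.isPresent() || !stats.isPresent()) {
            return true;
        }
        // The batch can contain a match only if its max exceeds the literal.
        return stats.get().max > greaterThanLiteral.get();
    }
}
```

Since skipping happens per batch, a returned batch can still contain rows
that do not match, so row-level filtering is still needed downstream.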

3. To avoid making the FIP too lengthy and detailed, I omitted some
protobuf message definitions. The detailed implementation can be reviewed
in the PR and doesn't affect the overall design. If you prefer, I can also
add these details to the FIP.

4. In the current PoC implementation, I didn't use the CompactedRow
format, just a plain per-type serde form. I think your suggestion is
excellent, as it would reduce space amplification further. I'll update the
implementation accordingly.

Best regards,
Yang

On Thu, Aug 7, 2025 at 4:56 PM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:

> Thanks Yang for driving this work. A great improvement. A few questions
> are below:
>
> 1. The Migration Strategy section says "Deploy new version with feature
> flag disabled". It seems the filter pushdown is controlled by an option?
> What does the option look like: is it a client or a server option? Is it
> enabled or disabled by default? I haven't seen the option/flag in the FIP.
>
> 2. It seems the RPC request `FetchLogRequest` should change to include
> `predicates`? Could you describe what changes in `FetchLogRequest`? It's
> also part of the public interface changes.
>
> 3. I noticed `PbLiteralValue`/`PbDataType` in Protocol Buffer Definitions,
> but haven't seen their definitions. Are they missing?
>
> 4. I'm curious how you plan to serialize `LogRecordBatchStatistics`. Will
> you reuse the encoding of the Fluss compacted row, or something else?
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "loserwang1024" <loserwang1...@gmail.com>
> To: "dev" <dev@fluss.apache.org>
> Sent: Thursday, August 7, 2025, 2:18:37 PM
> Subject: Re: [DISCUSS] FIP-10: Support Log RecordBatch Filter Pushdown
>
> Hi Yang,
>
> Thanks for your great work — this change indeed reduces the cost of
> filtered queries. I just have a few questions for clarification:
>
>
> 1. Fluent API design for LogScanner. Currently, we have:
>
> > LogScanner createLogScanner(Predicate recordBatchFilter);
> >
> Would it be possible to make the interface more aligned with the fluent
> design pattern used in Jark’s refactoring?[1] For example:
>
> > table.newScan() .project(projectedFields) .filter(recordBatchFilter)
> > .createLogScanner();
> >
>
> 2. LogRecordBatchStatistics now supports min, max, and null count, and will
> be serialized into RecordBatch headers (requiring an upgrade from V1 to V2
> format). If we plan to support additional statistics in the future, will we
> need to upgrade to V3? Or has V2 already been designed with extensibility
> in mind?
>
> 3. When SupportsFilterPushDown#applyFilters pushes filters down to the
> source, how does the source determine whether a filter can actually be
> pushed down? Even if the user is on the latest version of Fluss that
> supports V2 format, existing data might still be in V1 format (which
> doesn’t include statistics). Will this compatibility issue be handled on
> the client side?
>
> Looking forward to your thoughts!
>
>
> Best
>
> Hongshun
>
> [1] https://github.com/apache/fluss/issues/340
>
> On Thu, Aug 7, 2025 at 11:12 AM Yang Wang <platinumhamb...@gmail.com>
> wrote:
>
> > Hello Fluss Community,
> >
> > I propose initiating discussion on FIP-10: Support Log RecordBatch Filter
> > Pushdown (
> > https://cwiki.apache.org/confluence/display/FLUSS/FIP-10%3A+Support+Log+RecordBatch+Filter+Pushdown
> > ).
> > This optimization aims to improve the performance of Log table queries
> > and is now ready for community feedback.
> >
> > This FIP introduces RecordBatch-level filter pushdown to enable early
> > filtering at the storage layer, thereby optimizing CPU, memory, and
> > network resources by skipping non-matching log record batches.
> >
> > A proof-of-concept (PoC) has been implemented in the logfilter branch in
> > https://github.com/platinumhamburg/fluss and is ready for testing and
> > preview.
> >
>
