[ 
https://issues.apache.org/jira/browse/CALCITE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diveyam Mishra updated CALCITE-7618:
------------------------------------
    Description: 
The file adapter's CSV implementation currently supports projection pushdown 
but does not appear to support filter pushdown.

For example:

{code}
SELECT name, empno
FROM EMPS
WHERE deptno = 20
{code}

The resulting plan is:

{code}
PLAN=EnumerableCalc(expr#0..2=[{inputs}], expr#3=[20], expr#4=[=($t2, $t3)], NAME=[$t1], EMPNO=[$t0], $condition=[$t4])
  CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2]])
{code}

The filter condition is evaluated in the upper {{EnumerableCalc}} node rather 
than during the CSV scan itself. As a result, all rows are read from the 
underlying CSV file and filtering occurs afterward.

The file adapter already contains infrastructure related to filtering:
 * {{CsvEnumerator}} contains {{filterValues}}-based row filtering logic.
 * {{CsvTable}} was originally derived from the demo CSV adapter's filterable 
implementation.
 * The file adapter defines table flavors including {{FILTERABLE}}.

However, the file adapter currently exposes {{CsvTranslatableTable}}, and 
there does not appear to be a mechanism that translates pushdown-compatible 
filter predicates into the filtering capabilities already present in 
{{CsvEnumerator}}.
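
To make the missing step concrete, here is a minimal, dependency-free sketch of what such a translation could look like. It is not Calcite code: the class {{FilterValuesSketch}}, its methods, and the sample rows are hypothetical, and the only idea borrowed from the adapter is a per-column {{filterValues}} array in which a null entry means "no constraint". A real implementation would have to extract the column index and literal from the planner's filter expressions and leave anything it cannot handle to the {{EnumerableCalc}} above the scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the Calcite file adapter. */
final class FilterValuesSketch {

  /**
   * Turns "column index = literal" predicates into a per-column array.
   * A null slot means the column is unconstrained.
   */
  static String[] toFilterValues(Map<Integer, String> equalities, int fieldCount) {
    String[] filterValues = new String[fieldCount];
    for (Map.Entry<Integer, String> e : equalities.entrySet()) {
      filterValues[e.getKey()] = e.getValue();
    }
    return filterValues;
  }

  /** Returns true if the row satisfies every non-null filter value. */
  static boolean matches(String[] row, String[] filterValues) {
    for (int i = 0; i < filterValues.length; i++) {
      if (filterValues[i] != null && !filterValues[i].equals(row[i])) {
        return false;
      }
    }
    return true;
  }

  /** Scans the rows and drops non-matching ones before they leave the scan. */
  static List<String[]> scan(List<String[]> rows, String[] filterValues) {
    List<String[]> result = new ArrayList<>();
    for (String[] row : rows) {
      if (matches(row, filterValues)) {
        result.add(row);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up rows in the order (empno, name, deptno); not the real EMPS data.
    List<String[]> rows = Arrays.asList(
        new String[] {"100", "A", "10"},
        new String[] {"110", "B", "20"},
        new String[] {"120", "C", "20"});

    // WHERE deptno = 20, with deptno assumed to be column index 2.
    Map<Integer, String> equalities = new LinkedHashMap<>();
    equalities.put(2, "20");

    for (String[] row : scan(rows, toFilterValues(equalities, 3))) {
      System.out.println(row[1] + ", " + row[0]);
    }
  }
}
```

The sketch compares raw strings, so it only covers equality against a literal whose text matches the CSV cell exactly. Range predicates, type coercion, and null handling would still need to be evaluated above the scan.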

Evidence:
 * Physical plan for the query above contains {{EnumerableCalc}} over 
{{CsvTableScan}}.
 * Planner logs show projection pushdown via {{CsvProjectTableScanRule}}.
 * No file-adapter-specific filter pushdown rule appears to fire.
 * Filter evaluation remains outside the scan.

This results in unnecessary parsing and processing of rows that could 
be eliminated during scanning.

  was:
Currently, Apache Calcite has two distinct CSV adapter implementations:
1. `example/csv` (demo adapter under `org.apache.calcite.adapter.csv`)
2. `file` (production file adapter under `org.apache.calcite.adapter.file`)

*The Problem*
While `example/csv` supports three flavors of tables (`SCANNABLE`, `FILTERABLE` 
with equality filter pushdown, and `TRANSLATABLE`), the `file` adapter only 
implements `CsvTranslatableTable`. 

Because of this, the `file` adapter supports only projection pushdown: filters 
on file-adapter tables are never evaluated in the scan.

Interestingly, `org.apache.calcite.adapter.file.CsvEnumerator` already includes 
`filterValues` logic to skip rows that do not match simple equality filters. 
However, there is no corresponding `CsvFilterableTable` or planning rule in the 
`file` adapter module to pass filters down to it.

*Proposed Solution*
Add filter pushdown capability to the `file` adapter, either by introducing a 
`CsvFilterableTable` implementation (analogous to the example adapter) or 
adding a planner rule to push down filters into the scan. 

+*PS: This is my first time creating an issue, so do let me know if I missed 
any important detail.*+


> Add filter pushdown support to the file adapter's CSV table implementation
> --------------------------------------------------------------------------
>
>                 Key: CALCITE-7618
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7618
>             Project: Calcite
>          Issue Type: Improvement
>          Components: file-adapter
>            Reporter: Diveyam Mishra
>            Assignee: Diveyam Mishra
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)