[
https://issues.apache.org/jira/browse/CALCITE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Diveyam Mishra updated CALCITE-7618:
------------------------------------
Description:
The file adapter's CSV implementation currently supports projection pushdown
but does not appear to support filter pushdown.
For example:
{code}
SELECT name, empno
FROM EMPS
WHERE deptno = 20
{code}
The resulting plan is:
{code}
PLAN=EnumerableCalc(expr#0..2=[{inputs}], expr#3=[20], expr#4=[=($t2, $t3)], NAME=[$t1], EMPNO=[$t0], $condition=[$t4])
  CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2]])
{code}
The filter condition is evaluated in the upper {{EnumerableCalc}} node rather
than during the CSV scan itself. As a result, all rows are read from the
underlying CSV file and filtering occurs afterward.
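To make the difference concrete, here is a minimal, language-neutral sketch in Python (not Calcite's Java code). The sample data, the `scan` helper, and the `filter_values` parameter are all hypothetical stand-ins; the sketch only counts how many rows the scan hands to the operator above it when the equality filter is applied above the scan versus inside it.

```python
import csv
import io

# Hypothetical sample data standing in for the EMPS table.
EMPS_CSV = """empno,name,deptno
100,Fred,10
110,Eric,20
120,Wilma,20
130,Alice,40
"""


def scan(text, filter_values=None):
    """Yield rows as lists of strings.

    filter_values is an assumed per-column list: None means "no constraint",
    any other entry must equal the field at that position (equality only).
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    for fields in reader:
        if filter_values is not None and any(
            want is not None and want != got
            for want, got in zip(filter_values, fields)
        ):
            continue  # row rejected inside the scan
        yield fields


def filter_above_scan(text):
    """Mimics the current plan: the scan emits every row, a later step filters."""
    emitted = 0
    result = []
    for empno, name, deptno in scan(text):
        emitted += 1
        if deptno == "20":
            result.append((name, empno))
    return result, emitted


def filter_in_scan(text):
    """Mimics pushdown: the scan itself drops rows where deptno != 20."""
    emitted = 0
    result = []
    for empno, name, deptno in scan(text, [None, None, "20"]):
        emitted += 1
        result.append((name, empno))
    return result, emitted
```

Both functions return the same rows, but in this sketch the pushed-down variant emits only the two matching rows from the scan instead of all four.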
The file adapter already contains infrastructure related to filtering:
* {{CsvEnumerator}} contains {{filterValues}}-based row filtering logic.
* {{CsvTable}} was originally derived from the demo CSV adapter's filterable
implementation.
* The file adapter defines table flavors including {{FILTERABLE}}.
However, the file adapter currently exposes {{CsvTranslatableTable}}, and
there does not appear to be a mechanism that translates pushdown-compatible
filter predicates into the filtering capabilities already present in
{{CsvEnumerator}}.
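The missing translation step could look roughly like the following sketch, written in Python for brevity rather than as Calcite code. The `Conjunct` tuple shape and the `to_filter_values` name are hypothetical; a real implementation would inspect the planner's expression nodes instead. The idea is to split a conjunction into simple `column = literal` terms, which fill a per-column array of filter values, and a residual list that must still be evaluated above the scan.

```python
from typing import List, Optional, Tuple

# Hypothetical predicate shape: (column index, operator, literal as text).
Conjunct = Tuple[int, str, str]


def to_filter_values(
    conjuncts: List[Conjunct], field_count: int
) -> Tuple[List[Optional[str]], List[Conjunct]]:
    """Split AND-ed conjuncts into pushed-down equality values and a residual.

    Only the first `column = literal` term per column is pushed down; every
    other term is returned in the residual list so no condition is lost.
    """
    filter_values: List[Optional[str]] = [None] * field_count
    residual: List[Conjunct] = []
    for column, op, literal in conjuncts:
        if op == "=" and filter_values[column] is None:
            filter_values[column] = literal
        else:
            residual.append((column, op, literal))
    return filter_values, residual
```

For the query above, `to_filter_values([(2, "=", "20")], 3)` yields `[None, None, "20"]` with an empty residual. Keeping unsupported terms in the residual is the conservative choice: the scan only ever narrows the row set, and anything it cannot handle is still checked afterward.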
Evidence:
* Physical plan for the query above contains {{EnumerableCalc}} over
{{CsvTableScan}}.
* Planner logs show projection pushdown via {{CsvProjectTableScanRule}}.
* No file-adapter-specific filter pushdown rule appears to fire.
* Filter evaluation remains outside the scan.
This results in unnecessary parsing and processing of rows that could
potentially be eliminated during scanning.
was:
Currently, Apache Calcite has two distinct CSV adapter implementations:
1. `example/csv` (demo adapter under `org.apache.calcite.adapter.csv`)
2. `file` (production file adapter under `org.apache.calcite.adapter.file`)
*The Problem*
While `example/csv` supports three flavors of tables (`SCANNABLE`, `FILTERABLE`
with equality filter pushdown, and `TRANSLATABLE`), the `file` adapter only
implements `CsvTranslatableTable`.
Because of this, the `file` adapter only supports projection pushdown and lacks
any filter pushdown support, meaning queries using filters on file-adapter
tables cannot push down filter evaluation.
Interestingly, `org.apache.calcite.adapter.file.CsvEnumerator` already includes
`filterValues` logic to skip rows that do not match simple equality filters. However,
there is no corresponding `CsvFilterableTable` or planning rule in the `file`
adapter module to pass filters down to it.
*Proposed Solution*
Add filter pushdown capability to the `file` adapter, either by introducing a
`CsvFilterableTable` implementation (analogous to the example adapter) or
adding a planner rule to push down filters into the scan.
+*PS: This is my first time creating an issue, so please let me know if I missed
any important detail.*+
> Add filter pushdown support to the file adapter's CSV table implementation
> --------------------------------------------------------------------------
>
> Key: CALCITE-7618
> URL: https://issues.apache.org/jira/browse/CALCITE-7618
> Project: Calcite
> Issue Type: Improvement
> Components: file-adapter
> Reporter: Diveyam Mishra
> Assignee: Diveyam Mishra
> Priority: Minor
>