Hi everyone,

My name is Prathamesh Dhanashri, and I'd like to introduce myself as a new contributor to Apache Wayang. I'm interested in contributing to the project as part of Google Summer of Code (GSoC) and am looking to familiarize myself with the codebase by working on open issues.

As a starting point, I've been investigating *issue #690* (https://github.com/apache/wayang/issues/690), reported by zkaoudi, regarding a CSV parsing error when reading from the filesystem using the SQL API. The error occurs at *JavaCSVTableSource.java line 127*, where *tokens.length != fieldTypes.size()*.

*Root Cause:*

In *WayangTableScanVisitor.java (line 67)*, the fieldTypes list is built from *wayangRelNode.getRowType()*, which returns the RelNode's row type. In certain configurations, this row type may have fewer fields than the actual table schema (e.g., when Calcite optimizes away unused columns). However, the CSV source always reads all columns from disk, causing a mismatch between tokens.length and fieldTypes.size().

*Proposed Fix:*

Change line 67 of *WayangTableScanVisitor.java* from:

*final List<RelDataType> fieldTypes = wayangRelNode.getRowType().getFieldList().stream()*

to:

*final List<RelDataType> fieldTypes = wayangRelNode.getTable().getRowType().getFieldList().stream()*

Using *getTable().getRowType()* always returns the full table schema, consistent with how getColumnNames() already works in *WayangTableScan.java (line 98)*. The downstream WayangProject operator handles column selection separately via a MapOperator.

*Testing:*

I've written a regression test using Mockito that simulates a WayangTableScan with a trimmed row type (1 field) while the table has 4 fields, reproducing the exact scenario described in the issue. The test fails before the fix and passes after. All existing tests continue to pass.

I plan to open a PR with this fix and the regression test shortly.

Looking forward to your feedback!

Thanks,
Prathamesh Dhanashri
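
P.S. For anyone who wants to see the mismatch in isolation, here is a minimal, dependency-free Java sketch. It is not Wayang code. The class name CsvArityMismatchSketch, the parseLine helper, the sample row, and the use of plain type-name strings in place of RelDataType are all assumptions made for illustration. It only mimics the arity check described above: a CSV line always yields one token per column on disk, so a field-type list taken from a trimmed row type trips the check, while one taken from the full table schema does not.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: NOT Wayang code. All names below are hypothetical.
class CsvArityMismatchSketch {

    // Mimics the kind of arity check described for JavaCSVTableSource:
    // each CSV line must yield exactly one token per declared field type.
    static String[] parseLine(String line, List<String> fieldTypes) {
        String[] tokens = line.split(",", -1); // -1 keeps trailing empty fields
        if (tokens.length != fieldTypes.size()) {
            throw new IllegalStateException(
                    "Expected " + fieldTypes.size() + " fields but found " + tokens.length);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One row of a 4-column CSV file; the source reads every column from disk.
        String line = "1,alice,30,berlin";

        // Stand-in for the full table schema (what the proposed fix would use).
        List<String> tableSchema = Arrays.asList("INTEGER", "VARCHAR", "INTEGER", "VARCHAR");

        // Stand-in for a row type trimmed to a single referenced column.
        List<String> trimmedRowType = Arrays.asList("VARCHAR");

        System.out.println("Full schema: parsed "
                + parseLine(line, tableSchema).length + " tokens");

        try {
            parseLine(line, trimmedRowType);
        } catch (IllegalStateException e) {
            System.out.println("Trimmed row type: " + e.getMessage());
        }
    }
}
```

The sketch leaves out everything Calcite-specific. It is only meant to show why the length of the field-type list has to follow the file's columns rather than the columns the query happens to reference.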
