Hi Prathamesh,

Thank you for your interest in our project and for your suggested fix to the issue. The issue is currently fixed in PR #692. Let us know if you have any further questions. I will contact you later regarding the GSoC application.

Best
--
Zoi

On 2026/02/17 11:28:57 4087_PRATHAMESH_ DHANASHRI wrote:
> Hi everyone,
>
> My name is Prathamesh Dhanashri, and I'd like to introduce myself as a new
> contributor to Apache Wayang. I'm interested in contributing to the project
> as part of Google Summer of Code (GSoC) and am looking to familiarize
> myself with the codebase by working on open issues.
>
> As a starting point, I've been investigating *issue #690*
> (https://github.com/apache/wayang/issues/690) reported by zkaoudi regarding
> a CSV parsing error when reading from the filesystem using the SQL API. The
> error occurs at *JavaCSVTableSource.java line 127* where *tokens.length !=
> fieldTypes.size()*.
>
> *Root Cause:*
> In *WayangTableScanVisitor.java (line 67)*, the fieldTypes list is built
> from *wayangRelNode.getRowType()*, which returns the RelNode's row type. In
> certain configurations, this row type may have fewer fields than the actual
> table schema (e.g., when Calcite optimizes away unused columns). However,
> the CSV source always reads all columns from disk, causing a mismatch
> between tokens.length and fieldTypes.size().
>
> *Proposed Fix:*
> Change line 67 of *WayangTableScanVisitor.java* from:
> *final List<RelDataType> fieldTypes =
> wayangRelNode.getRowType().getFieldList().stream()*
> to:
> *final List<RelDataType> fieldTypes =
> wayangRelNode.getTable().getRowType().getFieldList().stream()*
> Using *getTable().getRowType()* always returns the full table schema,
> consistent with how getColumnNames() already works in *WayangTableScan.java
> (line 98)*. The downstream WayangProject operator handles column selection
> separately via a MapOperator.
>
> *Testing:*
> I've written a regression test using Mockito that simulates a
> WayangTableScan with a trimmed row type (1 field) while the table has 4
> fields, reproducing the exact scenario described in the issue. The test
> fails before the fix and passes after. All existing tests continue to pass.
>
> I plan to open a PR with this fix and the regression test shortly.
>
> Looking forward to your feedback!
>
> Thanks,
> Prathamesh Dhanashri
