Hi Prathamesh,

Thank you for your interest in our project and for suggesting a fix for the
issue. A fix for the issue is already available in PR #692. Let us know if you
have any further questions. I will contact you later regarding the GSoC
application.

Best,
--
Zoi

On 2026/02/17 11:28:57 4087_PRATHAMESH_ DHANASHRI wrote:
> Hi everyone,
> 
> My name is Prathamesh Dhanashri, and I'd like to introduce myself as a new
> contributor to Apache Wayang. I'm interested in contributing to the project
> as part of Google Summer of Code (GSoC) and am looking to familiarize
> myself with the codebase by working on open issues.
> 
> As a starting point, I've been investigating *issue #690* (
> https://github.com/apache/wayang/issues/690) reported by zkaoudi regarding
> a CSV parsing error when reading from the filesystem using the SQL API. The
> error occurs at *JavaCSVTableSource.java line 127* where *tokens.length !=
> fieldTypes.size()*.
> 
> *Root Cause:*
> In *WayangTableScanVisitor.java (line 67)*, the fieldTypes list is built
> from *wayangRelNode.getRowType()*, which returns the RelNode's row type. In
> certain configurations, this row type may have fewer fields than the actual
> table schema (e.g., when Calcite optimizes away unused columns). However,
> the CSV source always reads all columns from disk, causing a mismatch
> between tokens.length and fieldTypes.size().
> 
> *Proposed Fix:*
> Change line 67 of *WayangTableScanVisitor.java* from:
> *final List<RelDataType> fieldTypes =
> wayangRelNode.getRowType().getFieldList().stream()*
> to:
> *final List<RelDataType> fieldTypes =
> wayangRelNode.getTable().getRowType().getFieldList().stream()*
> Using *getTable().getRowType()* always returns the full table schema,
> consistent with how getColumnNames() already works in *WayangTableScan.java
> (line 98)*. The downstream WayangProject operator handles column selection
> separately via a MapOperator.
> 
> *Testing:*
> I've written a regression test using Mockito that simulates a
> WayangTableScan with a trimmed row type (1 field) while the table has 4
> fields, reproducing the exact scenario described in the issue. The test
> fails before the fix and passes after. All existing tests continue to pass.
> 
> I plan to open a PR with this fix and the regression test shortly.
> 
> Looking forward to your feedback!
> 
> Thanks,
> Prathamesh Dhanashri
> 
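
For readers of the archive, the mismatch described in the quoted message can be
sketched in plain Java. This is only an illustration under stated assumptions:
the class CsvWidthCheck, the method parseLine, the sample row, and the type
names are all hypothetical and are not taken from the Wayang code base. It only
shows why comparing a full CSV line against a pruned field list fails, while
comparing it against the full table schema succeeds.

```java
import java.util.List;

public class CsvWidthCheck {

    // Hypothetical stand-in for a per-line length check; not Wayang's actual code.
    public static String[] parseLine(String line, List<String> fieldTypes) {
        // A limit of -1 keeps trailing empty columns, so every column is counted.
        String[] tokens = line.split(",", -1);
        if (tokens.length != fieldTypes.size()) {
            throw new IllegalStateException(
                "Expected " + fieldTypes.size() + " fields but line has " + tokens.length);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up row with four columns, as it would be read from disk.
        String line = "1,Alice,30,Berlin";

        // Assumed full table schema (four fields) versus a pruned row type (one field).
        List<String> fullSchema = List.of("INTEGER", "VARCHAR", "INTEGER", "VARCHAR");
        List<String> prunedRowType = List.of("VARCHAR");

        // The full schema matches the number of tokens on the line.
        System.out.println(parseLine(line, fullSchema).length);

        // The pruned row type does not, so the length check throws.
        try {
            parseLine(line, prunedRowType);
        } catch (IllegalStateException e) {
            System.out.println("Mismatch: " + e.getMessage());
        }
    }
}
```

In this sketch the source always sees every column of the line, so the field
list it is checked against has to describe the whole table; selecting a subset
of columns would then be a separate, later step.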
