Re: Fix for parsing error when reading from filesystem with the SQL api #690

Zoi Kaoudi via dev Sat, 21 Feb 2026 23:49:07 -0800

 Oh sorry Prathamesh, I got confused with the names :D
I have left one comment in your PR. Not sure if it is visible. Could you check 
that?
I will get in touch about the project.
Best
--
Zoi
    On Sunday, 22 February 2026 at 08:38:09 AM CET, 
4087_PRATHAMESH_ DHANASHRI <[email protected]> wrote:  
 
 Hi Zoi,


Thanks for the update.

Just to clarify, PR #692 was submitted by me. I’m glad the issue is
resolved. Please let me know if there’s anything further I can help with
there.

I also wanted to mention that I’m very interested in the Wayang project on
adding new execution engine backends for data lake environments. The work
on backend abstraction, operator mapping, and optimizer integration sounds
particularly interesting to me.

As relevant background, I’ve previously contributed to DataFusion
(datafusion-python, PR #1376), so I already have some familiarity with its
ecosystem, which could be useful if DataFusion is considered as one of the
target platforms.

I’d be happy to explore this further and start contributing in preparation
for GSoC.

Best regards,
Prathamesh

On Sun, 22 Feb, 2026, 12:42 pm Zoi Kaoudi, <[email protected]> wrote:

> Hi Prathamesh,
>
> thank you for your interest in our project and your suggestion for a fix
> to the issue. The issue is currently fixed in PR #692. Let us know if you
> have any further questions. I will contact you later regarding the GSoC
> application.
>
> Best
> --
> Zoi
>
> On 2026/02/17 11:28:57 4087_PRATHAMESH_ DHANASHRI wrote:
> > Hi everyone,
> >
> > My name is Prathamesh Dhanashri, and I'd like to introduce myself as a new
> > contributor to Apache Wayang. I'm interested in contributing to the project
> > as part of Google Summer of Code (GSoC) and am looking to familiarize
> > myself with the codebase by working on open issues.
> >
> > As a starting point, I've been investigating *issue #690*
> > (https://github.com/apache/wayang/issues/690), reported by zkaoudi,
> > regarding a CSV parsing error when reading from the filesystem using the
> > SQL API. The error occurs at *JavaCSVTableSource.java line 127* where
> > *tokens.length != fieldTypes.size()*.
> >
> > *Root Cause:*
> > In *WayangTableScanVisitor.java (line 67)*, the fieldTypes list is built
> > from *wayangRelNode.getRowType()*, which returns the RelNode's row type. In
> > certain configurations, this row type may have fewer fields than the actual
> > table schema (e.g., when Calcite optimizes away unused columns). However,
> > the CSV source always reads all columns from disk, causing a mismatch
> > between tokens.length and fieldTypes.size().
> >
> > *Proposed Fix:*
> > Change line 67 of *WayangTableScanVisitor.java* from:
> > *final List<RelDataType> fieldTypes =
> > wayangRelNode.getRowType().getFieldList().stream()*
> > to:
> > *final List<RelDataType> fieldTypes =
> > wayangRelNode.getTable().getRowType().getFieldList().stream()*
> > Using *getTable().getRowType()* always returns the full table schema,
> > consistent with how getColumnNames() already works in
> > *WayangTableScan.java (line 98)*. The downstream WayangProject operator
> > handles column selection separately via a MapOperator.
> >
> > *Testing:*
> > I've written a regression test using Mockito that simulates a
> > WayangTableScan with a trimmed row type (1 field) while the table has 4
> > fields, reproducing the exact scenario described in the issue. The test
> > fails before the fix and passes after. All existing tests continue to pass.
> >
> > I plan to open a PR with this fix and the regression test shortly.
> >
> > Looking forward to your feedback!
> >
> > Thanks,
> > Prathamesh Dhanashri
> >
>
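
The mismatch described in the quoted root-cause analysis can be illustrated with a small, dependency-free sketch. This is not the code from JavaCSVTableSource.java or WayangTableScanVisitor.java: the class name SchemaMismatchSketch, the method parseLine, the sample row, and the use of plain strings for field types are all hypothetical stand-ins. It only assumes what the thread states, namely that the CSV reader yields one token per column on disk and compares that count against the size of the field type list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; not Wayang's actual classes.
class SchemaMismatchSketch {

    // Splits a CSV line and checks the token count against the expected
    // field types, mirroring the tokens.length != fieldTypes.size() check
    // mentioned in the thread.
    static String[] parseLine(String line, List<String> fieldTypes) {
        String[] tokens = line.split(",", -1); // -1 keeps trailing empty columns
        if (tokens.length != fieldTypes.size()) {
            throw new IllegalStateException(
                "Expected " + fieldTypes.size() + " fields but found " + tokens.length);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "1,alice,30,berlin"; // made-up row with 4 columns on disk

        // Full table schema: the shape the proposed fix would pass along.
        List<String> fullSchema =
            Arrays.asList("INTEGER", "VARCHAR", "INTEGER", "VARCHAR");
        // Trimmed row type: a single field, as in the described regression test.
        List<String> trimmedRowType = Arrays.asList("VARCHAR");

        System.out.println("full schema tokens: " + parseLine(line, fullSchema).length);
        try {
            parseLine(line, trimmedRowType);
        } catch (IllegalStateException e) {
            System.out.println("trimmed row type fails: " + e.getMessage());
        }
    }
}
```

With the full schema the counts agree and the line parses; with the trimmed row type the same line is rejected, which is the failure mode the thread attributes to building fieldTypes from the RelNode's row type rather than from the table's.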
