Hi all,

Yesterday I filed a CSV parsing bug [1] for Spark that leads to incorrect data when the input contains sequences similar to the one in the report.
I wanted to look at the parsing logic to see whether I could spot the error. That would let me update the issue with more information and possibly contribute a PR with a fix. However, I got completely lost navigating the dependencies in the Spark repository. Can someone point me in the right direction? I am looking for the CSV parser itself, which is probably a dependency.

The next question may require too much knowledge of Spark internals for me to know where to look or to understand what I'd be looking at. Still, I would also like to find out whether, and why, CSV parsing is implemented differently when columns are projected than when the full DataFrame is processed. The issue only occurs when projecting columns, and this inconsistency is a worry in itself.

Many thanks,
Marnix

[1] https://issues.apache.org/jira/browse/SPARK-38167