Hi, I am new to Spark and have been playing around with the Parquet reader code. I have two questions:
1. I saw that the code path starts at the DataSourceScanExec class, moves on to the ParquetFileFormat class, and creates a VectorizedParquetRecordReader. I tried running spark.read.parquet(...) and stepping through it in the debugger, but for some reason it never hit the breakpoints I placed in these classes. Perhaps I am doing something wrong, but is there some versioning of the Parquet readers that I am missing? How do I make the code take the DataSourceScanExec -> ... -> ParquetReader ... -> VectorizedParquetRecordReader ... route?
2. If I do manage to make it take the above path, I see there is a point at which the data is filled into ColumnarBatch objects. Has anyone tried returning all the data as ColumnarBatch? Is there any reading material you can point me to?

Thanks in advance, this will be super helpful for me!