Hi,
I am new to Spark and have been playing around with the Parquet reader code. I 
have two questions:

  1.  I read the code that starts at the DataSourceScanExec class, moves on
to the ParquetFileFormat class, and eventually uses a
VectorizedParquetRecordReader. I tried a spark.read.parquet(...) call and
stepped through it in a debugger, but for some reason it never hit the
breakpoints I placed in these classes (a rough sketch of what I am running
is below). Perhaps I am doing something wrong, but is there some versioning
of Parquet readers that I am missing? How do I make the code take the
DataSourceScanExec -> ... -> ParquetReader ... ->
VectorizedParquetRecordReader ... route?
  2.  If I do manage to make it take the above path, I see there is a point
at which the data is filled into ColumnarBatch objects. Has anyone tried
returning all of the data as ColumnarBatch objects (the second sketch below
shows what I have in mind)? Is there any reading material you can point me
to?
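
For context on question 1, this is roughly what I am running (the path is a
placeholder and I am assuming local mode; I added the collect() because my
understanding is that the read is lazy and no scan code runs until an action
is triggered):

  import org.apache.spark.sql.SparkSession

  object ParquetDebug {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("parquet-debug")
        .master("local[1]")  // single JVM so breakpoints fire in-process
        // default is already true; set explicitly to be sure the
        // vectorized path is on
        .config("spark.sql.parquet.enableVectorizedReader", "true")
        .getOrCreate()

      val df = spark.read.parquet("/tmp/test.parquet")  // placeholder path
      df.collect()  // force an action so the scan actually runs
      spark.stop()
    }
  }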
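
And to make question 2 concrete, this is the kind of consumer I would like
to end up with. It is only a rough sketch from my reading of the source: I
believe enableReturningBatches() is what the Parquet scan path uses to
switch the reader from rows to batches, and that ColumnarBatch lives in
org.apache.spark.sql.vectorized in recent versions
(org.apache.spark.sql.execution.vectorized in older ones), so apologies if
I have misread any of it:

  import org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader
  import org.apache.spark.sql.vectorized.ColumnarBatch

  // Drain an already-initialized reader batch by batch instead of row by
  // row. Initialization happens in ParquetFileFormat, which is where I
  // would want to hook in.
  def consumeBatches(reader: VectorizedParquetRecordReader): Unit = {
    reader.enableReturningBatches()  // getCurrentValue now yields batches
    while (reader.nextKeyValue()) {
      val batch = reader.getCurrentValue.asInstanceOf[ColumnarBatch]
      println(s"batch: ${batch.numRows()} rows x ${batch.numCols()} columns")
    }
  }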
Thanks in advance; this will be super helpful for me!
