ParquetLoader - reading parquet data dumped originally by spark code

Sudeep Khemka Tue, 01 Mar 2022 18:20:48 -0800

Hi Everyone,

I am having trouble reading Parquet data that was written by Spark.
ParquetLoader takes a very long time to determine the schema when one is not
passed explicitly. The schema is very large and not easy to construct by hand.
I have a few questions:



   1. Why does ParquetLoader take so long to determine the schema, while Spark
   takes almost no time for the same data?
   2. How does Pig determine the schema for Parquet? If I dump the schema that
   Pig infers from a given file, can I reuse that schema string
   deterministically later?
   3. Is there an easy way to convert a Spark schema string to a Pig schema
   string?
   4. Is there a way to use Glue to read the schema while loading data via
   ParquetLoader?


-- 
Thanks
Sudeep
