Hi everyone,

I am having trouble reading Parquet data that was written by Spark. ParquetLoader takes a very long time to determine the schema when it is not passed explicitly. The schema is very large and not easy to construct by hand. I have a few questions:
1. Why does ParquetLoader take so long to determine the schema, while Spark takes almost no time to do the same?
2. How does Pig determine the schema for Parquet? If I dump the schema that Pig infers from a given file, can I reliably reuse that schema string later?
3. Is there an easy way to convert a Spark schema string to a Pig schema string?
4. Is there a way to use Glue to read the schema while loading data via ParquetLoader?

--
Thanks,
Sudeep