I have 10k complex Parquet files with large footers, all sharing the same schema. Drill generated a metadata cache file of 2.26 GB, and now a simple count(*) query issued from sqlline hangs and never returns.
In this case, I compared the footers of two files and found that large parts of them are identical. Would it make sense to store the common information once and keep only the file-specific details as per-file overrides? - Rahul
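To make the "store once, override per file" idea concrete, here is a minimal sketch. It models each footer as a nested dictionary, which is a simplification. The field names (schema, createdBy, rowCount, path) and the helpers common_base, diff_from_base and merge are hypothetical and do not come from Drill's actual cache format. The sketch extracts the entries that are identical across every footer into a shared base, keeps only the differing entries per file, and rebuilds any full footer by merging its overrides onto the base.

```python
from typing import Any, Dict, List

# Hypothetical, simplified model of one file's footer metadata.
Metadata = Dict[str, Any]


def common_base(footers: List[Metadata]) -> Metadata:
    """Return the entries (recursing into nested dicts) identical in every footer."""
    if not footers:
        return {}
    first, rest = footers[0], footers[1:]
    base: Metadata = {}
    for key, value in first.items():
        others = [f[key] for f in rest if key in f]
        if len(others) != len(rest):
            continue  # key is missing from some footer, so it cannot be shared
        if isinstance(value, dict) and all(isinstance(o, dict) for o in others):
            shared = common_base([value] + others)
            if shared:
                base[key] = shared
        elif all(o == value for o in others):
            base[key] = value
    return base


def diff_from_base(footer: Metadata, base: Metadata) -> Metadata:
    """Return only the parts of `footer` that differ from, or are absent in, `base`."""
    override: Metadata = {}
    for key, value in footer.items():
        if key not in base:
            override[key] = value
        elif isinstance(value, dict) and isinstance(base[key], dict):
            nested = diff_from_base(value, base[key])
            if nested:
                override[key] = nested
        elif value != base[key]:
            override[key] = value
    return override


def merge(base: Metadata, override: Metadata) -> Metadata:
    """Rebuild a full footer by applying per-file overrides on top of the shared base."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


if __name__ == "__main__":
    # Two hypothetical footers that share a schema but differ in row count and path.
    footers = [
        {"schema": {"a": "INT64", "b": "BINARY"}, "createdBy": "parquet-mr",
         "rowCount": 100, "path": "f1.parquet"},
        {"schema": {"a": "INT64", "b": "BINARY"}, "createdBy": "parquet-mr",
         "rowCount": 250, "path": "f2.parquet"},
    ]
    base = common_base(footers)
    overrides = [diff_from_base(f, base) for f in footers]
    print("shared base:", base)
    print("per-file overrides:", overrides)
    # Each original footer can be reconstructed from base + its override.
    assert all(merge(base, o) == f for f, o in zip(footers, overrides))
```

With 10k files sharing one schema, the shared base would be written once, and each per-file entry would shrink to just the fields that actually vary (row counts, paths, per-file statistics). The trade-off is that a reader must merge base and override before using an entry.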
