Right now we are using a MapReduce job to convert some data and store the
result in the Parquet format. The data can be tens of terabytes, which
leads to a fairly large summary file (i.e., _metadata).

When we then try to read the result with another MapReduce job, loading
the metadata takes forever.

We are wondering if it is possible to reduce (ideally eliminate) the cost
of loading the summary file when starting an MR job.
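For reference, these are the parquet-mr configuration properties we believe are relevant, assuming a reasonably recent parquet-mr release (property names taken from ParquetOutputFormat and ParquetInputFormat; exact names and defaults may vary by version):

```properties
# Write side: skip generating the _metadata summary file entirely
# (ParquetOutputFormat.ENABLE_JOB_SUMMARY)
parquet.enable.summary-metadata=false

# Read side: have the tasks read the file footers themselves instead of
# the client reading the summary file up front
# (ParquetInputFormat.TASK_SIDE_METADATA)
parquet.task.side.metadata=true
```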


Thanks,

Yan
