Right now we are using a MapReduce job to convert some data and store the
result in the Parquet format. The data can be tens of terabytes, which
leads to a fairly large summary file (i.e., _metadata).

When we then try to read the result with another MapReduce job, loading
the metadata takes forever.

We are wondering if it is possible to reduce (ideally eliminate) the cost
of loading the summary file when starting an MR job.
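For reference, these are the parquet-mr configuration properties we believe are relevant, assuming a reasonably recent parquet-mr release (property names taken from ParquetOutputFormat and ParquetInputFormat; exact names and defaults may vary by version):

```properties
# Write side: skip generating the _metadata summary file entirely
# (ParquetOutputFormat.ENABLE_JOB_SUMMARY)
parquet.enable.summary-metadata=false

# Read side: have the tasks read the file footers themselves instead of
# the client reading the summary file up front
# (ParquetInputFormat.TASK_SIDE_METADATA)
parquet.task.side.metadata=true
```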


Thanks,

Yan
