HDFS supports space quotas, so you can control output size that way, but obviously a quota applies to all HDFS users writing under that directory, not just Pig scripts.
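For reference, a rough sketch of how such a quota would be applied with the HDFS admin CLI; the directory path and the 10 GB limit are just example values, not anything from this thread. Once the quota is in place, a write that would exceed it is rejected (the task sees a quota-exceeded error), which in practice fails the job.

```shell
# Set a 10 GB space quota on an example output directory
# (replace the path with the directory your Pig script STOREs into).
hadoop dfsadmin -setSpaceQuota 10g /user/michael/pig-output

# Inspect the quota and current usage (-q adds the quota columns).
hadoop fs -count -q /user/michael/pig-output

# Remove the quota when it is no longer needed.
hadoop dfsadmin -clrSpaceQuota /user/michael/pig-output
```

Note that the space quota counts raw bytes after replication, so with the default replication factor of 3 a 10 GB quota allows roughly 3.3 GB of logical data.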

Alan.

On Aug 25, 2010, at 3:52 PM, jiang licht wrote:

Is there a way to tell Pig to restrict the size of map/reduce output that can be saved to HDFS? E.g., if a job produces data over the limit, it won't be allowed to save the result to HDFS and the job will fail.

This would help prevent unexpectedly large output from being written to HDFS by the mappers/reducers a Pig script creates. The idea is that we estimate in advance how much data a Pig script should generate; then, with that quota in place, if an over-sized result is produced, it isn't saved and the job fails.

Thanks,
Michael
