Satish Subhashrao Saley created PIG-5365: --------------------------------------------
Summary: Add support for PARALLEL clause in LOAD statement Key: PIG-5365 URL: https://issues.apache.org/jira/browse/PIG-5365 Project: Pig Issue Type: New Feature Reporter: Satish Subhashrao Saley Assignee: Satish Subhashrao Saley It is tiresome to keep telling users to increase pig.maxCombinedSplitSize to 512MB or 1G when they are reading TBs of data to avoid launching too many map tasks (50-100K) for loading data. It has unnecessary overhead in terms of container launch and wastes lot of resources. Would be good to have a new settings to configure the max number of tasks which will override pig.maxCombinedSplitSize and combine more splits into one task. For eg: pig.max.input.splits=30000 and data size is 2TB, it will combine more than 128MB (default pig.maxCombinedSplitSize) per task to have maximum of 30K tasks. That will go as default into pig-default.properties and apply to all users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)