Rohini Palaniswamy created PIG-5313:
---------------------------------------
Summary: Support PARALLEL in STORE statement
Key: PIG-5313
URL: https://issues.apache.org/jira/browse/PIG-5313
Project: Pig
Issue Type: New Feature
Components: tez
Reporter: Rohini Palaniswamy
Restricting the number of output files is a very common use case. In Pig,
users currently add an ORDER BY, GROUP BY or DISTINCT with the required
parallelism before STORE to achieve it. All of these operations add
unnecessary processing overhead. It would be ideal if the STORE statement
supported the PARALLEL clause and the partitioning of data was handled in a
simpler and more efficient manner.
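To illustrate, here is a minimal sketch of the workaround as it is typically
written today, followed by the kind of statement this issue asks for. The
paths, schema and parallelism value are made up for the example, and the
proposed STORE ... PARALLEL form is only an assumption about what the syntax
might look like; it is left commented out because current Pig does not
accept it.

```pig
-- Hypothetical input path and schema, for illustration only.
data = LOAD 'input/events' USING PigStorage('\t') AS (id:chararray, value:int);

-- Current workaround: add a reduce-side operator only to control the
-- number of output files. ORDER BY with PARALLEL 10 runs 10 reducers,
-- so at most 10 part files are written, at the cost of a full sort.
sorted = ORDER data BY id PARALLEL 10;
STORE sorted INTO 'output/events_sorted' USING PigStorage('\t');

-- Proposed form (assumed syntax, not valid in current Pig):
-- STORE data INTO 'output/events' USING PigStorage('\t') PARALLEL 10;
```

Substituting GROUP BY or DISTINCT for ORDER BY limits the file count in the
same way, but each also changes the shape or contents of the data (grouping
nests the records, DISTINCT drops duplicates), which is part of why a direct
PARALLEL on STORE would be cleaner.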
This jira is mostly Tez-specific and requires TEZ-3865, which has more details
on how this can be done via Tez. We will also have to add APIs to StoreFunc
(HCatStorer, MultiStorage, etc.) to get the partition keys by which to
partition the data for the store statement.