Rohini Palaniswamy created PIG-5313:
---------------------------------------

             Summary: Support PARALLEL in STORE statement
                 Key: PIG-5313
                 URL: https://issues.apache.org/jira/browse/PIG-5313
             Project: Pig
          Issue Type: New Feature
          Components: tez
            Reporter: Rohini Palaniswamy


Restricting the number of output files is a very common use case. In Pig,
users currently add an ORDER BY, GROUP BY or DISTINCT with the required
parallelism before STORE to achieve it. All of these operations add
unnecessary processing overhead. Ideally, the STORE statement would support
the PARALLEL clause, and the partitioning of data would be handled in a
simpler and more efficient manner.

This JIRA is Tez-specific and depends on TEZ-3865, which has more details on
how this can be done in Tez. We will also have to add APIs to StoreFunc
(implemented by HCatStorer, MultiStorage, etc.) to obtain the partition keys
used to partition the data for the STORE statement.
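
To illustrate the idea of capping output files by partitioning on a key, here
is a minimal, hypothetical Java sketch. It is not Pig's StoreFunc API and not
how TEZ-3865 implements it; the class name, the partition() helper and the
key-extractor function are assumptions for illustration only. Each record is
routed to one of N buckets (one bucket per output file) with the same
non-negative-hash-modulo formula used by Hadoop's HashPartitioner, so records
sharing a partition key always end up in the same file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical sketch: route records into a fixed number of output
 * partitions (one output file per partition) by hashing a partition key.
 * This is NOT Pig's or Tez's API.
 */
public class StorePartitionSketch {

    /**
     * Non-negative hash modulo N, as in Hadoop's HashPartitioner.
     * Objects.hashCode maps a null key to 0, so null keys go to partition 0.
     */
    static int partitionFor(Object key, int parallelism) {
        return (Objects.hashCode(key) & Integer.MAX_VALUE) % parallelism;
    }

    /** Splits records into `parallelism` buckets; each bucket would become one output file. */
    static <T> List<List<T>> partition(List<T> records, Function<T, ?> keyOf, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        List<List<T>> buckets = new ArrayList<>(parallelism);
        for (int i = 0; i < parallelism; i++) {
            buckets.add(new ArrayList<>());
        }
        for (T record : records) {
            // The key extractor plays the role of the (hypothetical) partition-key API.
            buckets.get(partitionFor(keyOf.apply(record), parallelism)).add(record);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("us,1", "uk,2", "us,3", "in,4", "uk,5");
        // Partition key = first comma-separated field (e.g. a country column).
        List<List<String>> files = partition(rows, r -> r.split(",")[0], 2);
        for (int i = 0; i < files.size(); i++) {
            System.out.println("part-" + i + ": " + files.get(i));
        }
    }
}
```

Hash partitioning bounds the file count at N without a total sort, which is
the overhead an ORDER BY workaround would add; the trade-off is that bucket
sizes depend on how evenly the keys hash.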

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
