Brian Hulette created BEAM-13133:
------------------------------------

             Summary: sample() imposes partitioning by index unnecessarily
                 Key: BEAM-13133
                 URL: https://issues.apache.org/jira/browse/BEAM-13133
             Project: Beam
          Issue Type: Task
          Components: dsl-dataframe
            Reporter: Brian Hulette
            Assignee: Brian Hulette


I noticed that sample() requires data to be repartitioned when it's used at the 
beginning of a series of dataframe commands. In practice we should be able to 
sample within arbitrary partitions before combining the partitions to produce 
the final result.

It looks like the root cause is that our sample expressions require 
partitioning by index, rather than arbitrary partitioning.
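
To illustrate the idea, here is a minimal sketch in plain Python of 
sampling within arbitrary partitions and then combining the partial results. 
This is not Beam's implementation: the helper names sample_partition and 
combine_samples are hypothetical, and the random-key approach (tag each row 
with a random key, keep the n smallest keys) is just one assumed way to get a 
uniform sample of n rows without caring how rows are partitioned.

```python
import heapq
import random


def sample_partition(rows, n, rng):
    """Tag each row with a random key; keep the n rows with the smallest keys.

    Works on any partition, regardless of how rows were assigned to it.
    """
    keyed = ((rng.random(), row) for row in rows)
    return heapq.nsmallest(n, keyed, key=lambda kv: kv[0])


def combine_samples(partials, n):
    """Merge per-partition candidates and keep the n smallest keys overall."""
    merged = (kv for partial in partials for kv in partial)
    smallest = heapq.nsmallest(n, merged, key=lambda kv: kv[0])
    return [row for _, row in smallest]


rng = random.Random(0)
# Arbitrary, uneven partitions of 60 rows; no index-based layout is assumed.
partitions = [list(range(0, 5)), list(range(5, 50)), list(range(50, 60))]
partials = [sample_partition(p, 3, rng) for p in partitions]
result = combine_samples(partials, 3)
```

Because every row gets an independent key, the n smallest keys overall are 
always among the per-partition candidates, so the combine step only has to 
look at at most n rows per partition.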



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
