Brian Hulette created BEAM-11881:
------------------------------------
Summary: DataFrame subpartitioning order is incorrect
Key: BEAM-11881
URL: https://issues.apache.org/jira/browse/BEAM-11881
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Reporter: Brian Hulette
Currently we've defined
Nothing() < Index([i]) < Index([i,j]) < .. < Index() < Singleton()
s.t. Singleton is a subpartitioning of Index(), which is a subpartitioning of
Index([i,j]), but this is incorrect. The order should be
Singleton() < Index([i]) < Index([i,j]) < .. < Index() < Nothing()
s.t. every other partitioning is a subpartitioning of Singleton. This is
logical, since Singleton will collect the largest amount of data on a single
node, partitioning by a single index level will be a little more distributed,
and partitioning by the full Index() will be the most distributed.
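
To make the proposed order concrete, here is a minimal Python sketch. The
classes and the is_subpartitioning_of helper below are hypothetical stand-ins
written for this illustration, not the actual Beam implementation. The sketch
assumes that Index levels are identified by integers and that Index() with no
levels means the full index.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical stand-ins for the partitionings named in this issue.
@dataclass(frozen=True)
class Singleton:
    """All data collected on a single node."""


@dataclass(frozen=True)
class Index:
    """Partitioned by some index levels; levels=None means the full index."""
    levels: Optional[Tuple[int, ...]] = None


@dataclass(frozen=True)
class Nothing:
    """Top of the proposed order."""


def is_subpartitioning_of(a, b) -> bool:
    """Return True if b <= a in the proposed order
    Singleton() < Index([i]) < Index([i,j]) < .. < Index() < Nothing().
    """
    if isinstance(b, Singleton):
        return True  # every partitioning is a subpartitioning of Singleton
    if isinstance(a, Nothing):
        return True  # Nothing sits above everything else in the order
    if isinstance(a, Singleton) or isinstance(b, Nothing):
        return False  # a is strictly below b in the order
    # Both are Index: a must cover at least the levels that b uses.
    if a.levels is None:
        return True
    if b.levels is None:
        return False
    return set(b.levels) <= set(a.levels)
```

Under these assumptions, is_subpartitioning_of(Index((0, 1)), Index((0,)))
holds, while is_subpartitioning_of(Singleton(), Index((0,))) does not.
Note that Index([i]) and Index([j]) for different levels are not comparable
in this sketch, so it treats the order as partial rather than as a strict chain.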