Hi, I have not forgotten about this one (sorry for the delay). The default parallelism is calculated to use the 'optimal' number of cores, and I think it is a reasonable default (it maximizes core utilization, in particular for streaming). I prefer not to change this until we have a better way to replace the default value (if you have any suggestion on how to do this with the new approach, it is welcome).
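
For readers following along, here is a minimal sketch of the kind of default I mean: parallelism falls back to Spark's own core-based default when the user has not configured anything. The helper name `deriveDefaultParallelism` is hypothetical, not the actual runner code:

```java
import org.apache.spark.api.java.JavaSparkContext;

class ParallelismDefaults {
  /**
   * Hypothetical helper: when no explicit value is configured (<= 0),
   * fall back to Spark's defaultParallelism, which typically resolves
   * to the total number of executor cores, i.e. the 'optimal'
   * core-utilization default discussed above.
   */
  static int deriveDefaultParallelism(JavaSparkContext jsc, int configured) {
    return configured > 0 ? configured : jsc.defaultParallelism();
  }
}
```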
I want to include your changes, but not as the default for the moment; let's say as an 'alternative' that is only applied if the user sets the bundle size (we have to document the partitioner change and mark this method @Experimental). This way we can evaluate whether double shuffles happen or not, and eventually whether the performance advantages justify making this behavior the default. WDYT? Beam's design philosophy has always been to reduce 'knobs' to the minimum, but I understand that with Spark this might sometimes be needed.
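
To make that concrete, a rough sketch of the gating I have in mind, assuming a bundle-size pipeline option (the option and method names here are illustrative, not necessarily the final API):

```java
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.PipelineOptions;

/** Illustrative options interface; the real SparkPipelineOptions may differ. */
interface BundleSizeOptions extends PipelineOptions {
  @Experimental // flagged experimental while we evaluate the partitioner change
  @Default.Long(0L) // 0 = unset: keep the current default parallelism behavior
  Long getBundleSize();

  void setBundleSize(Long bundleSize);
}

class PartitionerGating {
  /** The alternative partitioner only kicks in when the user opts in. */
  static boolean useBundleSizePartitioner(BundleSizeOptions options) {
    Long bundleSize = options.getBundleSize();
    return bundleSize != null && bundleSize > 0;
  }
}
```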
