Re: [PR] [Spark] Restore memory sensitive GBK translation (#33520) [beam]

via GitHub Tue, 14 Jan 2025 00:05:29 -0800


JozoVilcek commented on code in PR #33521:
URL: https://github.com/apache/beam/pull/33521#discussion_r1914396421



##########
runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java:
##########
@@ -146,32 +146,36 @@ public void evaluate(GroupByKey<K, V> transform, EvaluationContext context) {
 
         JavaRDD<WindowedValue<KV<K, Iterable<V>>>> groupedByKey;
         Partitioner partitioner = getPartitioner(context);
-        // As this is batch, we can ignore triggering and allowed lateness parameters.
-        if (windowingStrategy.getWindowFn().equals(new GlobalWindows())
-            && windowingStrategy.getTimestampCombiner().equals(TimestampCombiner.END_OF_WINDOW)) {
-          // we can drop the windows and recover them later
-          groupedByKey =
-              GroupNonMergingWindowsFunctions.groupByKeyInGlobalWindow(
-                  inRDD, keyCoder, coder.getValueCoder(), partitioner);
-        } else if (GroupNonMergingWindowsFunctions.isEligibleForGroupByWindow(windowingStrategy)) {
+        if (GroupNonMergingWindowsFunctions.isEligibleForGroupByWindow(windowingStrategy)) {
           // we can have a memory sensitive translation for non-merging windows
           groupedByKey =

Review Comment:
   I tried to address this in my latest commits. Secondary sort will be used only 
for GBK ops that are not part of a CoGBK. @je-ik, please review.



