mosche commented on code in PR #24009:
URL: https://github.com/apache/beam/pull/24009#discussion_r1026211619
##########
runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java:
##########
@@ -81,27 +80,13 @@ public class PipelineTranslatorBatch extends
PipelineTranslator {
TRANSFORM_TRANSLATORS.put(
SplittableParDo.PrimitiveBoundedRead.class, new
ReadSourceTranslatorBatch<>());
-
Review Comment:
This is unrelated to #24035; see the comment below:
> The PCollectionView translation just stored the same Spark dataset
> (a reference!) again under a different PTransform. That's obviously
> problematic for caching, as we're not gathering metadata on that dataset
> in a single place. Also, the Beam runner guidelines discourage translating
> PCollectionView; it only exists for legacy reasons.
In terms of preparation for #24035, that's mostly the introduction of
`TranslationResult` to capture all kinds of metadata / context about a
specific Spark dataset.
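
For illustration only, here is a hypothetical sketch of what such a `TranslationResult` holder could look like. The class name matches the comment above, but every field and method here is an assumption, not the actual Beam implementation; a generic type parameter stands in for the Spark `Dataset`:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (not the actual Beam code): gather the translated
// dataset and all metadata about it in a single place, instead of storing
// the same dataset reference again under multiple PTransforms.
public class TranslationResult<T> {
  private final T dataset;                                // stand-in for the Spark Dataset
  private final Set<String> dependents = new HashSet<>(); // transforms consuming this dataset
  private boolean cached = false;                         // caching decision, tracked once

  public TranslationResult(T dataset) {
    this.dataset = dataset;
  }

  public T getDataset() {
    return dataset;
  }

  // Record a consuming transform; multiple consumers hint that caching pays off.
  public void addDependent(String transformName) {
    dependents.add(transformName);
  }

  public int dependentCount() {
    return dependents.size();
  }

  public void markCached() {
    cached = true;
  }

  public boolean isCached() {
    return cached;
  }
}
```

With one holder per dataset, a caching heuristic can inspect `dependentCount()` in a single place rather than chasing duplicate references.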
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]