[ 
https://issues.apache.org/jira/browse/BEAM-8577?focusedWorklogId=343336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-343336
 ]

ASF GitHub Bot logged work on BEAM-8577:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Nov/19 11:16
            Start Date: 14/Nov/19 11:16
    Worklog Time Spent: 10m 
      Work Description: dmvk commented on pull request #10027: [BEAM-8577] 
Initialize FileSystems during Coder deserialization in Re…
URL: https://github.com/apache/beam/pull/10027#discussion_r346256007
 
 

 ##########
 File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchTransformTranslators.java
 ##########
 @@ -306,11 +308,19 @@ public void translateNode(
     @Override
     public void translateNode(
         Reshuffle<K, InputT> transform, FlinkBatchTranslationContext context) {
-
-      DataSet<WindowedValue<KV<K, InputT>>> inputDataSet =
+      final DataSet<WindowedValue<KV<K, InputT>>> inputDataSet =
           context.getInputDataSet(context.getInput(transform));
-
-      context.setOutputDataSet(context.getOutput(transform), inputDataSet.rebalance());
+      @SuppressWarnings("unchecked")
+      final CoderTypeInformation<WindowedValue<KV<K, InputT>>> outputType =
+          ((CoderTypeInformation) inputDataSet.getType())
+              .withPipelineOptions(context.getPipelineOptions());
+      final DataSet<WindowedValue<KV<K, InputT>>> retypedDataSet =
+          new MapOperator<>(
+              inputDataSet,
+              outputType,
+              FlinkIdentityFunction.of(),
 
 Review comment:
   This wouldn't help: the `MapOperator` runs on the "map side" of the 
rebalance, whereas we encounter the problems on the "reduce side". The only 
reason for this `MapOperator` is that we are unable to change the 
TypeInformation of the input dataset.
   
   However, the main reason is that DoFns that follow a rebalance are 
currently not chainable, so there is a slight chance of this happening.
   
   I reckon that after resolving 
https://issues.apache.org/jira/browse/BEAM-8608 this would no longer be 
needed, because we'd initialize the file systems in the chained DoFn's open 
method.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 343336)
    Time Spent: 0.5h  (was: 20m)

> FileSystems may have not be initialized during ResourceId deserialization
> -------------------------------------------------------------------------
>
>                 Key: BEAM-8577
>                 URL: https://issues.apache.org/jira/browse/BEAM-8577
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 2.16.0
>            Reporter: David Moravek
>            Assignee: David Moravek
>            Priority: Major
>             Fix For: 2.17.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> - FileSystems use static registration using 
> *FileSystems#setDefaultPipelineOptions* method.
> - *#setDefaultPipelineOptions* is called either when deserializing 
> SerializablePipelineOptions or when various Beam operators are opened. 
> - *FileIO#matchAll* is expanded using *Reshuffle.viaRandomKey()*.
> - Reshuffle is implemented using *.rebalance*, which doesn't have a 
> "RichFunction" lifecycle, so we need to find another way to register 
> FileSystems, as the deserialization may happen before other "rich operators" 
> get executed on a particular task manager.
> This results in random pipeline failures, as the task assignment is not 
> deterministic.
> We can work around this by registering FileSystems during coder 
> deserialization.
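
The workaround described above can be illustrated with a minimal, dependency-free Java sketch. It is not Beam's or Flink's actual code: `Registry` stands in for the static registration behind FileSystems#setDefaultPipelineOptions, `OptionsAwareCoder` stands in for a coder-backed type that carries the pipeline options, and every name here (including `FileSystemInitSketch` and `roundTrip`) is hypothetical. The sketch assumes plain Java serialization and uses the `readObject` hook so that registration happens on the receiving side as part of deserialization, without relying on a "rich" operator having been opened first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of the workaround; none of these types exist in Beam or Flink. */
final class FileSystemInitSketch {

  /** Stand-in for a statically registered set of default options. */
  static final class Registry {
    private static final AtomicReference<String> OPTIONS = new AtomicReference<>();

    static void setDefaultOptions(String options) {
      OPTIONS.set(options);
    }

    static boolean isInitialized() {
      return OPTIONS.get() != null;
    }

    /** Simulates a fresh JVM (for example a new task manager) with nothing registered. */
    static void reset() {
      OPTIONS.set(null);
    }
  }

  /** Stand-in for a coder-backed type that carries the (serialized) pipeline options. */
  static final class OptionsAwareCoder implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String options;

    OptionsAwareCoder(String options) {
      this.options = options;
    }

    // Java serialization hook: invoked on the receiving side during deserialization,
    // so the registry is populated before any value is decoded with this coder.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      Registry.setDefaultOptions(options);
    }
  }

  /** Serializes and deserializes a value in memory, as shipping it to a worker would. */
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("In-memory round trip failed", e);
    }
  }

  public static void main(String[] args) {
    OptionsAwareCoder coder = new OptionsAwareCoder("--tempLocation=/tmp/example");
    Registry.reset(); // pretend we are on a worker where nothing has been opened yet
    roundTrip(coder);
    System.out.println("initialized after deserialization: " + Registry.isInitialized());
  }
}
```

In this model the order in which tasks are assigned no longer matters, because whichever component deserializes the coder first also performs the registration.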



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
