Ya, fair enough, makes sense. I’ll reach out to GCP support. Thanks Luke!

- Evan

On Fri, Jul 8, 2022 at 11:24 Luke Cwik <[email protected]> wrote:

> I was suggesting GCP support mainly because I don't think you want to
> share the 2.36 and 2.40 versions of your job file publicly, and someone
> familiar with the layout and format may spot a meaningful difference.
>
> Also, if it turns out that there is no meaningful difference between the
> two, then the internal mechanics of how the graph is modified by Dataflow
> are not surfaced back to you in enough depth to debug further.
>
>
>
> On Fri, Jul 8, 2022 at 6:12 AM Evan Galpin <[email protected]> wrote:
>
>> Thanks for your response Luke :-)
>>
>> Updating in 2.36.0 works as expected, but as you alluded to, I'm
>> attempting to update to the latest SDK; in this case there are no changes
>> in the user code, only the SDK version.  Is GCP support the only tool
>> when it comes to deciphering the steps added by Dataflow?  I would love
>> to be able to inspect the complete graph with those extra steps like
>> "Unzipped-2/FlattenReplace" that aren't in the job file.
>>
>> Thanks,
>> Evan
>>
>> On Wed, Jul 6, 2022 at 4:21 PM Luke Cwik via user <[email protected]>
>> wrote:
>>
>>> Does doing a pipeline update in 2.36 work, or do you want to do an
>>> update to get the latest version?
>>>
>>> Feel free to share the job files with GCP support. It could be something
>>> internal, but the coders for ephemeral steps that Dataflow adds are based
>>> upon existing coders within the graph.
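>>>
>>> If you want to rule out coder inference changing between SDK versions,
>>> you can also pin the coder on the flattened output explicitly. A minimal
>>> sketch (assuming hypothetical inputs "a" and "b" of type
>>> KV<String, Long>; imports shown for completeness):
>>>
>>>   import org.apache.beam.sdk.coders.KvCoder;
>>>   import org.apache.beam.sdk.coders.StringUtf8Coder;
>>>   import org.apache.beam.sdk.coders.VarLongCoder;
>>>   import org.apache.beam.sdk.transforms.Flatten;
>>>   import org.apache.beam.sdk.values.KV;
>>>   import org.apache.beam.sdk.values.PCollection;
>>>   import org.apache.beam.sdk.values.PCollectionList;
>>>
>>>   // Pin the output coder instead of relying on coder inference,
>>>   // so it cannot drift between SDK versions ("a" and "b" are the
>>>   // PCollections being flattened):
>>>   PCollection<KV<String, Long>> merged =
>>>       PCollectionList.of(a).and(b).apply(Flatten.pCollections());
>>>   merged.setCoder(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of()));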
>>>
>>> On Tue, Jul 5, 2022 at 8:03 AM Evan Galpin <[email protected]> wrote:
>>>
>>>> +dev@
>>>>
>>>> Reviving this thread as it has hit me again on Dataflow.  I am trying
>>>> to upgrade an active streaming pipeline from 2.36.0 to 2.40.0.  Originally,
>>>> I received an error that the step "Flatten.pCollections" was missing from
>>>> the new job graph.  I knew from the code that that wasn't true, so I dumped
>>>> the job file via "--dataflowJobFile" for both the running pipeline and for
>>>> the new version I'm attempting to update to.  Both job files showed
>>>> identical data for the Flatten.pCollections step, which raises the question
>>>> of why that would have been reported as missing.
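>>>>
>>>> (For anyone following along, both dumps came from launching the pipeline
>>>> with Dataflow's job-file option, roughly like this, with paths
>>>> illustrative:
>>>>
>>>>   --runner=DataflowRunner --dataflowJobFile=/tmp/job-2.36.json
>>>>
>>>> and likewise with /tmp/job-2.40.json for the new SDK version.)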
>>>>
>>>> Out of curiosity, I then tried mapping the step to the same name, which
>>>> changed the error to:  "The Coder or type for step
>>>> Flatten.pCollections/Unzipped-2/FlattenReplace has changed."  Again, the
>>>> job files show identical coders for the Flatten step (though
>>>> "Unzipped-2/FlattenReplace" is not present in the job file, maybe an
>>>> internal Dataflow thing?), so I'm confident that the coder hasn't actually
>>>> changed.
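>>>>
>>>> (For completeness, the mapping was passed via Dataflow's update options,
>>>> roughly:
>>>>
>>>>   --update --transformNameMapping='{"Flatten.pCollections":"Flatten.pCollections"}'
>>>>
>>>> i.e. mapping the old step name to itself.)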
>>>>
>>>> I'm not sure how to proceed in updating the running pipeline, and I'd
>>>> really prefer not to drain.  Any ideas?
>>>>
>>>> Thanks,
>>>> Evan
>>>>
>>>>
>>>> On Fri, Oct 22, 2021 at 3:36 PM Evan Galpin <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the ideas, Luke. I checked out the JSON graphs as per your
>>>>> recommendation (thanks for that, I was previously unaware), and the
>>>>> "output_info" was identical for both the running pipeline and the pipeline
>>>>> I was hoping to update it with.  I ended up opting to just drain and
>>>>> submit the updated pipeline as a new job.  Thanks for the tips!
>>>>>
>>>>> Thanks,
>>>>> Evan
>>>>>
>>>>> On Thu, Oct 21, 2021 at 7:02 PM Luke Cwik <[email protected]> wrote:
>>>>>
>>>>>> I would suggest dumping the JSON representation (with
>>>>>> --dataflowJobFile=/path/to/output.json) of the pipeline before and after
>>>>>> the change, and looking to see what is being submitted to Dataflow.
>>>>>> Dataflow's JSON graph representation is a bipartite graph where there are
>>>>>> transform nodes with inputs and outputs and PCollection nodes with no
>>>>>> inputs or outputs. The PCollection nodes typically end with the suffix
>>>>>> ".out". This could help find steps that have been added/removed/renamed.
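>>>>>>
>>>>>> A quick way to compare the two dumps is to normalize key order and diff
>>>>>> them, e.g. (assuming jq is available; file names illustrative):
>>>>>>
>>>>>>   diff <(jq -S . job-2.36.json) <(jq -S . job-2.40.json)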
>>>>>>
>>>>>> The PipelineDotRenderer[1] might be of use as well.
>>>>>>
>>>>>> 1:
>>>>>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/renderer/PipelineDotRenderer.java
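>>>>>>
>>>>>> Usage is roughly the following (a sketch; "pipeline" is your
>>>>>> constructed Pipeline, the output path is illustrative, and IOException
>>>>>> handling is omitted):
>>>>>>
>>>>>>   import java.nio.charset.StandardCharsets;
>>>>>>   import java.nio.file.Files;
>>>>>>   import java.nio.file.Paths;
>>>>>>   import org.apache.beam.runners.core.construction.renderer.PipelineDotRenderer;
>>>>>>
>>>>>>   // Render the constructed pipeline as GraphViz DOT before submitting:
>>>>>>   String dot = PipelineDotRenderer.toDotString(pipeline);
>>>>>>   Files.write(Paths.get("/tmp/pipeline.dot"),
>>>>>>       dot.getBytes(StandardCharsets.UTF_8));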
>>>>>>
>>>>>> On Thu, Oct 21, 2021 at 11:54 AM Evan Galpin <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm looking for any help regarding updating streaming jobs which are
>>>>>>> already running on Dataflow.  Specifically, I'm seeking guidance for
>>>>>>> situations where fusion is involved, and trying to decipher which old
>>>>>>> steps should be mapped to which new steps.
>>>>>>>
>>>>>>> I have a case where I updated the steps which come after the step in
>>>>>>> question, but when I attempt to update, there is an error that "<old step>
>>>>>>> no longer produces data to the steps <downstream step>". I believe that
>>>>>>> <old step> is only changed as a result of fusion, and in reality it does
>>>>>>> in fact produce data to <downstream step> (confirmed when deployed as a
>>>>>>> new job for testing purposes).
>>>>>>>
>>>>>>> Is there a guide for how to deal with updates and fusion?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Evan
>>>>>>>
>>>>>>
