Ya fair enough, makes sense. I’ll reach out to GCP. Thanks Luke! - Evan
On Fri, Jul 8, 2022 at 11:24 Luke Cwik <[email protected]> wrote:

> I was suggesting GCP support mainly because I don't think you want to
> share the 2.36 and 2.40 versions of your job file publicly, as someone
> familiar with the layout and format may spot a meaningful difference.
>
> Also, if it turns out that there is no meaningful difference between
> the two, then the internal mechanics of how the graph is modified by
> Dataflow are not surfaced back to you in enough depth to debug further.
>
> On Fri, Jul 8, 2022 at 6:12 AM Evan Galpin <[email protected]> wrote:
>
>> Thanks for your response Luke :-)
>>
>> Updating in 2.36.0 works as expected, but as you alluded to, I'm
>> attempting to update to the latest SDK; in this case there are no
>> changes in the user code, only the SDK version. Is GCP support the
>> only tool when it comes to deciphering the steps added by Dataflow? I
>> would love to be able to inspect the complete graph with those extra
>> steps, like "Unzipped-2/FlattenReplace", that aren't in the job file.
>>
>> Thanks,
>> Evan
>>
>> On Wed, Jul 6, 2022 at 4:21 PM Luke Cwik via user
>> <[email protected]> wrote:
>>
>>> Does doing a pipeline update in 2.36 work, or do you want to do an
>>> update to get the latest version?
>>>
>>> Feel free to share the job files with GCP support. It could be
>>> something internal, but the coders for ephemeral steps that Dataflow
>>> adds are based upon existing coders within the graph.
>>>
>>> On Tue, Jul 5, 2022 at 8:03 AM Evan Galpin <[email protected]>
>>> wrote:
>>>
>>>> +dev@
>>>>
>>>> Reviving this thread as it has hit me again on Dataflow. I am
>>>> trying to upgrade an active streaming pipeline from 2.36.0 to
>>>> 2.40.0. Originally, I received an error that the step
>>>> "Flatten.pCollections" was missing from the new job graph. I knew
>>>> from the code that that wasn't true, so I dumped the job file via
>>>> "--dataflowJobFile" for both the running pipeline and for the new
>>>> version I'm attempting to update to. Both job files showed
>>>> identical data for the Flatten.pCollections step, which raises the
>>>> question of why it would have been reported as missing.
>>>>
>>>> Out of curiosity I then tried mapping the step to the same name
>>>> (see the first sketch at the end of this thread), which changed the
>>>> error to: "The Coder or type for step
>>>> Flatten.pCollections/Unzipped-2/FlattenReplace has changed." Again,
>>>> the job files show identical coders for the Flatten step (though
>>>> "Unzipped-2/FlattenReplace" is not present in the job file; maybe
>>>> an internal Dataflow thing?), so I'm confident that the coder
>>>> hasn't actually changed.
>>>>
>>>> I'm not sure how to proceed in updating the running pipeline, and
>>>> I'd really prefer not to drain. Any ideas?
>>>>
>>>> Thanks,
>>>> Evan
>>>>
>>>> On Fri, Oct 22, 2021 at 3:36 PM Evan Galpin <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the ideas Luke. I checked out the JSON graphs as per
>>>>> your recommendation (thanks for that, was previously unaware), and
>>>>> the "output_info" was identical for both the running pipeline and
>>>>> the pipeline I was hoping to update it with. I ended up opting to
>>>>> just drain and submit the updated pipeline as a new job. Thanks
>>>>> for the tips!
>>>>>
>>>>> Thanks,
>>>>> Evan
>>>>>
>>>>> On Thu, Oct 21, 2021 at 7:02 PM Luke Cwik <[email protected]> wrote:
>>>>>
>>>>>> I would suggest dumping the JSON representation of the pipeline
>>>>>> (with --dataflowJobFile=/path/to/output.json) before and after,
>>>>>> and looking at what is being submitted to Dataflow. Dataflow's
>>>>>> JSON graph representation is a bipartite graph where there are
>>>>>> transform nodes with inputs and outputs, and PCollection nodes
>>>>>> with no inputs or outputs. The PCollection nodes typically end
>>>>>> with the suffix ".out". This could help find steps that have been
>>>>>> added/removed/renamed.
>>>>>>
>>>>>> The PipelineDotRenderer[1] might be of use as well (a sketch of
>>>>>> both approaches follows at the end of this thread).
>>>>>>
>>>>>> 1:
>>>>>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/renderer/PipelineDotRenderer.java
>>>>>>
>>>>>> On Thu, Oct 21, 2021 at 11:54 AM Evan Galpin
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm looking for any help regarding updating streaming jobs which
>>>>>>> are already running on Dataflow. Specifically, I'm seeking
>>>>>>> guidance for situations where fusion is involved, and trying to
>>>>>>> decipher which old steps should be mapped to which new steps.
>>>>>>>
>>>>>>> I have a case where I updated the steps which come after the
>>>>>>> step in question, but when I attempt to update there is an error
>>>>>>> that "<old step> no longer produces data to the steps
>>>>>>> <downstream step>". I believe that <old step> is only changed as
>>>>>>> a result of fusion, and in reality it does in fact produce data
>>>>>>> to <downstream step> (confirmed when deployed as a new job for
>>>>>>> testing purposes).
>>>>>>>
>>>>>>> Is there a guide for how to deal with updates and fusion?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Evan
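
The step-name mapping described in the Jul 5 message is supplied through
the Dataflow runner's --update and --transformNameMapping pipeline
options. Here is a minimal Java sketch of setting them programmatically;
the class and job names are hypothetical, and the identity mapping
mirrors the one attempted above:

    import java.util.Map;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class UpdateWithMapping {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args)
                .as(DataflowPipelineOptions.class);
        // Replace the running job of the same name in place, preserving
        // state (the programmatic equivalent of passing --update).
        options.setUpdate(true);
        options.setJobName("my-streaming-job"); // hypothetical job name
        // Map old step names to new ones. An identity entry asserts that
        // "Flatten.pCollections" is unchanged between the two graphs.
        options.setTransformNameMapping(
            Map.of("Flatten.pCollections", "Flatten.pCollections"));
        // ... build the pipeline against these options and run it to
        // trigger the service-side compatibility check.
      }
    }

Note that the compatibility check runs on the Dataflow service against
the submitted graph plus the ephemeral steps the service adds, which is
why the errors above can name steps that never appear in the local job
file.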

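A minimal sketch of the two inspection techniques from Luke's Oct 21
message: passing --dataflowJobFile=/path/to/output.json at submission
makes the Dataflow runner write the JSON job specification to that path,
and PipelineDotRenderer turns the construction-time graph into Graphviz
DOT. The class name and the trivial stand-in pipeline are illustrative
only:

    import org.apache.beam.runners.core.construction.renderer.PipelineDotRenderer;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;

    public class RenderGraph {
      public static void main(String[] args) {
        // Stand-in pipeline; substitute the real construction code.
        Pipeline p =
            Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
        p.apply("Values", Create.of(1, 2, 3));
        // Emit the graph as Graphviz DOT (render with e.g. `dot -Tpng`).
        // This is the pre-submission graph, so it will not include steps
        // Dataflow adds during job creation, such as the
        // "Unzipped-2/FlattenReplace" step discussed above.
        System.out.println(PipelineDotRenderer.toDotString(p));
      }
    }

Both techniques show the graph as constructed and submitted, not the
graph after the service rewrites it, which matches the observation above
that "Unzipped-2/FlattenReplace" never appears in the job file.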