[ 
https://issues.apache.org/jira/browse/BEAM-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312793#comment-17312793
 ] 

Daniel Oliveira commented on BEAM-11574:
----------------------------------------

I'll use this Jira to summarize the steps involved in fixing this.

1. The original bug documented up top is fixed by enabling portable pipeline 
submission in Dataflow, which is a requirement for running cross-language 
pipelines. The bug occurred because a cross-language proto pipeline was being 
submitted but executed without cross-language support. Enabling this took some 
trial and error. Ultimately I figured out that portable pipelines require an 
additional field in job submission (SdkHarnessContainerImages) that takes a 
list of environments, which meant adjusting the surrounding code to support 
multiple environments. I also had to add a new flag that provides container 
image overrides for multiple images, so that cross-language environments can 
also have custom images defined (necessary for testing at head).

2. After this, I got errors where the job failed to start up properly and hung 
indefinitely. The logs were very opaque, but with some help from a domain 
expert we noticed that the error was reported in a Warning log, which 
described being unable to read an empty environment. One of the environments 
in the list had an empty value. Using a debugger, I tracked this down to the 
environment from the original expansion, which is a stub with no value 
assigned, still being preserved and merged in at the end, when expanding all 
cross-language transforms.

3. After working around the issue above (by changing the stub environment 
into a full environment identical to Dataflow) I ran into another issue: an 
unidentified impulse transform was causing the pipeline to fail. It turned out 
that the step that adds fake impulses was causing an error in Dataflow because 
the impulse wasn't getting properly removed. After stepping through the 
expansion service functionality in a debugger, I found that the fake impulses 
weren't even necessary anymore. After I removed them, the pipelines started 
working.

4. I went back and did a proper fix for the stub environment issue: stop 
namespacing environments before expansion. The later code that merges expanded 
transforms into the pipeline assumes that the default environment is not 
namespaced, and skips it because it's already present in the proto. The bug 
happened because this assumption didn't hold, so I made it hold by not 
namespacing the default Go environment. Now the stub version of the default 
environment doesn't get merged into the final pipeline; it just gets skipped 
because a default environment with the same name is already present.
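
To illustrate steps 2 and 4, here is a minimal, self-contained sketch of the 
merge behavior described above. It is not the actual Beam code: the 
Environment type, the mergeEnvironments and emptyEnvironments helpers, and the 
IDs "go", "java" and "ns1@go" are all hypothetical stand-ins, and the real 
protos carry much more than a URN. It only shows why a namespaced stub slips 
past a "skip if already present" check while a non-namespaced one is skipped.

```go
package main

import (
	"fmt"
	"sort"
)

// Environment is a simplified, hypothetical stand-in for the environment
// proto. An empty URN models the stub left over from the original expansion.
type Environment struct {
	URN string
}

// mergeEnvironments copies expanded environments into the pipeline's
// environment map, skipping any ID that is already present.
func mergeEnvironments(pipeline, expanded map[string]Environment) {
	for id, env := range expanded {
		if _, ok := pipeline[id]; ok {
			continue // already in the pipeline, e.g. the default environment
		}
		pipeline[id] = env
	}
}

// emptyEnvironments returns the sorted IDs of environments with no URN,
// i.e. the kind of entry the Warning log complained about.
func emptyEnvironments(envs map[string]Environment) []string {
	var ids []string
	for id, env := range envs {
		if env.URN == "" {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Before the fix (assumed shape): the default environment was namespaced
	// ahead of expansion, so the stub returns under a new ID and is merged.
	before := map[string]Environment{"go": {URN: "beam:env:docker:v1"}}
	mergeEnvironments(before, map[string]Environment{
		"ns1@go":   {}, // stub with no value assigned
		"ns1@java": {URN: "beam:env:docker:v1"},
	})
	fmt.Println("before:", emptyEnvironments(before)) // before: [ns1@go]

	// After the fix: the default ID is left alone, so the stub collides with
	// the existing entry and is skipped.
	after := map[string]Environment{"go": {URN: "beam:env:docker:v1"}}
	mergeEnvironments(after, map[string]Environment{
		"go":       {}, // same stub, not namespaced
		"ns1@java": {URN: "beam:env:docker:v1"},
	})
	fmt.Println("after:", emptyEnvironments(after)) // after: []
}
```

In this sketch the skip check is a plain ID lookup, so the fix lives entirely 
on the caller's side: as long as the default environment keeps its original 
ID, the merge never sees the stub as something new.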

> Enable x-lang on Dataflow side for integration tests
> ----------------------------------------------------
>
>                 Key: BEAM-11574
>                 URL: https://issues.apache.org/jira/browse/BEAM-11574
>             Project: Beam
>          Issue Type: Bug
>          Components: cross-language, sdk-go
>            Reporter: Daniel Oliveira
>            Assignee: Daniel Oliveira
>            Priority: P2
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Dataflow x-lang ValidatesRunner tests are failing with the following error:
> {noformat}
> panic:  unmarshalling coder UIbLsVLhrXKvCoder
>         unmarshalling coder UIbLsVLhrXVoidCoder
> could not unmarshal coder from spec:{urn:"beam:coders:javasdk:0.1" 
> payload:"\x82SNAPPY\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x88\xd6\x01\xe8\xac\xed\x00\x05sr\x00$org.apache.beam.sdk.coders.VoidCoder\xb9\xbfU\x9b\xe8\r\xafU\x02\x00\x00xr\x00&j3\x00\x14Atomic\x055
>  
> \xc7\xec\xb5̅tPF\x02\x055\x00*j5\x00$Structured\x059\x1cs\xbf\x12\x0e\xd5\xd46\x11\t9\x00
>  j9\x00\x05/0C\xddՉ\xae\xbc~\xf8\x02\x00\x00xp"}, unknown URN 
> beam:coders:javasdk:0.1 [recovered]
>         panic:  unmarshalling coder UIbLsVLhrXKvCoder
>         unmarshalling coder UIbLsVLhrXVoidCoder
> could not unmarshal coder from spec:{urn:"beam:coders:javasdk:0.1" 
> payload:"\x82SNAPPY\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x88\xd6\x01\xe8\xac\xed\x00\x05sr\x00$org.apache.beam.sdk.coders.VoidCoder\xb9\xbfU\x9b\xe8\r\xafU\x02\x00\x00xr\x00&j3\x00\x14Atomic\x055
>  
> \xc7\xec\xb5̅tPF\x02\x055\x00*j5\x00$Structured\x059\x1cs\xbf\x12\x0e\xd5\xd46\x11\t9\x00
>  j9\x00\x05/0C\xddՉ\xae\xbc~\xf8\x02\x00\x00xp"}, unknown URN 
> beam:coders:javasdk:0.1
> goroutine 130 [running]:
> testing.tRunner.func1.1(0xe2f1a0, 0xc00059ea80)
>         /usr/lib/google-golang/src/testing/testing.go:1072 +0x30d
> testing.tRunner.func1(0xc000290f00)
>         /usr/lib/google-golang/src/testing/testing.go:1075 +0x41a
> panic(0xe2f1a0, 0xc00059ea80)
>         /usr/lib/google-golang/src/runtime/panic.go:969 +0x1b9
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateCoder(0xc0002da260,
>  0xc0005b5b00, 0xc0003fdee0, 0x11, 0xc000379ac8)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:337
>  +0xa5
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateOutputs(0xc0002da260,
>  0xc00079b1d0, 0xc0003fd600, 0x11, 0xc000376180)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:317
>  +0x190
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransform(0xc0002da260,
>  0xc00019e330, 0x2d, 0xc00078a660, 0x1b, 0xf15a80, 0xc000096000, 
> 0xc0004fb898, 0x6475eb, 0xc000096000)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:111
>  +0x111
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransforms(0xc0002da260,
>  0xc00019e330, 0x2d, 0xc000265580, 0x7, 0x8, 0x2d, 0x6, 0x0, 0xc00003f41a, 
> ...)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:97
>  +0xda
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransform(0xc0002da260,
>  0xc000280600, 0x1a, 0xc0001266d8, 0x2, 0xf15a80, 0xc000096000, 0xc0004fc710, 
> 0x6475eb, 0xc000096000)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:285
>  +0x36d
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransforms(0xc0002da260,
>  0xc000280600, 0x1a, 0xc0002be740, 0x1, 0x1, 0x1a, 0x1, 0x0, 0x0, ...)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:97
>  +0xda
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransform(0xc0002da260,
>  0x0, 0x0, 0xc0001266d4, 0x2, 0x1, 0x1, 0x2, 0x0, 0x0)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:285
>  +0x36d
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransforms(0xc0002da260,
>  0x0, 0x0, 0xc000180140, 0x5, 0x5, 0xc0000a15f8, 0x5b96a5, 0xc000102600, 
> 0x200000003, ...)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:97
>  +0xda
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.translate(0xc000313800,
>  0x5d4fda, 0x1, 0x2, 0xc000778000, 0xa2)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:73
>  +0x77
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.Translate(0x103ff00,
>  0xc00019c628, 0xc000313800, 0xc0002bbcb0, 0xc00019b900, 0x77, 0xc00016e2d0, 
> 0x84, 0xc00019b880, 0x76, ...)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/job.go:77
>  +0x45
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.Execute(0x103ff00,
>  0xc00019c628, 0xc000313800, 0xc0002bbcb0, 0xc00019b900, 0x77, 0xc00016e2d0, 
> 0x84, 0xc00019b880, 0x76, ...)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/execute.go:91
>  +0x699
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow.Execute(0x103ff00,
>  0xc00019c628, 0xc000124888, 0x8, 0xc000142158, 0xf6ae01, 0xc00032ba40)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflow.go:207
>  +0xe7d
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam.Run(0x103ff00,
>  0xc00019c628, 0x7ffec804dad3, 0x8, 0xc000124888, 0x102d860, 0xc00032b980, 
> 0xc0002b7ee8, 0xba6e45)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runner.go:50
>  +0x87
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/testing/ptest.Run(0xc000124888,
>  0xc00032a660, 0xc00079be00)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/testing/ptest/ptest.go:89
>  +0x8b
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/testing/ptest.RunAndValidate(0xc000290f00,
>  0xc000124888)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/testing/ptest/ptest.go:96
>  +0x2f
> github.com/apache/beam/sdks/go/test/integration/xlang.TestXLang_CombineGlobally(0xc000290f00)
>         
> /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/integration/xlang/xlang_test.go:165
>  +0x249
> testing.tRunner(0xc000290f00, 0xf6a908)
>         /usr/lib/google-golang/src/testing/testing.go:1123 +0xef
> created by testing.(*T).Run
>         /usr/lib/google-golang/src/testing/testing.go:1168 +0x2b3
> {noformat}
> This seems to imply that the bundles intended to be sent to the Java SDK are 
> still being sent to the Go SDK (these are Go SDK errors), so it seems that 
> cross-language functionality still needs to be enabled for the integration 
> tests.
> Edit:
> Looking closer, the stacktrace actually suggests that the problem is that 
> Go's Dataflow translation doesn't properly support cross-language transforms. 
> The solution would be to adjust dataflowlib/translate.go to support 
> cross-language.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
