lostluck commented on issue #28981: URL: https://github.com/apache/beam/issues/28981#issuecomment-1785576454
@johannaojeling

> It makes a single initial split of the restriction but then does a lot of splitting while processing.

That's probably from prism being unoptimized for splits. Turns out good splitting heuristics are hard!

Right now, basically just having an SDF will cause splitting, since prism splits only if the channel progress indication hasn't moved since the last progress request (every ~100ms):
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/stage.go#L150

Splitting to a standstill is definitely a bug. In principle, prism should estimate the amount of work remaining (by using the sizes and similar), and split only if there's a "reasonable" amount of work to split across.

-------

On the function execution portions... That's very odd. I can't think of why that would be the case. It's worth investigating and resolving, since an SDK failure to look up the function shouldn't cause a pipeline to hang.

-------

For sure, closures simply cannot work, since the variables outside the function that they refer to aren't going to be initialized.

If I had the time and energy, I'd figure out how to improve Go's "reflect" to get at the closed-over data in anonymous functions, and be able to rehydrate them. But it's such a niche concern that it's uncertain how generally useful it would be. It would just have to be in reflect.

Anonymous functions *can* work unregistered, but only because of a very sneaky, janky trick we do deep in the bowels of the system, where we load the symbols from the DWARF debug data in the binary and scan them for names we've skimmed beforehand:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/symtab/symtab.go

In combination with this:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/reflectx/functions.go#L25

To get the DWARF table, one needs to do a scan of the binary, which can be arbitrarily large. (Usually not bad for most Go, but if the binary has piles of CGO in there... it's big.)

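To illustrate why a name-based lookup can find an anonymous function's code but not its captured data, here is a minimal standard-library sketch. It assumes the linked reflectx helper resolves names in roughly this way (via `runtime.FuncForPC`); `funcName`, `double`, and `makeAdder` are hypothetical names for this example, not Beam APIs.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// funcName returns the runtime symbol name for fn.
// Assumption: this mirrors the general approach of the linked helper,
// it is not a copy of Beam's code.
func funcName(fn any) string {
	return runtime.FuncForPC(reflect.ValueOf(fn).Pointer()).Name()
}

// double is a named top-level function, so its symbol name is stable.
func double(x int) int { return x * 2 }

// makeAdder returns a closure that captures n.
// noinline keeps a single compiled copy of the closure body.
//
//go:noinline
func makeAdder(n int) func(int) int {
	return func(x int) int { return x + n }
}

func main() {
	add1, add100 := makeAdder(1), makeAdder(100)

	// A named function reports a name ending in ".double".
	fmt.Println(funcName(double))

	// Both closures share one code pointer, so they report the same name.
	fmt.Println(funcName(add1) == funcName(add100))

	// Yet they behave differently, because n lives in the closure value,
	// not in the symbol that the name identifies.
	fmt.Println(add1(1), add100(1))
}
```

The symbol name identifies only the compiled function body. A worker that resolves that name in its own copy of the binary gets the code back, but nothing in the name carries the captured `n`, which is why closures over external variables cannot be rehydrated this way.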
As with any unsafe technique, these tricks don't work consistently and are very likely to break as the internals of Go shift around. For example, tests basically require registration, which was the original use case.

Ultimately, because it only works sometimes, it's a frustrating experience to have it shown off, only to have it not work. Hence my bias toward recommending what does work, along with the most efficient way to do it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
