lostluck commented on issue #28981:
URL: https://github.com/apache/beam/issues/28981#issuecomment-1785576454

   @johannaojeling 
   
   >  It makes a single initial split of the restriction but then does a lot of 
splitting while processing. 
   
   That's probably from prism being unoptimized for splits. Turns out good 
splitting heuristics are hard!
   
   Right now, basically any SDF will cause splitting, since prism splits 
whenever the channel progress indication hasn't moved since the last progress 
request (every ~100ms), and that is the only condition it checks.
   
   
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/stage.go#L150
   
   Splitting to a standstill is definitely a bug. In principle prism should 
estimate the amount of work remaining (by using the sizes and similar), and 
only split if there's a "reasonable" amount of work to split across.
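
   A minimal sketch of that kind of check, not what prism does today. The 
`progress` type, its fields, `shouldSplit`, and the `minSplitBytes` threshold 
are all hypothetical names for illustration; the only part taken from the 
description above is the "progress hasn't moved" condition.

```go
package main

import "fmt"

// progress is a hypothetical snapshot of a bundle taken at one progress request.
type progress struct {
	channelIndex   int64 // elements consumed from the data channel so far
	remainingBytes int64 // estimated size of the unprocessed work
}

// shouldSplit keeps the stall check (no channel progress since the last
// request) and adds a floor on the estimated remaining work.
func shouldSplit(prev, cur progress, minSplitBytes int64) bool {
	stalled := cur.channelIndex == prev.channelIndex
	// A split produces two parts; require each to be at least minSplitBytes.
	worthIt := cur.remainingBytes >= 2*minSplitBytes
	return stalled && worthIt
}

func main() {
	prev := progress{channelIndex: 10, remainingBytes: 4096}
	// Stalled with plenty of work left.
	fmt.Println(shouldSplit(prev, progress{channelIndex: 10, remainingBytes: 4096}, 1024))
	// Stalled, but too little work left to be worth splitting.
	fmt.Println(shouldSplit(prev, progress{channelIndex: 10, remainingBytes: 512}, 1024))
	// Still making progress, so no split.
	fmt.Println(shouldSplit(prev, progress{channelIndex: 11, remainingBytes: 4096}, 1024))
}
```

   With a floor like this, repeated splits stop once the remainder gets small, 
instead of continuing until nothing is left to split.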
   
   -------
   
   On the function execution portions...
   
   That's very odd. I can't think of why it would be the case though. Worth 
investigating and resolving, since the SDK's failure to look up the function 
shouldn't cause a pipeline to hang.
   
   -------
   
   For sure, closures simply cannot work, since the variables they refer to 
outside the function aren't going to be initialized.  If I had the time and 
energy, I'd figure out how to improve Go "reflect" to get at the captured data 
in anonymous functions, and be able to rehydrate them. But it's such a niche 
concern that it's uncertain how generally useful it would be. It just would 
have to be in reflect.
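
   To illustrate why, here is a small standalone sketch using only the 
standard library. `funcName` and `makeAdder` are hypothetical helpers for this 
example; looking a function up by name via `runtime.FuncForPC` is an assumption 
about the general approach, not a copy of the SDK's code. Two closures with 
different captured values share one symbol name, so a name alone cannot carry 
the captured state.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// funcName returns the symbol name of fn, resolved from its code pointer.
func funcName(fn interface{}) string {
	return runtime.FuncForPC(reflect.ValueOf(fn).Pointer()).Name()
}

// makeAdder returns a closure that captures n.
// noinline keeps both closures below backed by the same compiled function.
//
//go:noinline
func makeAdder(n int) func(int) int {
	return func(x int) int { return x + n }
}

func main() {
	add2, add40 := makeAdder(2), makeAdder(40)
	// Same code, hence the same symbol name.
	fmt.Println(funcName(add2) == funcName(add40))
	// Different captured n, hence different behavior.
	fmt.Println(add2(1), add40(1))
}
```

   A process that only receives the name can find the code, but has no way to 
recover `n`.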
   
   
   Anonymous functions *can* work unregistered, but only because of a very 
sneaky, janky trick we do deep in the bowels of the system, where we load up 
the symbols from the DWARF debug data in the binary, and scan them looking for 
names we skimmed earlier:
   
   
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/symtab/symtab.go
   
   In combination with this: 
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/reflectx/functions.go#L25
   
   To get the DWARF table, one needs to do a scan of the binary, which can be 
arbitrarily large. (Usually not bad for most Go, but if the binary has piles of 
CGO in there... it's big).
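
   A rough sketch of what such a scan looks like with the standard 
`debug/elf` and `debug/dwarf` packages. This is not the symtab code linked 
above: `funcAddrs` is a hypothetical helper, it handles ELF binaries only, and 
it assumes the binary was built with DWARF data (stripped builds return an 
error).

```go
package main

import (
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"os"
)

// funcAddrs walks every DWARF entry of the ELF binary at path and returns
// the low PC of each subprogram whose name is in want.
func funcAddrs(path string, want map[string]bool) (map[string]uint64, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	d, err := f.DWARF()
	if err != nil {
		return nil, err
	}

	out := make(map[string]uint64)
	r := d.Reader()
	for {
		e, err := r.Next()
		if err != nil {
			return nil, err
		}
		if e == nil {
			break // end of the DWARF entries
		}
		if e.Tag != dwarf.TagSubprogram {
			continue
		}
		name, _ := e.Val(dwarf.AttrName).(string)
		if pc, ok := e.Val(dwarf.AttrLowpc).(uint64); ok && want[name] {
			out[name] = pc
		}
	}
	return out, nil
}

func main() {
	exe, err := os.Executable()
	if err != nil {
		fmt.Println(err)
		return
	}
	addrs, err := funcAddrs(exe, map[string]bool{"main.main": true})
	if err != nil {
		// For example: not an ELF file, or built without debug data.
		fmt.Println("no DWARF lookup possible:", err)
		return
	}
	fmt.Println(addrs)
}
```

   The loop visits every entry in the debug data, which is why the cost grows 
with the size of the binary.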
   
   As with any unsafe technique, it doesn't work consistently and is very 
likely to break as the internals of Go shift around. E.g. tests basically 
require registration, which was the original use case.
   
   Ultimately, because it only works sometimes, it's a frustrating experience 
to have it shown off, only for it to not work. Hence my bias toward 
recommending what does work, along with the most efficient way to do it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

