chrisqiqiu commented on PR #38429:
URL: https://github.com/apache/beam/pull/38429#issuecomment-4886486292

   Thanks for the feedback — the per-element cost is a fair point. Here's how 
I'd address it:
   
   1. Skip DoFns that use "yield".
   A process method that uses "yield" is a generator. calling it returns a 
generator object, which can never be a str/bytes/dict. So the bug is impossible 
for these. I can detect this at construction time with 
inspect.isgeneratorfunction() and skip the check entirely. Since "yield" is the 
common style, this removes the cost for most DoFns.
   
   2. For DoFns that use "return", only check the first element.
   As @jrmccluskey pointed out, this bug is deterministic. if a DoFn returns a 
str once, it does so every time, so it's caught on the first element. I can 
check once, then set a flag to skip all later elements. That makes it a 
one-time check per DoFn instead of per element.
   
   I looked at doing this fully at construction time, but the bug depends on 
the actual value returned by "return" at runtime, so it can't be detected 
statically without relying more on trivial inference (which @jrmccluskey wanted 
to avoid).
   
   Does this approach sound good? If so, I'll update the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to