damccorm opened a new issue, #21239:
URL: https://github.com/apache/beam/issues/21239

   The DataFrame API aims to catch errors in pipeline code at pipeline 
construction time as much as possible. Ideally, flawed user code will be caught 
during proxy generation and bubble up an error from pandas.
   
   However, there are edge cases where DataFrame operations validate at 
construction time, but  still produce errors at execution time, based on the 
actual data. For example a user might try to use the modulo operator on a 
string  column. This is a valid operation, but it performs string 
interpolation, not a modulo as the user intended.
   
   The above situation will raise an error at execution time, but it has a very 
obtuse stacktrace, with a tree of evaluate/evaluate_at calls. The culprit from 
the user's code is nowhere in the astacktrace.
   
   We should catch errors like this at execution time and add a pointer to the 
line in the user's
   code that created this expression. We'll likely need to add this metadata to 
DataFrame expressions.
   
   
   Imported from Jira 
[BEAM-13266](https://issues.apache.org/jira/browse/BEAM-13266). Original Jira 
may contain additional context.
   Reported by: bhulette.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to