[ https://issues.apache.org/jira/browse/BEAM-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17549665#comment-17549665 ]

Danny McCormick commented on BEAM-13266:
----------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/21239

> DataFrame API errors should identify culprit operation in user code
> -------------------------------------------------------------------
>
>                 Key: BEAM-13266
>                 URL: https://issues.apache.org/jira/browse/BEAM-13266
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-dataframe
>            Reporter: Brian Hulette
>            Priority: P2
>
> The DataFrame API aims to catch errors in pipeline code at pipeline 
> construction time as much as possible. Ideally, flawed user code will be 
> caught during proxy generation and bubble up an error from pandas.
> However, there are edge cases where DataFrame operations validate at
> construction time but still produce errors at execution time, depending on
> the actual data. For example, a user might try to use the modulo operator on
> a string column. This is a valid operation, but it performs string
> interpolation, not the modulo the user intended.
> This situation raises an error at execution time, but with a very obscure
> stack trace: a tree of evaluate/evaluate_at calls. The culprit in the
> user's code appears nowhere in the stack trace.
> We should catch errors like this at execution time and add a pointer to the
> line in the user's code that created the failing expression. We'll likely
> need to add this metadata to DataFrame expressions.
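A minimal pandas sketch of the modulo edge case described in the issue. The issue says errors are ideally caught "during proxy generation"; this sketch assumes, as a simplification, that the proxy is an empty object-dtype `pd.Series`, and the column values are hypothetical. The operation succeeds on the empty stand-in, interpolates a value that contains a format specifier, and raises `TypeError` on numeric-looking strings.

```python
import pandas as pd

# Assumption: construction-time validation is modeled as running the
# operation on an empty object-dtype Series that stands in for the proxy.
proxy = pd.Series([], dtype=object)
proxy_result = proxy % 3  # succeeds: there is no data to format
print(len(proxy_result))  # 0

# Execution time, value with a format specifier: interpolated, not reduced.
interpolated = (pd.Series(["%d apples"], dtype=object) % 3).tolist()
print(interpolated)  # ['3 apples']

# Execution time, numeric-looking strings: Python's str % int raises.
try:
    pd.Series(["10", "11"], dtype=object) % 3
except TypeError as exc:
    print(f"TypeError: {exc}")
```

`dtype=object` is set explicitly because newer pandas versions may infer a dedicated string dtype for string data, whose `%` behavior can differ from the element-wise Python `str.__mod__` shown here.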
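One way the proposed fix could look, sketched in plain Python. `Expression`, `ExpressionEvaluationError`, and `creation_site` are hypothetical names, not Beam's actual expression classes. Each expression records the stack frame that constructed it, and evaluation re-raises any failure with that file and line, chaining the original exception.

```python
import traceback


class ExpressionEvaluationError(Exception):
    """Hypothetical error type that points back at the user's code."""


class Expression:
    """Hypothetical deferred expression that remembers where it was built."""

    def __init__(self, name, func, *args):
        self.name = name
        self.func = func
        self.args = args
        # extract_stack()[-1] is this __init__ frame; [-2] is its caller,
        # i.e. the code that constructed this expression.
        self.creation_site = traceback.extract_stack()[-2]

    def evaluate(self):
        # Evaluate child expressions first. Their own failures already carry
        # their creation sites, so they propagate unchanged.
        values = [a.evaluate() if isinstance(a, Expression) else a
                  for a in self.args]
        try:
            return self.func(*values)
        except Exception as exc:
            site = self.creation_site
            raise ExpressionEvaluationError(
                f"{self.name!r} failed at execution time; the expression was "
                f"created at {site.filename}:{site.lineno}: {site.line}"
            ) from exc


# Usage: a hypothetical string column that the user treats as numeric.
column = Expression("column", lambda: ["10", "11"])
remainder = Expression("mod", lambda xs, y: [x % y for x in xs], column, 3)
try:
    remainder.evaluate()
except ExpressionEvaluationError as err:
    print(err)
```

The error names the line that built `remainder`, and the original `TypeError` stays available as `__cause__`. Taking `extract_stack()[-2]` only works when user code constructs `Expression` directly; a real implementation would likely need to skip library frames (for example, by filtering on module path) to find the user's line.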



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
