[ 
https://issues.apache.org/jira/browse/BEAM-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-13266:
---------------------------------
    Status: Open  (was: Triage Needed)

> DataFrame API errors should identify culprit operation in user code
> -------------------------------------------------------------------
>
>                 Key: BEAM-13266
>                 URL: https://issues.apache.org/jira/browse/BEAM-13266
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-dataframe
>            Reporter: Brian Hulette
>            Priority: P2
>
> The DataFrame API aims to catch errors in pipeline code at pipeline 
> construction time as much as possible. Ideally, flawed user code will be 
> caught during proxy generation and bubble up an error from pandas.
> However, there are edge cases where DataFrame operations validate at 
> construction time, but  still produce errors at execution time, based on the 
> actual data. For example a user might try to use the modulo operator on a 
> string  column. This is a valid operation, but it performs string 
> interpolation, not a modulo as the user intended.
> The above situation will raise an error at execution time, but it has a very 
> obtuse stacktrace, with a tree of evaluate/evaluate_at calls. The culprit 
> from the user's code is nowhere in the astacktrace.
> We should catch errors like this at execution time and add a pointer to the 
> line in the user's
> code that created this expression. We'll likely need to add this metadata to 
> DataFrame expressions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to