[
https://issues.apache.org/jira/browse/BEAM-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brian Hulette updated BEAM-13266:
---------------------------------
Status: Open (was: Triage Needed)
> DataFrame API errors should identify culprit operation in user code
> -------------------------------------------------------------------
>
> Key: BEAM-13266
> URL: https://issues.apache.org/jira/browse/BEAM-13266
> Project: Beam
> Issue Type: Improvement
> Components: dsl-dataframe
> Reporter: Brian Hulette
> Priority: P2
>
> The DataFrame API aims to catch errors in pipeline code at pipeline
> construction time as much as possible. Ideally, flawed user code will be
> caught during proxy generation and bubble up an error from pandas.
> However, there are edge cases where DataFrame operations validate at
> construction time, but still produce errors at execution time, based on the
> actual data. For example, a user might try to use the modulo operator on a
> string column. This is a valid operation, but it performs string
> interpolation, not a modulo as the user intended.
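The string case can be reproduced with plain Python strings, without Beam or pandas. This is only a sketch of the underlying behavior: it assumes the column holds ordinary Python str objects, so that % is applied to each element in turn. The helper try_mod is a made-up name for illustration.

```python
def try_mod(value, divisor):
    """Apply % and report either the result or the error message."""
    try:
        return ("ok", value % divisor)
    except TypeError as exc:
        return ("error", str(exc))


# A string containing a placeholder: % interpolates, so the "modulo"
# succeeds and yields a string rather than a remainder.
interpolated = try_mod("%d items", 2)

# An ordinary string with no placeholder: % is still accepted as an
# operation, but fails once it is evaluated against the actual value.
failed = try_mod("abc", 2)

print(interpolated)
print(failed)
```

Nothing about the operation itself is invalid, which is why a check that only looks at types can pass while the failure depends on the data.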
> The above situation will raise an error at execution time, but it has a very
> opaque stack trace, consisting of a tree of evaluate/evaluate_at calls. The
> culprit from the user's code is nowhere in the stack trace.
> We should catch errors like this at execution time and add a pointer to the
> line in the user's code that created this expression. We'll likely need to
> add this metadata to DataFrame expressions.
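One possible shape for that metadata, as a minimal sketch only: record the caller's file and line when the expression object is constructed, then attach it to any error raised during evaluation. The Expr class and its origin attribute are hypothetical and are not Beam's actual expression classes. The sketch also assumes user code constructs Expr directly; a real implementation would have to skip over library frames to find the user's line.

```python
import traceback


class Expr:
    """Hypothetical deferred expression; not Beam's actual Expression class."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        # extract_stack(limit=2) returns [caller, this frame]; keep the caller.
        caller = traceback.extract_stack(limit=2)[0]
        self.origin = (caller.filename, caller.lineno, caller.line)

    def evaluate(self):
        # Evaluate inputs outside the try block so that a failure in a nested
        # expression is reported once, with that expression's own origin.
        values = [a.evaluate() if isinstance(a, Expr) else a for a in self.args]
        try:
            return self.func(*values)
        except Exception as exc:
            filename, lineno, line = self.origin
            raise RuntimeError(
                f"expression created at {filename}:{lineno} ({line}) failed: {exc}"
            ) from exc


# Construction succeeds; the error only appears at evaluation time.
expr = Expr(lambda a, b: a % b, "abc", 2)
try:
    expr.evaluate()
except RuntimeError as err:
    message = str(err)
    cause = err.__cause__
    print(message)
```

Chaining with "from exc" keeps the original pandas or Python error available as the cause, so the pointer to user code is added without discarding the underlying stack trace.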