[ 
https://issues.apache.org/jira/browse/IMPALA-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Tran updated IMPALA-7178:
---------------------------------
    Labels: supportability  (was: )

> Reduce logging for common data errors
> -------------------------------------
>
>                 Key: IMPALA-7178
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7178
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: supportability
>
> Some data errors (for example out-of-range Parquet timestamps) can dominate 
> logs if a table contains a large number of rows with invalid data. If an 
> error has its own error code (see common/thrift/generate_error_codes.py), 
> then these errors are already aggregated to the user 
> (RuntimeState::LogError()) for every query, but the logs will contain a new 
> line for every occurrence. This is rarely useful, as the log 
> lines will repeat the same information (the corrupt data itself is not logged 
> as it can be sensitive information).
> The best approach would be to reduce logging without losing information:
> - the first occurrence of an error should be logged (per 
> query/fragment/table/file/column) to help the investigation of cases where 
> the data error leads to other errors and to avoid breaking log analyzer tools 
> that search for the current format
> - other occurrences can be aggregated, like "in query Q table T column C XY 
> error occurred N times"
> An extra goal is to avoid calling RuntimeState::LogError() for any 
> occurrence after the first one, as RuntimeState::LogError() takes a (per 
> fragment) lock.
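
The two bullet points above could be sketched roughly as follows. This is only an illustration of the idea in Python, not Impala code: the class name ErrorAggregator, its methods, and the (query, table, column, error code) key layout are all hypothetical, and the sketch is not thread-safe. The first occurrence for a key is written in full, later occurrences only bump a counter, and a single summary line per key is emitted at the end.

```python
from collections import Counter
from typing import Callable, Tuple

# Hypothetical key layout: (query, table, column, error_code).
ErrorKey = Tuple[str, str, str, str]


class ErrorAggregator:
    """Logs the first occurrence of an error per key and counts the rest."""

    def __init__(self, log_line: Callable[[str], None]) -> None:
        self._log_line = log_line          # sink for log lines (assumed)
        self._counts: Counter = Counter()  # occurrences seen per key

    def report(self, key: ErrorKey, message: str) -> bool:
        """Record one occurrence; return True only for the first one.

        A caller could use the return value to decide whether the more
        expensive, lock-taking error path needs to be invoked at all.
        """
        self._counts[key] += 1
        if self._counts[key] == 1:
            # First occurrence: keep the existing full log line.
            self._log_line(message)
            return True
        return False

    def flush(self) -> None:
        """Emit one summary line for every key seen more than once."""
        for (query, table, column, code), n in self._counts.items():
            if n > 1:
                self._log_line(
                    f"in query {query} table {table} column {column} "
                    f"{code} error occurred {n} times"
                )
        self._counts.clear()
```

In this sketch the count in the summary line is the total number of occurrences, including the first one that was logged in full; whether the real summary should report the total or only the suppressed repeats is left open by the description.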



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

