[
https://issues.apache.org/jira/browse/IMPALA-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663912#comment-16663912
]
Lars Volker commented on IMPALA-7756:
-------------------------------------
It seems beneficial to me to handle corrupt data properly and not ignore it. I
think we should not have DCHECKs that can be triggered by input data at all. We
can annotate any check for corrupt data as {{UNLIKELY(if...)}} to help the
compiler generate efficient code. Would that not work here?
> Create a "strict mode" for file parsing in debug builds
> -------------------------------------------------------
>
> Key: IMPALA-7756
> URL: https://issues.apache.org/jira/browse/IMPALA-7756
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Infrastructure
> Reporter: Csaba Ringhofer
> Priority: Minor
> Labels: parquet
>
> Errors in data files can be handled in several ways:
> a. the error can lead to a non-ok STATUS returned to the user
> b. some errors are ignored intentionally to support files written by buggy
> writers from the past or to avoid the performance cost of checking /
> complexity of bubbling up the status
> c. some errors lead to hitting a DCHECK - this means that it will be ignored
> in release builds, but will cause crash in debug builds
> c. is problematic because fuzz tests can generate such corrupt files, leading
> to crashed (and flaky) tests. On the other side, ignoring errors can mean
> that bugs in writers are not detected by Impala's tests.
> c. could be replaced with DCHECK like macros that are switched off if a flag
> (e.g. 'strict_parsers') is false. The flag would have no effect in release
> builds, and would be true by default during tests, while some specific tests
> (fuzz tests + parser tests for corrupt files) would start impalads with the
> flag turned off.
> The drawback would be that related tests would have to be custom cluster
> tests, making the test suite slower. The main benefit is that it is easy to
> add such checks to the code and they can provide guardrails during file
> writer development.
> An alternative approach would be to implement this as a query option, which
> would make tests simpler and faster, but would make the backend changes less
> trivial.
> There are already macros that seem to have the same purpose (e.g.
> FILE_CHECK_GE),, but they are implemented as simple DCHECKs (see
> https://github.com/apache/impala/blob/15e8ce4f273945ce548fe677ee0140dea8068e6d/be/src/util/debug-util.h#L153I)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]