[ 
https://issues.apache.org/jira/browse/IMPALA-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663810#comment-16663810
 ] 

Csaba Ringhofer commented on IMPALA-7756:
-----------------------------------------

[~tarmstrong]
This idea came up after the investigation of IMPALA-7471 - the DCHECK (actually 
a FILE_CHECK) that was hit has the exact goal off catching corrupt files like 
the one in the ticket, so removing the check does not seem like a valid fix. 
Replacing it with returning an error also doesn't seem a 100% good idea, as the 
code is run for every row in collection columns, so it could affect performance.

> Create a "strict mode" for file parsing in debug builds
> -------------------------------------------------------
>
>                 Key: IMPALA-7756
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7756
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Infrastructure
>            Reporter: Csaba Ringhofer
>            Priority: Minor
>              Labels: parquet
>
> Errors in data files can be handled in several ways:
> a. the error can lead to a non-ok STATUS returned to the user
> b. some errors are ignored intentionally to support files written by buggy 
> writers from the past or to avoid the performance cost of checking / 
> complexity of bubbling up the status
> c. some errors lead to hitting a DCHECK - this means that it will be ignored 
> in release builds, but will cause crash in debug builds
> c. is problematic because fuzz tests can generate such corrupt files, leading 
> to crashed (and flaky) tests. On the other side, ignoring errors can mean 
> that bugs in writers are not detected by Impala's tests.
> c. could be replaced with DCHECK like macros  that are switched off if a flag 
> (e.g. 'strict_parsers') is false. The flag would have no effect in release 
> builds, and would be true by default during tests, while some specific tests 
> (fuzz tests + parser tests for corrupt files) would start impalads with the 
> flag turned off.
> The drawback would be that related tests would have to be custom cluster 
> tests, making the test suite slower. The main benefit is that it is easy to 
> add such checks to the code and they can provide guardrails during file 
> writer development.
> An alternative approach would be to implement this as a query option, which 
> would make tests simpler and faster, but would make the backend changes less 
> trivial.
> There are already macros that seem to have the same purpose (e.g. 
> FILE_CHECK_GE),, but they are implemented as simple DCHECKs (see 
> https://github.com/apache/impala/blob/15e8ce4f273945ce548fe677ee0140dea8068e6d/be/src/util/debug-util.h#L153I)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to