[ 
https://issues.apache.org/jira/browse/IMPALA-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663969#comment-16663969
 ] 

Tim Armstrong commented on IMPALA-7756:
---------------------------------------

Maybe it's cheap enough to add an UNLIKELY() branch at some of these places 
though. It looks like it's mostly in the nested types scanning code.

> Create a "strict mode" for file parsing in debug builds
> -------------------------------------------------------
>
>                 Key: IMPALA-7756
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7756
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Infrastructure
>            Reporter: Csaba Ringhofer
>            Priority: Minor
>              Labels: parquet
>
> Errors in data files can be handled in several ways:
> a. the error can lead to a non-ok STATUS returned to the user
> b. some errors are ignored intentionally to support files written by buggy 
> writers from the past or to avoid the performance cost of checking / 
> complexity of bubbling up the status
> c. some errors lead to hitting a DCHECK - this means that it will be ignored 
> in release builds, but will cause crash in debug builds
> c. is problematic because fuzz tests can generate such corrupt files, leading 
> to crashed (and flaky) tests. On the other side, ignoring errors can mean 
> that bugs in writers are not detected by Impala's tests.
> c. could be replaced with DCHECK like macros  that are switched off if a flag 
> (e.g. 'strict_parsers') is false. The flag would have no effect in release 
> builds, and would be true by default during tests, while some specific tests 
> (fuzz tests + parser tests for corrupt files) would start impalads with the 
> flag turned off.
> The drawback would be that related tests would have to be custom cluster 
> tests, making the test suite slower. The main benefit is that it is easy to 
> add such checks to the code and they can provide guardrails during file 
> writer development.
> An alternative approach would be to implement this as a query option, which 
> would make tests simpler and faster, but would make the backend changes less 
> trivial.
> There are already macros that seem to have the same purpose (e.g. 
> FILE_CHECK_GE),, but they are implemented as simple DCHECKs (see 
> https://github.com/apache/impala/blob/15e8ce4f273945ce548fe677ee0140dea8068e6d/be/src/util/debug-util.h#L153I)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to