[
https://issues.apache.org/jira/browse/IMPALA-11489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang resolved IMPALA-11489.
-------------------------------------
Fix Version/s: Impala 4.2.0
Target Version: Impala 4.2.0
Resolution: Fixed
Thanks [~csringhofer]! I'll also port this to 4.1.1.
> Async IO cannot handle >2GB ORC files
> -------------------------------------
>
> Key: IMPALA-11489
> URL: https://issues.apache.org/jira/browse/IMPALA-11489
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Csaba Ringhofer
> Assignee: Csaba Ringhofer
> Priority: Major
> Fix For: Impala 4.2.0
>
>
> We assume that the size fits in an int:
> https://github.com/apache/impala/blob/308fda110758b0fc58e5b1f477d635aac29aea75/be/src/exec/hdfs-orc-scanner.cc#L253
> If the size overflows, we can incorrectly hit the following error check
> (this check is meant to avoid crashing due to corrupt metadata). I see no
> other way this could cause problems: if the check still succeeds (because
> the overflow led to a valid-looking length), then the data will be read
> correctly.
> This looks like a trivial fix, but I am concerned about the lack of testing of
> >2GB files.
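To illustrate the overflow described above, here is a minimal sketch, assuming a 64-bit file size is narrowed to a 32-bit int. The helper names TruncateToInt32 and FitsInInt32 are hypothetical and are not taken from hdfs-orc-scanner.cc; the sketch only shows why some sizes above 2GB trip a sanity check while others wrap to a valid-looking length.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: emulates two's complement truncation of a 64-bit
// size to 32 bits using only well-defined unsigned arithmetic.
constexpr int32_t TruncateToInt32(int64_t v) {
  // Keep the low 32 bits (modular conversion, well-defined for unsigned).
  const uint32_t low = static_cast<uint32_t>(static_cast<uint64_t>(v));
  const uint32_t max32 =
      static_cast<uint32_t>(std::numeric_limits<int32_t>::max());
  // Reinterpret the low 32 bits as a signed value without relying on
  // implementation-defined narrowing.
  return low <= max32
             ? static_cast<int32_t>(low)
             : static_cast<int32_t>(static_cast<int64_t>(low) -
                                    (int64_t{1} << 32));
}

// Hypothetical guard: true if the size can be stored in an int32_t as is.
constexpr bool FitsInInt32(int64_t v) {
  return v >= 0 && v <= std::numeric_limits<int32_t>::max();
}

constexpr int64_t kGiB = int64_t{1} << 30;

// A 3 GiB size wraps to a negative value, which a "length must be
// non-negative" sanity check would reject as corrupt metadata.
static_assert(TruncateToInt32(3 * kGiB) == -1073741824, "3 GiB wraps negative");

// A 5 GiB size wraps to a positive, valid-looking 1 GiB length.
static_assert(TruncateToInt32(5 * kGiB) == 1073741824, "5 GiB wraps to 1 GiB");

// Neither size fits; a 64-bit length avoids the problem altogether.
static_assert(!FitsInInt32(3 * kGiB) && !FitsInInt32(5 * kGiB), "too large");
```

Keeping the length in an int64_t end to end, or checking something like FitsInInt32 before narrowing, would avoid both outcomes. Which of the two fits the actual scanner code is not stated in this issue.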