[ https://issues.apache.org/jira/browse/IMPALA-9175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-9175:
----------------------------------
Component/s: Backend
> Revisit the error handling logics in ORC scanner
> ------------------------------------------------
>
> Key: IMPALA-9175
> URL: https://issues.apache.org/jira/browse/IMPALA-9175
> Project: IMPALA
> Issue Type: Task
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Norbert Luksa
> Priority: Major
>
> This task is to review the error handling in the ORC scanner against that of
> the Parquet scanner. For each kind of error the Parquet scanner handles, make
> sure the ORC scanner already handles it too; otherwise, create a separate JIRA
> to cover it.
> We also need to check whether the exposed error messages are sufficient for
> debugging. For instance, one frequently encountered error when Impala has
> stale metadata for an ORC file is:
> {code:java}
> Encountered parse error in tail of ORC file
> hdfs://hadoop2cluster/user/hive-0.13.1/warehouse/bi_ucar.db/alliance_driver_stat_hour_api/dt=2019-08-09/part-00006:
> Invalid ORC postscript length
> {code}
> It would be better to also print the postscript length that was read and the
> file size, so users can tell whether the file is corrupt (and the data needs
> to be regenerated) or the metadata is stale (and needs a refresh). This may
> require changes in the ORC library.