[
https://issues.apache.org/jira/browse/IMPALA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy updated IMPALA-9723:
--------------------------------------
Description:
Impala should be able to read files created by Hive Streaming Ingestion V2.
Hive Streaming only writes full ACID ORC files. Such files might contain row
stripes that Impala shouldn't read based on its validWriteIdList.
Hive Streaming might also append to the end of such files. In that case it
writes a "side file" next to the data file; the side file records the end of
the last committed data (its name is the data file's name + _flush_length).
Impala should take the side file into account when it reads such files:
everything after the "flush length" must be ignored.
OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to determine
the committed file size.
was:
Impala should be able to read files created by Hive Streaming Ingestion V2.
Hive Streaming only writes full ACID ORC files. Such files might contain row
stripes that Impala shouldn't read based on its validWriteIdList.
Hive Streaming might also append to the end of such files. In that case it
writes a "side file" next to the data file; the side file records the end of
the last committed data (its name is the data file's name + ___flush_length).
Impala should take the side file into account when it reads such files:
everything after the "flush length" must be ignored.
OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to determine
the committed file size.
> Read files created by Hive Streaming Ingestion V2
> -------------------------------------------------
>
> Key: IMPALA-9723
> URL: https://issues.apache.org/jira/browse/IMPALA-9723
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Zoltán Borók-Nagy
> Priority: Major
>
> Impala should be able to read files created by Hive Streaming Ingestion V2.
> Hive Streaming only writes full ACID ORC files. Such files might contain row
> stripes that Impala shouldn't read based on its validWriteIdList.
> Hive Streaming might also append to the end of such files. In that case it
> writes a "side file" next to the data file; the side file records the end of
> the last committed data (its name is the data file's name + _flush_length).
> Impala should take the side file into account when it reads such files:
> everything after the "flush length" must be ignored.
> OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to
> determine the committed file size.
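
The side-file rule in the description can be illustrated with a small sketch.
It is not Impala or ORC code: the class and method names are hypothetical, it
uses java.nio on a local file system instead of the Hadoop FileSystem API, and
it assumes the side file is a sequence of 8-byte big-endian longs in which the
last complete entry is the current committed length. How a missing or empty
side file is treated here is also an assumption. Real code would more likely
call OrcAcidUtils.getLastFlushLength(fileSystem, filePath) as the description
suggests.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the "flush length" rule; names are illustrative. */
final class FlushLengthSketch {
    // Suffix of the side file, as given in the issue description.
    static final String SIDE_FILE_SUFFIX = "_flush_length";

    private FlushLengthSketch() {}

    /** Side file name = data file name + "_flush_length". */
    static String sideFileName(String dataFileName) {
        return dataFileName + SIDE_FILE_SUFFIX;
    }

    /**
     * Returns the last complete 8-byte entry of the side file contents,
     * or -1 if there is no complete entry.
     * ASSUMPTION: entries are big-endian longs appended one after another.
     */
    static long lastFlushLength(byte[] sideFileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(sideFileBytes); // big-endian by default
        long last = -1L;
        while (buf.remaining() >= Long.BYTES) {
            last = buf.getLong();
        }
        return last;
    }

    /**
     * Number of leading bytes of the data file a reader may consume.
     * Everything after the flush length is ignored.
     * ASSUMPTION: a negative flush length means nothing is committed yet.
     */
    static long readableLength(long physicalLength, long flushLength) {
        return Math.max(0L, Math.min(physicalLength, flushLength));
    }

    /**
     * Committed length of a data file on the local file system.
     * ASSUMPTION: if no side file exists, the whole file is readable.
     */
    static long committedLength(Path dataFile) {
        Path sideFile =
            dataFile.resolveSibling(sideFileName(dataFile.getFileName().toString()));
        try {
            long physicalLength = Files.size(dataFile);
            if (!Files.exists(sideFile)) {
                return physicalLength;
            }
            byte[] sideFileBytes = Files.readAllBytes(sideFile);
            return readableLength(physicalLength, lastFlushLength(sideFileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader built along these lines would stop at committedLength(dataFile) and
never interpret bytes past that offset, even if the physical file is longer
because an uncommitted append is in progress.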