Zoltán Borók-Nagy created IMPALA-9723:
-----------------------------------------
Summary: Read files created by Hive Streaming Ingestion V2
Key: IMPALA-9723
URL: https://issues.apache.org/jira/browse/IMPALA-9723
Project: IMPALA
Issue Type: Sub-task
Reporter: Zoltán Borók-Nagy
Impala should be able to read files created by Hive Streaming Ingestion V2.
Hive Streaming writes only full ACID ORC files. Such files might contain
stripes whose rows Impala shouldn't read, based on its validWriteIdList.
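
A rough sketch of the write ID check described above. This is not Hive's or Impala's actual code. It assumes a simplified model in which the valid write ID list is a high watermark plus a set of write IDs at or below it that are still open or aborted, and that each row exposes the write ID that produced it. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Simplified, hypothetical stand-in for a valid write ID list. */
final class WriteIdFilterSketch {
    // Highest write ID this reader is allowed to see (assumed model).
    private final long highWatermark;
    // Sorted write IDs at or below the watermark that are open or aborted.
    private final long[] invalidIds;

    public WriteIdFilterSketch(long highWatermark, long... invalidIds) {
        this.highWatermark = highWatermark;
        this.invalidIds = invalidIds.clone();
        Arrays.sort(this.invalidIds);
    }

    /** A write ID is readable if it is not above the watermark and not listed as invalid. */
    public boolean isWriteIdValid(long writeId) {
        if (writeId > highWatermark) {
            return false;
        }
        return Arrays.binarySearch(invalidIds, writeId) < 0;
    }

    /** Keeps only the write IDs whose rows the reader may return. */
    public long[] keepVisible(long[] rowWriteIds) {
        return Arrays.stream(rowWriteIds).filter(this::isWriteIdValid).toArray();
    }
}
```

With a watermark of 5 and write ID 3 marked invalid, rows written by IDs 1 and 4 would be kept, while rows written by 3 and 7 would be skipped.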
Also, Hive Streaming might append to the end of such files. In that case it
writes a "side file" next to the data file that records the last committed end
of the file (its name is the data file name + "_flush_length"). Impala should
take the side file into account when it reads such files: everything after the
"flush length" must be ignored.
OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to determine
the committed file size.
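
To illustrate the idea without depending on Hadoop or ORC, here is a minimal JDK-only sketch of how a reader could derive the committed length. It is not the OrcAcidUtils implementation. It assumes the side file is a sequence of 8-byte big-endian longs in which the last complete value is the most recent flush length, that the side file suffix is "_flush_length", and that an empty side file means nothing is committed yet. The class and method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class FlushLengthSketch {
    // Assumed side file suffix; adjust to whatever the writer actually uses.
    static final String SIDE_FILE_SUFFIX = "_flush_length";

    /** Returns how many bytes of dataFile a reader may safely consume. */
    public static long committedLength(Path dataFile) throws IOException {
        long physicalLength = Files.size(dataFile);
        Path sideFile = dataFile.resolveSibling(dataFile.getFileName() + SIDE_FILE_SUFFIX);
        if (!Files.exists(sideFile)) {
            // No side file: treat the whole file as committed.
            return physicalLength;
        }
        long lastFlush = -1;
        try (DataInputStream in = new DataInputStream(Files.newInputStream(sideFile))) {
            while (true) {
                // Each complete 8-byte value overrides the previous one.
                lastFlush = in.readLong();
            }
        } catch (EOFException endOfSideFile) {
            // Reached the end; a trailing partial value is ignored.
        }
        if (lastFlush < 0) {
            // Assumption: an empty side file means no committed data yet.
            return 0;
        }
        // Never report more bytes than physically exist.
        return Math.min(lastFlush, physicalLength);
    }
}
```

A reader would then stop at the returned offset and ignore any bytes after it, even if the file on disk is longer.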