Zoltán Borók-Nagy created IMPALA-9723:
-----------------------------------------

             Summary: Read files created by Hive Streaming Ingestion V2
                 Key: IMPALA-9723
                 URL: https://issues.apache.org/jira/browse/IMPALA-9723
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Zoltán Borók-Nagy


Impala should be able to read files created by Hive Streaming Ingestion V2.

Hive Streaming only writes full ACID ORC files. Such files might contain row 
stripes that Impala shouldn't read based on its validWriteIdList.
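
A minimal sketch of the row-level check this implies, assuming each row of a 
full ACID file carries the write ID that produced it. WriteIdFilterSketch and 
its fields are hypothetical stand-ins for illustration, not Impala's or Hive's 
actual classes:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for a validWriteIdList: a high watermark
// plus the write IDs at or below it that are still open or were aborted.
final class WriteIdFilterSketch {
    private final long highWatermark;
    private final Set<Long> invalidWriteIds = new HashSet<>();

    WriteIdFilterSketch(long highWatermark, long... invalidWriteIds) {
        this.highWatermark = highWatermark;
        for (long id : invalidWriteIds) {
            this.invalidWriteIds.add(id);
        }
    }

    // A row is readable only if its write ID is not above the high watermark
    // and is not listed as open/aborted.
    boolean isWriteIdValid(long writeId) {
        return writeId <= highWatermark && !invalidWriteIds.contains(writeId);
    }

    // Counts the rows a reader would keep, given the write ID of each row.
    int countVisibleRows(long[] rowWriteIds) {
        int visible = 0;
        for (long writeId : rowWriteIds) {
            if (isWriteIdValid(writeId)) {
                visible++;
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        WriteIdFilterSketch filter = new WriteIdFilterSketch(5L, 3L);
        long[] rowWriteIds = {1L, 3L, 5L, 6L};
        System.out.println("visible rows: " + filter.countVisibleRows(rowWriteIds));
    }
}
```

If the minimum and maximum write ID of a stripe were known, the same check 
could in principle skip a whole stripe at once; that is an assumption of this 
sketch, not something stated above.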

Also, Hive Streaming might append to the end of such files. In that case it 
writes a "side file" next to the data file that records the last committed 
file length (its name is the data file name + "_flush_length"). Impala should 
take that into account when it reads such files: everything after the "flush 
length" must be ignored.

OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to determine 
the committed file size.
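
A rough sketch of how the committed length could be applied. It assumes the 
side file is a sequence of 8-byte big-endian lengths whose last complete entry 
is the committed size. Since OrcAcidUtils.getLastFlushLength needs a Hadoop 
FileSystem, the sketch works on an in-memory byte array instead; 
FlushLengthSketch and its methods are hypothetical names:

```java
import java.nio.ByteBuffer;

// Hypothetical helper that mimics reading a "flush length" side file.
final class FlushLengthSketch {
    // Sentinel used by this sketch when no complete entry is present.
    static final long NO_COMMITTED_LENGTH = Long.MAX_VALUE;

    // Returns the last complete 8-byte entry; a trailing partial entry
    // (for example from a write in progress) is ignored.
    static long lastFlushLength(byte[] sideFileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(sideFileBytes); // big-endian by default
        long last = NO_COMMITTED_LENGTH;
        while (buf.remaining() >= Long.BYTES) {
            last = buf.getLong();
        }
        return last;
    }

    // The reader must not go past the committed length, even if the data
    // file is physically longer.
    static long readableLength(long physicalLength, long flushLength) {
        return Math.min(physicalLength, flushLength);
    }

    public static void main(String[] args) {
        byte[] sideFile = ByteBuffer.allocate(16).putLong(100L).putLong(250L).array();
        long committed = lastFlushLength(sideFile);
        System.out.println("readable bytes: " + readableLength(300L, committed));
    }
}
```

With this sentinel, a file without a usable side file is read to its physical 
end, since the minimum then falls back to the physical length.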



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
