[ 
https://issues.apache.org/jira/browse/IMPALA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258117#comment-17258117
 ] 

Zoltán Borók-Nagy commented on IMPALA-9723:
-------------------------------------------

Lowered the priority because, AFAIK, the current engines don't append to 
existing files but create new ones, so the problem in the description likely 
does not occur in practice. Keeping this Jira open until this behavior 
becomes the standard.

> Read files created by Hive Streaming Ingestion V2
> -------------------------------------------------
>
>                 Key: IMPALA-9723
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9723
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Minor
>
> Impala should be able to read files created by Hive Streaming Ingestion V2.
> Hive Streaming only writes full ACID ORC files. Such files might contain row 
> stripes that Impala shouldn't read based on its validWriteIdList.
> Also, Hive Streaming might append to the end of such files. In that case it 
> writes a "side file" next to the data file; the side file records the last 
> committed end of the data file (its name is the data file's name + 
> _flush_length). Impala should take that into consideration when it reads 
> such files: everything after the "flush length" must be ignored.
> OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to 
> determine the committed file size.
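
As a rough illustration of the "flush length" rule in the description, here is a 
minimal Java sketch of how a reader might compute the committed length of a data 
file. It is not Impala's or Hive's code. It uses plain java.nio on a local path 
instead of the Hadoop FileSystem API, and the class and method names 
(FlushLengthSketch, sideFileOf, committedLength) are hypothetical. It also 
assumes the side file is a sequence of 8-byte big-endian longs where the last 
complete entry is the current committed length; that layout should be verified 
against OrcAcidUtils before relying on it.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class FlushLengthSketch {
    // Suffix named in the issue description: data file name + "_flush_length".
    static final String SIDE_FILE_SUFFIX = "_flush_length";

    // Hypothetical helper: locate the side file next to the data file.
    static Path sideFileOf(Path dataFile) {
        return dataFile.resolveSibling(dataFile.getFileName() + SIDE_FILE_SUFFIX);
    }

    // Returns how many leading bytes of dataFile a reader may trust.
    // Assumption: the side file holds 8-byte big-endian longs and the last
    // complete one is the most recent committed length.
    static long committedLength(Path dataFile) throws IOException {
        Path side = sideFileOf(dataFile);
        if (!Files.exists(side)) {
            // No side file: nothing was appended, so the whole file counts.
            return Files.size(dataFile);
        }
        long last = -1L;
        try (DataInputStream in = new DataInputStream(Files.newInputStream(side))) {
            while (true) {
                // readLong throws EOFException at end of file or on a
                // truncated trailing entry, leaving the previous value intact.
                last = in.readLong();
            }
        } catch (EOFException endOfSideFile) {
            // Expected way to leave the loop.
        }
        // Assumption: a side file with no complete entry means that nothing
        // has been committed yet.
        return Math.max(last, 0L);
    }
}
```

A reader would then ignore every byte of the data file at or beyond 
committedLength(dataFile), including any ORC footer written after that offset.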


