[ https://issues.apache.org/jira/browse/IMPALA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258117#comment-17258117 ]
Zoltán Borók-Nagy commented on IMPALA-9723:
-------------------------------------------
Lowered the priority because, AFAIK, the current engines don't append to existing
files; they create new ones instead. So the problem in the description likely
doesn't occur in practice. I'm keeping this JIRA open until this behavior becomes
the standard, though.
> Read files created by Hive Streaming Ingestion V2
> -------------------------------------------------
>
> Key: IMPALA-9723
> URL: https://issues.apache.org/jira/browse/IMPALA-9723
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Reporter: Zoltán Borók-Nagy
> Priority: Minor
>
> Impala should be able to read files created by Hive Streaming Ingestion V2.
> Hive Streaming only writes full ACID ORC files. Such files might contain row
> stripes that Impala shouldn't read based on its validWriteIdList.
> Also, Hive Streaming might append to the end of such files. In that case it
> writes a "side file" next to the data file; the side file contains the last
> committed file end, and its name is the data file name + _flush_length.
> Impala should take that into consideration when it reads such files.
> Everything after the flush length must be ignored.
> OrcAcidUtils.getLastFlushLength(fileSystem, filePath) can be used to
> determine the committed file size.
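The flush-length handling described above could be sketched in Java roughly as follows. This is a hypothetical, self-contained illustration, not Impala's or Hive's actual code. It assumes the side file is a sequence of big-endian 8-byte longs whose last complete entry is the committed file end (my understanding of what `OrcAcidUtils.getLastFlushLength` reads), and that a missing side file means the whole file is committed. The class and method names (`FlushLengthSketch`, `lastFlushLength`, `committedEnd`, `encodeSideFile`) are invented for this example, and treating an empty side file as "nothing committed" is likewise an assumption of the sketch. A real reader would open the side file through the Hadoop `FileSystem` API rather than take a plain `InputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: honouring a Hive Streaming "_flush_length" side file. */
public final class FlushLengthSketch {
    // From the issue: side file name = data file name + "_flush_length".
    static final String SIDE_FILE_SUFFIX = "_flush_length";

    static String sideFileName(String dataFile) {
        return dataFile + SIDE_FILE_SUFFIX;
    }

    /**
     * Returns the last long stored in the side file (assumed format: big-endian
     * 8-byte longs), Long.MAX_VALUE if there is no side file (null stream),
     * or -1 if the side file holds no complete entry.
     */
    static long lastFlushLength(InputStream sideFile) throws IOException {
        if (sideFile == null) {
            return Long.MAX_VALUE; // no side file: treat the whole data file as committed
        }
        long result = -1;
        try (DataInputStream in = new DataInputStream(sideFile)) {
            while (true) {
                result = in.readLong(); // keep overwriting; the last entry wins
            }
        } catch (EOFException eof) {
            return result; // end of side file (a trailing partial entry is ignored)
        }
    }

    /** End offset a reader may use: nothing past the flush length is read. */
    static long committedEnd(long fileLength, long flushLength) {
        // Assumption of this sketch: a negative flush length means "nothing committed".
        return Math.max(0L, Math.min(fileLength, flushLength));
    }

    /** Test helper that builds side-file bytes in the assumed format. */
    static byte[] encodeSideFile(long... lengths) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (long len : lengths) {
                out.writeLong(len);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] side = encodeSideFile(1024L, 4096L); // two flushes; 4096 is the latest
        long flush = lastFlushLength(new ByteArrayInputStream(side));
        System.out.println(sideFileName("bucket_00000") + " -> " + flush);
        System.out.println(committedEnd(6000L, flush));                  // clamped to the flush length
        System.out.println(committedEnd(6000L, lastFlushLength(null)));  // no side file
    }
}
```

Running `main` prints `bucket_00000_flush_length -> 4096`, then `4096` (the 6000-byte file is read only up to the flush length), then `6000` (without a side file the whole file is readable).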
--
This message was sent by Atlassian Jira
(v8.3.4#803005)