[
https://issues.apache.org/jira/browse/HIVE-23883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161967#comment-17161967
]
Peter Vary commented on HIVE-23883:
-----------------------------------
Basically the issue here is that the FS is CheckSumFS, and flushing the file
there will not flush the data if the checksum buffer size is not reached.
We need to hack around this in one of these ways:
* Decrease the checksum buffer for the side file
* Turn off checksumming for the side file
* Fill up the side file until the checksum buffer size is reached }:-)
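
To make the behaviour concrete, here is a rough toy model in plain Java. It is
not Hadoop's FSOutputSummer; ChunkedChecksumStream, SideFileFlushDemo and the
8-byte side file entry are made-up names and sizes for illustration only. The
model assumes that flush() forwards only complete checksum chunks and keeps a
trailing partial chunk in memory, which is the behaviour described above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Toy stand-in for a checksummed stream (hypothetical, not Hadoop code). */
class ChunkedChecksumStream extends OutputStream {
    private final OutputStream sink;   // plays the role of the file on disk
    private final byte[] chunk;        // one checksum chunk worth of buffer
    private int used = 0;

    ChunkedChecksumStream(OutputStream sink, int bytesPerChecksum) {
        this.sink = sink;
        this.chunk = new byte[bytesPerChecksum];
    }

    @Override
    public void write(int b) throws IOException {
        chunk[used++] = (byte) b;
        if (used == chunk.length) {    // only a full chunk reaches the sink
            sink.write(chunk, 0, used);
            used = 0;
        }
    }

    @Override
    public void flush() throws IOException {
        // Assumed behaviour: a trailing partial chunk stays buffered here.
        sink.flush();
    }

    @Override
    public void close() throws IOException {
        if (used > 0) {                // the partial chunk is written on close
            sink.write(chunk, 0, used);
            used = 0;
        }
        sink.close();
    }
}

class SideFileFlushDemo {
    /** Returns how many bytes are visible in the sink right after flush(). */
    static int visibleAfterFlush(int bytesPerChecksum, int payloadBytes) {
        try {
            ByteArrayOutputStream disk = new ByteArrayOutputStream();
            ChunkedChecksumStream out = new ChunkedChecksumStream(disk, bytesPerChecksum);
            out.write(new byte[payloadBytes]);
            out.flush();
            return disk.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Small side file entry, large checksum buffer: nothing is visible.
        System.out.println(visibleAfterFlush(512, 8));
        // Option 1: checksum buffer shrunk to the entry size.
        System.out.println(visibleAfterFlush(8, 8));
        // Option 3: entry padded up to the checksum buffer size.
        System.out.println(visibleAfterFlush(512, 512));
    }
}
```

In this model the second option (turning off checksumming) would correspond to
writing to the sink directly and skipping the wrapper altogether.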
> Streaming does not flush the side file
> --------------------------------------
>
> Key: HIVE-23883
> URL: https://issues.apache.org/jira/browse/HIVE-23883
> Project: Hive
> Issue Type: Bug
> Components: Streaming, Transactions
> Reporter: Peter Vary
> Priority: Major
>
> When a streaming write commits a mid-batch write with
> {{connection.commitTransaction()}} then it tries to flush the sideFile with
> {{OrcInputFormat.SHIMS.hflush(flushLengths)}}. This uses
> FSOutputSummer.flush, which does not flush the buffer data to the disk so the
> actual data is not written.
> I had to remove the following check from the end of the streaming tests in
> {{TestCrudCompactorOnTez.java}}
> {code:java}
> CompactorTestUtilities.checkAcidVersion(
>     fs.listFiles(new Path(table.getSd().getLocation()), true), fs,
>     conf.getBoolVar(HiveConf.ConfVars.HIVE_WRITE_ACID_VERSION_FILE),
>     new String[] { AcidUtils.DELTA_PREFIX });
> {code}
> These checks verify the {{_flush_length}} files, and they would fail
> otherwise.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)