[ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HIVE-9660:
--------------------------------
Attachment: HIVE-9660.patch
This patch:
* Implements a PositionedOutputStream.Callback to track when compression blocks
and RLE runs are finished.
* Adds lengths to the OrcProto.RowIndexEntry.
* Uses the lengths to determine the number of bytes to read when doing
predicate push down.
* Creates a callback for RowIndexEntry in the WriterImpl so that the entry
isn't finalized until all of the streams have made their callbacks. To ensure
that the entry isn't finalized before all of the streams have been added, there
is an activation step after the last stream has been added to the RowIndexEntry.
* Removes the positions and lengths from the RowIndexEntry softly when an
ispresent stream is removed, so that the remaining callbacks aren't affected.
* Significantly cleans up the code dealing with string columns and dictionary
vs. direct encoding.
* Splits TreeWriter.writeStripe, adding a flush method that finalizes all of
the streams.
* Updates lots of test cases for the changed ORC file sizes.
* Adds a new test case that tests the callbacks.
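To illustrate the callback-plus-activation idea from the list above, here is a minimal standalone sketch. It is not the code in the patch: PendingIndexEntry, its nested Callback interface, and all method names are hypothetical stand-ins, and the real PositionedOutputStream.Callback signature may differ. The sketch only shows why the activation step is needed: without it, an entry whose first stream finishes early could be finalized before the remaining streams are registered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a row index entry that waits on its streams. */
class PendingIndexEntry {
    /** Hypothetical stand-in for PositionedOutputStream.Callback. */
    interface Callback {
        void finished(long length);
    }

    private final List<Long> lengths = new ArrayList<>();
    private int outstanding = 0;       // streams that have not called back yet
    private boolean activated = false; // true once the last stream was added
    private boolean finalized = false;

    /** Registers one stream and returns the callback it should invoke. */
    Callback addStream() {
        if (activated) {
            throw new IllegalStateException("entry is already activated");
        }
        final int slot = lengths.size();
        lengths.add(-1L); // placeholder until the stream reports its length
        outstanding++;
        return length -> {
            if (lengths.get(slot) >= 0) {
                throw new IllegalStateException("stream reported twice");
            }
            lengths.set(slot, length);
            outstanding--;
            maybeFinalize();
        };
    }

    /** Called after the last stream has been added. */
    void activate() {
        activated = true;
        maybeFinalize();
    }

    // Finalize only when activated AND every registered stream has reported.
    private void maybeFinalize() {
        if (activated && outstanding == 0) {
            finalized = true;
        }
    }

    boolean isFinalized() {
        return finalized;
    }

    List<Long> getLengths() {
        return Collections.unmodifiableList(lengths);
    }
}
```

In this sketch, an entry with two streams stays open if the first stream reports before activate() is called, and is finalized only once both streams have reported and the entry has been activated.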
> store end offset of compressed data for RG in RowIndex in ORC
> -------------------------------------------------------------
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch,
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch,
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch,
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch,
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch,
> HIVE-9660.patch, owen-hive-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores the number
> of compressed buffers for each RG, or the end offset, or something similar, to
> remove this estimation magic.