[
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267179#comment-15267179
]
Sergey Shelukhin commented on HIVE-9660:
----------------------------------------
{noformat}
The run length encoder doesn't perform the callback, but when its RLE block is
finished passes the same callback to the OutStream for when the OutStream
finishes the next compression block. Thus it is easy to guarantee that you only
get called back when compression block finishes after the RLE finishes, which
is the required condition. Obviously, for cases where there isn't an RLE, it
just puts the callback directly on the OutStream and it works exactly the same
way.
{noformat}
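For reference, the hand-off described in the quote can be sketched roughly as below. This is only an illustration under simplifying assumptions, not the ORC writer code: `SketchOutStream`, `SketchRleWriter`, `callWhenFlushed` and `callWhenBlockFinishes` are hypothetical names, a run is assumed to be emitted every `runLength` values, and each run is assumed to encode to 2 bytes.

```java
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; this is not the ORC writer API.
final class DeferredCallbackSketch {

    /** Stand-in for a compressed stream that closes a block every blockSize bytes. */
    static final class SketchOutStream {
        private final int blockSize;
        private int buffered = 0;          // bytes in the current, unfinished block
        private long blocksFinished = 0;
        private final List<Runnable> pending = new ArrayList<>();

        SketchOutStream(int blockSize) {
            this.blockSize = blockSize;
        }

        void write(int byteCount) {
            for (int i = 0; i < byteCount; i++) {
                if (++buffered == blockSize) {
                    finishBlock();
                }
            }
        }

        /** The callback runs when the next compression block finishes. */
        void callWhenBlockFinishes(Runnable callback) {
            pending.add(callback);
        }

        long blocksFinished() {
            return blocksFinished;
        }

        private void finishBlock() {
            buffered = 0;
            blocksFinished++;
            List<Runnable> ready = new ArrayList<>(pending);
            pending.clear();
            ready.forEach(Runnable::run);
        }
    }

    /** Stand-in for an RLE encoder that emits one run every runLength values. */
    static final class SketchRleWriter {
        private final SketchOutStream out;
        private final int runLength;
        private int bufferedValues = 0;
        private final List<Runnable> pending = new ArrayList<>();

        SketchRleWriter(SketchOutStream out, int runLength) {
            this.out = out;
            this.runLength = runLength;
        }

        void write() {
            if (++bufferedValues == runLength) {
                finishRun();
            }
        }

        /** Never invokes the callback itself; it only forwards it to the stream. */
        void callWhenFlushed(Runnable callback) {
            if (bufferedValues == 0) {
                out.callWhenBlockFinishes(callback); // nothing buffered at this level
            } else {
                pending.add(callback);               // wait for the current run to end
            }
        }

        private void finishRun() {
            bufferedValues = 0;
            out.write(2); // assumption: every run encodes to 2 bytes
            // Hand over only after the run's bytes are written, so the callback
            // cannot fire for a block that ends in the middle of this run.
            pending.forEach(out::callWhenBlockFinishes);
            pending.clear();
        }
    }

    /** Registers one callback mid-run and records the block count when it fires. */
    static List<Long> demo() {
        List<Long> firedAt = new ArrayList<>();
        SketchOutStream out = new SketchOutStream(4);
        SketchRleWriter rle = new SketchRleWriter(out, 3);
        for (int i = 0; i < 4; i++) {
            rle.write();                 // one full run plus one buffered value
        }
        rle.callWhenFlushed(() -> firedAt.add(out.blocksFinished()));
        for (int i = 0; i < 8; i++) {
            rle.write();
        }
        return firedAt;
    }
}
```

In this sketch `demo()` returns `[2]`: the first compression block closes while the callback is still held by the RLE level, so it fires only when the second block finishes.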
An RG can span several RLE blocks, and an RLE block can contain several RGs.
Moreover, in the case of a boolean writer, there are two levels of buffering: the
partially filled byte, and the RLE buffer in the underlying byte writer.
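To make the boolean case concrete, here is a minimal sketch of the two buffering levels. The names `SketchBitWriter` and `SketchByteRleWriter` are hypothetical, and the byte-level writer below only buffers up to `maxRun` bytes before emitting them; it does no real run-length encoding.

```java
// Hypothetical names; a simplified model, not the ORC writer classes.
final class BooleanBufferingSketch {

    /** Byte-level stand-in: holds up to maxRun bytes before emitting them. */
    static final class SketchByteRleWriter {
        private final int maxRun;
        int bufferedBytes = 0;   // bytes accepted but not yet emitted
        int flushedBytes = 0;    // bytes that reached the output stream

        SketchByteRleWriter(int maxRun) {
            this.maxRun = maxRun;
        }

        void write(byte b) {
            if (++bufferedBytes == maxRun) {
                flushedBytes += bufferedBytes;
                bufferedBytes = 0;
            }
        }
    }

    /** Bit-level stand-in: packs 8 booleans into a byte before passing it on. */
    static final class SketchBitWriter {
        private final SketchByteRleWriter bytes;
        private int current = 0;
        int bitsUsed = 0;        // bits sitting in the partially filled byte

        SketchBitWriter(SketchByteRleWriter bytes) {
            this.bytes = bytes;
        }

        void write(boolean value) {
            current = (current << 1) | (value ? 1 : 0);
            if (++bitsUsed == 8) {
                bytes.write((byte) current);
                current = 0;
                bitsUsed = 0;
            }
        }
    }

    /** Returns {bits in partial byte, bytes in RLE buffer, bytes emitted} after 20 values. */
    static int[] demo() {
        SketchByteRleWriter bytes = new SketchByteRleWriter(3);
        SketchBitWriter bits = new SketchBitWriter(bytes);
        for (int row = 0; row < 20; row++) {
            bits.write(row % 2 == 0);
        }
        return new int[] {bits.bitsUsed, bytes.bufferedBytes, bytes.flushedBytes};
    }
}
```

With these assumed sizes, an RG boundary after row 20 falls at 4 bits into a partial byte, with 2 bytes still in the RLE buffer and nothing yet in the output stream, so a callback would have to wait on both levels.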
There's also the issue of dictionaries and strings, where isPresent is written
normally but the entries cannot be finalized.
In general, I feel like all the coordination complexity would still be
necessary; it would just end up moving around a bit.
> store end offset of compressed data for RG in RowIndex in ORC
> -------------------------------------------------------------
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch,
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch,
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch,
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch,
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores the number of
> compressed buffers for each RG, or the end offset, or something similar, to remove
> this estimation magic.