[ https://issues.apache.org/jira/browse/ORC-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798946#comment-15798946 ]
Owen O'Malley commented on ORC-119:
-----------------------------------
[~sershe] Does the pull request meet your needs?
Notes:
* Moved the OutputReceiver interface to PhysicalWriter.
* Added suppress method to OutputReceiver.
* I've put PhysicalWriter in the public org.apache.orc package.
* Since the WriterImpl isn't tracking the streams any more, the
memoryEstimation moved into the TreeWriters.
* Because compression is being handled by the WriterImpl, the indexes don't
need to be modified. I changed them to use createDataStream as well.
* The StringBaseTreeWriter was creating duplicate streams for the data and
length, which ended up throwing off the size of the length stream when using
the direct encoding.
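For readers who have not opened the pull request, the shape described in the
notes above might look roughly like the following. This is a minimal sketch
under stated assumptions, not the code in the patch: only the names
PhysicalWriter, OutputReceiver, suppress and createDataStream come from the
notes. The method signatures, the PhysicalWriterSketch and
InMemoryPhysicalWriter types, and the reading of suppress as "leave this
stream out of the output" are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; not the interface from the pull request.
interface PhysicalWriterSketch {
  // Nested here because the notes say OutputReceiver moved to PhysicalWriter.
  interface OutputReceiver {
    void output(ByteBuffer buffer); // assumed signature
    void suppress();                // assumed: drop this stream from the output
  }

  // Name taken from the notes; the String parameter is an assumption.
  OutputReceiver createDataStream(String streamName);
}

// Toy implementation that keeps every stream in memory.
final class InMemoryPhysicalWriter implements PhysicalWriterSketch {
  private static final class BufferedStream implements OutputReceiver {
    private final List<ByteBuffer> buffers = new ArrayList<>();
    private boolean suppressed = false;

    @Override
    public void output(ByteBuffer buffer) {
      buffers.add(buffer);
    }

    @Override
    public void suppress() {
      suppressed = true;
    }

    // Total bytes still readable in the buffers handed to this stream.
    long size() {
      long total = 0;
      for (ByteBuffer b : buffers) {
        total += b.remaining();
      }
      return total;
    }
  }

  private final List<BufferedStream> streams = new ArrayList<>();

  @Override
  public OutputReceiver createDataStream(String streamName) {
    BufferedStream stream = new BufferedStream();
    streams.add(stream);
    return stream;
  }

  // Bytes that would reach the file: suppressed streams are skipped.
  long bytesToWrite() {
    long total = 0;
    for (BufferedStream s : streams) {
      if (!s.suppressed) {
        total += s.size();
      }
    }
    return total;
  }
}
```

The point of the sketch is that the writer only hands finished buffers to a
receiver, so where the bytes end up is decided entirely by the PhysicalWriter
implementation.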
On a side note, putting memoryEstimation in the TreeWriters (and thus the
IntegerWriters) made it clear that RunLengthIntegerWriterV2 is using a lot of
memory that isn't being counted (4 * 512 * 8 bytes = 16 KB per RLE encoder). I
originally fixed it in this patch, but it significantly threw off the stripe
sizes in the tests, so I left it out of the calculation.
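To make the arithmetic concrete, here is a small sketch of the estimate.
Reading the factors as four buffers of 512 entries at 8 bytes each is an
assumption about what the numbers stand for, and the class and method names
are hypothetical; only the product 4 * 512 * 8 comes from the comment.

```java
// Hypothetical helper; not part of ORC.
final class RleMemoryEstimate {
  static final int BUFFERS_PER_ENCODER = 4;   // assumed meaning of the "4"
  static final int ENTRIES_PER_BUFFER = 512;  // assumed meaning of the "512"
  static final int BYTES_PER_ENTRY = 8;       // assumed: one 8-byte long per entry

  private RleMemoryEstimate() {
  }

  // Uncounted bytes held by a single RLE encoder under the assumptions above.
  static long bytesPerEncoder() {
    return (long) BUFFERS_PER_ENCODER * ENTRIES_PER_BUFFER * BYTES_PER_ENTRY;
  }

  // Uncounted bytes for a writer that holds the given number of encoders.
  static long bytesFor(int encoders) {
    return encoders * bytesPerEncoder();
  }
}
```

Under these assumptions bytesPerEncoder() is 16384 bytes, so a writer with
many integer columns can hold a noticeable amount of memory that the stripe
size estimate never sees.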
> Create an abstraction named PhysicalWriter that abstracts where the Writer
> puts the bytes
> -----------------------------------------------------------------------------------------
>
> Key: ORC-119
> URL: https://issues.apache.org/jira/browse/ORC-119
> Project: Orc
> Issue Type: Bug
> Components: Java
> Reporter: Owen O'Malley
>
> This is a forward port of HIVE-14453, which introduced PhysicalWriter.