[ 
https://issues.apache.org/jira/browse/ORC-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798946#comment-15798946
 ] 

Owen O'Malley commented on ORC-119:
-----------------------------------

[~sershe] Does the pull request meet your needs?

Notes:
  * Moved the OutputReceiver interface to PhysicalWriter.
  * Added suppress method to OutputReceiver.
  * I've put PhysicalWriter in the public org.apache.orc package.
  * Since the WriterImpl no longer tracks the streams, memoryEstimation has 
moved into the TreeWriters.
  * Because compression is handled by the WriterImpl, the indexes don't 
need to be modified, so I switched them over to createDataStream as well.
  * The StringBaseTreeWriter was creating duplicate streams for the data and 
length, which ended up throwing off the size of the length stream when using 
the direct encoding. 
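
To make the shape of the change concrete, here is a minimal sketch of a writer abstraction with a nested OutputReceiver that has a suppress method. It is not the code from the pull request: the method signatures, the InMemoryWriter class, the stream names, and the choice to reuse a stream when the same name is requested twice are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; signatures are assumptions, not the ORC API.
class PhysicalWriterSketch {

    /** Hypothetical stand-in for the abstraction over where bytes go. */
    interface PhysicalWriter {
        /** Receiver nested inside the writer interface, as in the notes. */
        interface OutputReceiver {
            void output(ByteBuffer buffer); // accept a finished buffer
            void suppress();                // drop this stream from the output
        }

        OutputReceiver createDataStream(String name);
    }

    /** Hypothetical in-memory writer that only counts bytes per stream. */
    static final class InMemoryWriter implements PhysicalWriter {
        private final Map<String, Stream> streams = new LinkedHashMap<>();

        private static final class Stream implements PhysicalWriter.OutputReceiver {
            long bytes = 0;
            boolean suppressed = false;

            @Override
            public void output(ByteBuffer buffer) {
                bytes += buffer.remaining();
            }

            @Override
            public void suppress() {
                suppressed = true;
            }
        }

        @Override
        public PhysicalWriter.OutputReceiver createDataStream(String name) {
            // Assumption: asking twice for the same name returns the same stream.
            return streams.computeIfAbsent(name, n -> new Stream());
        }

        /** Total bytes of all streams that were not suppressed. */
        long bytesToWrite() {
            long total = 0;
            for (Stream s : streams.values()) {
                if (!s.suppressed) {
                    total += s.bytes;
                }
            }
            return total;
        }
    }

    /** Writes two streams, suppresses one, and returns the remaining size. */
    static long demo() {
        InMemoryWriter writer = new InMemoryWriter();
        PhysicalWriter.OutputReceiver data = writer.createDataStream("col1.data");
        PhysicalWriter.OutputReceiver present = writer.createDataStream("col1.present");
        data.output(ByteBuffer.allocate(100));
        present.output(ByteBuffer.allocate(10));
        present.suppress(); // e.g. a stream that turned out to be unnecessary
        return writer.bytesToWrite();
    }
}
```

In this sketch the caller never sees where the bytes end up, which is the point of the abstraction: only the receiver decides whether a stream is kept or suppressed.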

On a side note, putting memoryEstimation in the TreeWriters (and thus the 
IntegerWriters) made it clear that RunLengthIntegerWriterV2 is using a lot of 
memory that isn't being counted (4 * 512 * 8 = 16 KB per RLE encoder). I 
originally fixed it in this patch, but it significantly threw off the stripe 
sizes in the tests, so I left it out of the calculation.
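
The arithmetic behind that figure can be checked with a short sketch. The reading of 4 * 512 * 8 as four buffers of 512 eight-byte values is an assumption, and the class and constant names are hypothetical rather than taken from RunLengthIntegerWriterV2.

```java
// Illustrative arithmetic only; names are hypothetical.
final class RleMemoryEstimate {
    static final int BUFFER_COUNT = 4;            // assumed: four buffers per encoder
    static final int ENTRIES_PER_BUFFER = 512;    // assumed: 512 values per buffer
    static final int BYTES_PER_ENTRY = Long.BYTES; // 8 bytes per long

    /** Uncounted bytes held by a single RLE encoder. */
    static long bytesPerEncoder() {
        return (long) BUFFER_COUNT * ENTRIES_PER_BUFFER * BYTES_PER_ENTRY;
    }

    /** Uncounted bytes across a writer with the given number of encoders. */
    static long bytesFor(int encoders) {
        return encoders * bytesPerEncoder();
    }
}
```

Under these assumptions, bytesPerEncoder() gives 16384 bytes, so a writer with 100 integer encoders would hold about 1.6 MB that the estimate does not see.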

> Create an abstraction named PhysicalWriter that abstracts where the Writer 
> puts the bytes
> -----------------------------------------------------------------------------------------
>
>                 Key: ORC-119
>                 URL: https://issues.apache.org/jira/browse/ORC-119
>             Project: Orc
>          Issue Type: Bug
>          Components: Java
>            Reporter: Owen O'Malley
>
> This is a forward port of HIVE-14453, which introduces PhysicalWriter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
