[ 
https://issues.apache.org/jira/browse/ORC-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412852#comment-16412852
 ] 

Sandeep More commented on ORC-305:
----------------------------------

It looks like streamFactory does not expose getPhysicalWriter(). The 
[WriterContext|https://github.com/apache/orc/blob/ded204a4a10bfad1ed739fc98f612a41005640c5/java/core/src/java/org/apache/orc/impl/writer/WriterContext.java]
 interface that TreeWriterBase uses for streamFactory does not have a 
getPhysicalWriter() method. That method is exposed by a different 
WriterContext, 
[OrcFile.WriterContext|https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/OrcFile.java#L330],
 but the TreeWriterBase class cannot access it.

One way would be to add a getWriter() method to the 
[org.apache.orc.impl.writer.WriterContext|https://github.com/apache/orc/blob/ded204a4a10bfad1ed739fc98f612a41005640c5/java/core/src/java/org/apache/orc/impl/writer/WriterContext.java]
 interface, so that we could call 
 TreeWriterBase.streamFactory.getWriter() to get the PhysicalWriter. Let me 
know your thoughts!
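
To make the proposal concrete, here is a rough sketch of what I have in mind. 
The types below are simplified stand-ins and not the real ORC classes: 
bytesWritten(), sizeOnDisk() and SketchDemo are hypothetical names that exist 
only for this illustration. The one piece that reflects the actual proposal is 
the getWriter() method on the WriterContext interface.

```java
// Simplified stand-ins for illustration only; NOT the real ORC types.
interface PhysicalWriter {
    // Hypothetical method: bytes written so far for the given column.
    long bytesWritten(int columnId);
}

interface WriterContext {
    // Proposed addition: let tree writers reach the PhysicalWriter.
    PhysicalWriter getWriter();
}

abstract class TreeWriterBase {
    protected final int id;
    protected final WriterContext streamFactory;

    TreeWriterBase(int id, WriterContext streamFactory) {
        this.id = id;
        this.streamFactory = streamFactory;
    }

    // Hypothetical helper: ask the PhysicalWriter for this column's size.
    long sizeOnDisk() {
        return streamFactory.getWriter().bytesWritten(id);
    }
}

class SketchDemo {
    static long demo() {
        // Fake writer that reports 128 bytes per column id, for the demo only.
        PhysicalWriter physical = columnId -> 128L * columnId;
        WriterContext context = () -> physical;
        TreeWriterBase writer = new TreeWriterBase(3, context) {};
        return writer.sizeOnDisk();
    }
}
```

With this shape, TreeWriterBase never touches OrcFile.WriterContext; it only 
goes through the streamFactory it already holds.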

> Add column statistics for the size on disk
> ------------------------------------------
>
>                 Key: ORC-305
>                 URL: https://issues.apache.org/jira/browse/ORC-305
>             Project: ORC
>          Issue Type: Test
>            Reporter: Owen O'Malley
>            Assignee: Sandeep More
>            Priority: Major
>
> It would be great to have the size on disk of each column.
> You can generate this by adding up the sizes of the dictionary and data 
> streams.
> It is only relevant at the stripe and file level.



