[ https://issues.apache.org/jira/browse/PARQUET-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114720#comment-14114720 ]

Ian Barfield commented on PARQUET-71:
-------------------------------------

That doesn't sound consistent with the large differences I saw while stepping 
through the code to observe the actual byte counts. I'll have to take another 
look and get back to you.

> column chunk page write store log message displays incorrect information
> ------------------------------------------------------------------------
>
>                 Key: PARQUET-71
>                 URL: https://issues.apache.org/jira/browse/PARQUET-71
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Ian Barfield
>            Priority: Minor
>
> The log message prints the size of the dictionary (in terms of the number of 
> keys) twice and labels the second occurrence the 'compressed byte count'. An 
> accurate value for that number would be very helpful when accounting for disk 
> space usage. The actual 'compressed byte count' is indeed calculated near that 
> point, so I am guessing this is a simple mistake.
> See:
> https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ColumnChunkPageWriteStore.java#L152



--
This message was sent by Atlassian JIRA
(v6.2#6252)
