[ 
https://issues.apache.org/jira/browse/PARQUET-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588122#comment-14588122
 ] 

Dong Chen commented on PARQUET-299:
-----------------------------------

Thanks [~nezihyigitbasi], I will try to handle this on the Hive side.

If the number of rows in a vector depends on the data pages and is not constant, I 
have a question about the array {{values}} in {{ColumnVector}}. For example, the 
{{int[] values}} in {{IntColumnVector}} is initialized with a default size of 1K. I 
assume this array is designed to store the decoded values when decoding eagerly. 
Since the actual number of rows is always a little bigger, how will we handle this? 
Resize it to {{ColumnVector.numValues}}, or are there other thoughts?
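
To make the resize option concrete, here is a minimal sketch. It assumes a simplified stand-in for {{IntColumnVector}}: the class name {{IntColumnVectorSketch}}, the methods {{ensureCapacity}} and {{setValues}}, and the constant {{DEFAULT_SIZE}} are hypothetical and are not taken from the vector branch. It only shows one way the backing array could grow when a batch holds more rows than the default.

```java
import java.util.Arrays;

// Hypothetical stand-in for IntColumnVector; NOT the class from the vector branch.
class IntColumnVectorSketch {
    // Assumed default capacity, matching the "1K" mentioned above.
    static final int DEFAULT_SIZE = 1024;

    int[] values = new int[DEFAULT_SIZE];
    int numValues = 0;

    // Grow the backing array only when the requested row count exceeds it.
    // Existing contents are preserved by Arrays.copyOf.
    void ensureCapacity(int rows) {
        if (rows > values.length) {
            values = Arrays.copyOf(values, rows);
        }
    }

    // Store eagerly decoded values, resizing first if needed.
    void setValues(int[] decoded, int count) {
        ensureCapacity(count);
        System.arraycopy(decoded, 0, values, 0, count);
        numValues = count;
    }
}
```

With this approach the array is reallocated only for batches larger than the current capacity, so a vector reused across batches would settle at the largest row count it has seen.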



> [Vectorized Reader] ColumnVector length should be in terms of rows, not 
> DataPages
> ---------------------------------------------------------------------------------
>
>                 Key: PARQUET-299
>                 URL: https://issues.apache.org/jira/browse/PARQUET-299
>             Project: Parquet
>          Issue Type: Sub-task
>          Components: parquet-mr
>            Reporter: Zhenxiao Luo
>
> In https://github.com/zhenxiao/incubator-parquet-mr/tree/vector
> ColumnVector length is in terms of DataPages; it needs to be in terms of rows



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
