[ 
https://issues.apache.org/jira/browse/PARQUET-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301919#comment-14301919
 ] 

Alex Levenson commented on PARQUET-36:
--------------------------------------

Yes, there's a mailing list listed here: 
http://parquet.incubator.apache.org/community/

It probably makes sense to discuss it here, but if you'd rather send an email 
to the list referencing this JIRA, you could do that too.

As you said, the big win is probably from filtering entire row groups. We 
didn't get a chance to implement that, because it involves finding the 
dictionaries for each row group, which are not as easy to grab as the 
statistics. For that reason I didn't add dictionary support at the value level 
either. But you have a good point: we could still do this optimization at the 
value level (avoid decoding strings and such) even if we aren't doing it at the 
row group level yet.

Your approach looks reasonable. You might even use a BitSet instead of a 
boolean[].
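
To make the BitSet suggestion concrete, here is a rough sketch of the idea as I 
understand it: evaluate the predicate once per dictionary entry, remember the 
result in a BitSet, and then answer each value with a bit lookup on its 
dictionary id. The class and method names below (DictionaryFilterSketch, 
precompute, keep) are made up for illustration and are not parquet-mr APIs, 
and the dictionary is assumed to be a plain String[] rather than a real 
Parquet Dictionary.

```java
import java.util.BitSet;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from parquet-mr.
class DictionaryFilterSketch {

    // Evaluate the predicate once per dictionary entry.
    // Bit i is set when dictionary entry i passes the predicate.
    static BitSet precompute(String[] dictionary, Predicate<String> predicate) {
        BitSet matches = new BitSet(dictionary.length);
        for (int id = 0; id < dictionary.length; id++) {
            if (predicate.test(dictionary[id])) {
                matches.set(id);
            }
        }
        return matches;
    }

    // Per-value check: a bit lookup on the dictionary id, with no string decoding.
    static boolean keep(BitSet matches, int dictionaryId) {
        return matches.get(dictionaryId);
    }

    public static void main(String[] args) {
        // Assumed toy data: a three-entry dictionary and a column of ids into it.
        String[] dictionary = {"apple", "banana", "avocado"};
        int[] encodedColumn = {0, 1, 1, 2, 0};

        BitSet matches = precompute(dictionary, s -> s.startsWith("a"));
        for (int id : encodedColumn) {
            if (keep(matches, id)) {
                System.out.println(dictionary[id]);
            }
        }
    }
}
```

Compared with a boolean[], the BitSet packs one bit per dictionary entry, which 
may matter for large dictionaries. The predicate cost is paid once per distinct 
value instead of once per row.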

> FilteringPrimitiveConverter should support dictionaries
> -------------------------------------------------------
>
>                 Key: PARQUET-36
>                 URL: https://issues.apache.org/jira/browse/PARQUET-36
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Alex Levenson
>            Priority: Minor
>              Labels: filter2
>
> If the delegated PrimitiveConverter supports dictionaries, then 
> FilteringPrimitiveConverter should too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
