[ https://issues.apache.org/jira/browse/PARQUET-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301919#comment-14301919 ]
Alex Levenson commented on PARQUET-36:
--------------------------------------
Yes, there's a mailing list listed here:
http://parquet.incubator.apache.org/community/
It might make sense to discuss it here, but if you want to send out an email
referencing this JIRA, you could do that.
As you said, the big win is probably from filtering entire row groups. We
didn't get a chance to implement that (it involves finding the dictionaries
for each row group, which are not as easy to grab as the statistics), so I
didn't add dictionary support at the value level either. But you have a good
point -- we could still do this optimization at the value level (avoid decoding
strings and such) even if we aren't doing it at the row group level yet.
Your approach looks reasonable -- you might even use a BitSet instead of a
boolean[].
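
For illustration only, here is a minimal sketch of the value-level idea with a
BitSet. It is not parquet-mr code: the class DictionaryFilterSketch, its
methods, and the use of a plain String[] as the dictionary are all hypothetical
stand-ins. The predicate is evaluated once per dictionary entry, and each value
is then tested by its dictionary id without decoding it again.

```java
import java.util.BitSet;
import java.util.function.Predicate;

// Hypothetical helper, not part of parquet-mr: caches a predicate's result
// for every dictionary entry so values can be tested by dictionary id.
public class DictionaryFilterSketch {
    // Bit i is set when dictionary entry i satisfies the predicate.
    private final BitSet matchingIds;

    public DictionaryFilterSketch(String[] dictionary, Predicate<String> predicate) {
        matchingIds = new BitSet(dictionary.length);
        for (int id = 0; id < dictionary.length; id++) {
            if (predicate.test(dictionary[id])) {
                matchingIds.set(id);
            }
        }
    }

    // Per-value check: a bit lookup instead of decoding and comparing a string.
    public boolean matches(int dictionaryId) {
        return matchingIds.get(dictionaryId);
    }

    // True when no dictionary entry matches. Assuming every value in the chunk
    // is dictionary-encoded, the whole chunk could then be skipped.
    public boolean matchesNothing() {
        return matchingIds.isEmpty();
    }

    public static void main(String[] args) {
        String[] dictionary = {"apple", "banana", "cherry"};
        DictionaryFilterSketch filter =
            new DictionaryFilterSketch(dictionary, s -> s.startsWith("b"));

        int[] encodedValues = {0, 1, 1, 2, 0}; // dictionary ids as read from a page
        int kept = 0;
        for (int id : encodedValues) {
            if (filter.matches(id)) {
                kept++;
            }
        }
        System.out.println(kept); // prints 2
    }
}
```

A BitSet stores one bit per dictionary entry, while a boolean[] typically
takes one byte per entry, and BitSet.isEmpty() gives the "nothing matches"
check directly.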
> FilteringPrimitiveConverter should support dictionaries
> -------------------------------------------------------
>
> Key: PARQUET-36
> URL: https://issues.apache.org/jira/browse/PARQUET-36
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Alex Levenson
> Priority: Minor
> Labels: filter2
>
> If the delegated PrimitiveConverter supports dictionaries, then
> FilteringPrimitiveConverter should too.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)