[ 
https://issues.apache.org/jira/browse/PARQUET-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300176#comment-14300176
 ] 

Michael Davies edited comment on PARQUET-36 at 2/2/15 10:22 AM:
----------------------------------------------------------------

I'm happy to start looking at this. Is there a dev forum that can be used for 
discussion - or is JIRA best?

I think the simplest way to support this would be to add methods to 
ValueInspector and autogenerated subclasses - something like:
{code}
abstract class ValueInspector {
    ...
    // Cached predicate result for each dictionary entry, indexed by dictionary id
    private boolean[] dictionaryResults;

    /** Populates the array of booleans recording which dictionary values pass the filter. */
    void setDictionary(Dictionary dictionary) {
        dictionaryResults = fillInDictionaryResults(dictionary);
    }

    /** Updates the result from the cached value for the given dictionary entry. */
    void updateFromDictionary(int dictionaryId) {
        setResult(dictionaryResults[dictionaryId]);
    }

    /** Returns an array of booleans, one per dictionary entry, indicating whether that value matches the predicate. */
    protected abstract boolean[] fillInDictionaryResults(Dictionary dictionary);
}
{code}
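
To make the idea concrete, here is a minimal self-contained sketch of the same caching pattern. It does not use the parquet-mr classes: IntDictionarySketch and DictionaryAwareInspector are hypothetical stand-ins invented for illustration, the dictionary is assumed to hold int values only, and the predicateCalls counter exists purely to show that the predicate runs once per dictionary entry rather than once per row.

```java
import java.util.function.IntPredicate;

/** Hypothetical stand-in for a dictionary of int values; the array index is the dictionary id. */
final class IntDictionarySketch {
    private final int[] values;

    IntDictionarySketch(int... values) {
        this.values = values.clone();
    }

    int size() {
        return values.length;
    }

    int decodeToInt(int id) {
        return values[id];
    }
}

/** Hypothetical inspector that evaluates its predicate once per dictionary entry. */
final class DictionaryAwareInspector {
    private final IntPredicate predicate;
    private boolean[] dictionaryResults;
    private boolean result;

    /** Illustration only: counts how often the predicate was actually evaluated. */
    int predicateCalls;

    DictionaryAwareInspector(IntPredicate predicate) {
        this.predicate = predicate;
    }

    /** Evaluates the predicate against every dictionary entry and caches the outcomes. */
    void setDictionary(IntDictionarySketch dictionary) {
        boolean[] results = new boolean[dictionary.size()];
        for (int id = 0; id < results.length; id++) {
            predicateCalls++;
            results[id] = predicate.test(dictionary.decodeToInt(id));
        }
        dictionaryResults = results;
    }

    /** Per-row update: a plain array lookup, with no decoding and no predicate call. */
    void updateFromDictionary(int dictionaryId) {
        result = dictionaryResults[dictionaryId];
    }

    boolean getResult() {
        return result;
    }
}
```

With a three-entry dictionary and any number of rows, setDictionary evaluates the predicate three times, and every later call to updateFromDictionary is only an array read.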

It would be nice if dictionary-based predicate evaluation could be used to skip 
entire rows where appropriate, but I think that would be more work.



> FilteringPrimitiveConverter should support dictionaries
> -------------------------------------------------------
>
>                 Key: PARQUET-36
>                 URL: https://issues.apache.org/jira/browse/PARQUET-36
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Alex Levenson
>            Priority: Minor
>              Labels: filter2
>
> If the delegated PrimitiveConverter supports dictionaries, then 
> FilteringPrimitiveConverter should too.


