Dictionary filter broken with compression

Robert Kruszewski Tue, 04 Oct 2016 13:24:06 -0700

This is actually a lot simpler: it turns out that the dictionary filter is 
broken whenever compression is enabled.


I think Pat’s change sounds like a good fix if we really want to get the 
release out. Otherwise, we should probably refactor the code to not pass 
BytesInput around, as the code comment suggests.

- Robert

On 10/4/16, 7:58 PM, "Patrick Woody" <[email protected]> wrote:

    Looking a bit more - it looks like this is because decompression converts
    to a StreamBytesInput automatically. The current tests run with the
    uncompressed codec, so they don't hit this issue. I've put up a commit
    here that demonstrates the issue and my current workaround:

    https://github.com/palantir/parquet-mr/pull/10/commits/70cc00cba5c294d4c860bd4cd2c48c2d083a5809
    
    Thanks,
    Patrick
    
    On Tue, Oct 4, 2016 at 4:33 PM, Patrick Woody <[email protected]>
    wrote:
    
    > Hey all,
    >
    > Running a parquet-mr build off of master, I'm seeing some interesting
    > behavior when using a DictionaryFilter to prune row groups. Basically, if
    > I have an And or Or filter, the DictionaryPage object gets re-used. This
    > seems to be a problem for StreamBytesInput because the stream gets
    > exhausted after the first toByteArray call. My current workaround is to
    > synchronize and just re-use the byte array after the first read, but I'd
    > be curious what people think the best approach to solving this is, and
    > whether we should be reusing the BytesInput at all.
    >
    > Best,
    > Patrick
    >
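
For readers following the thread, the failure mode and the workaround Patrick
describes can be illustrated with a minimal Java sketch. It uses only java.io
and does not touch parquet-mr: OneShotBytes and CachedBytes are hypothetical
stand-ins invented for this illustration, not Parquet's BytesInput classes, and
the actual change in the linked commit may look different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class StreamReuseSketch {

    /** Hypothetical stand-in for a stream-backed input: its bytes can be drained only once. */
    static class OneShotBytes {
        private final InputStream in;

        OneShotBytes(InputStream in) {
            this.in = in;
        }

        /** Reads whatever is left in the stream; a second call finds nothing left. */
        byte[] toByteArray() throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Workaround sketch: drain the stream on first use, then return the cached array. */
    static class CachedBytes {
        private final OneShotBytes source;
        private byte[] cached;

        CachedBytes(OneShotBytes source) {
            this.source = source;
        }

        /** Synchronized so that concurrent callers cannot both try to drain the stream. */
        synchronized byte[] toByteArray() throws IOException {
            if (cached == null) {
                cached = source.toByteArray();
            }
            return cached;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] dictionary = {1, 2, 3, 4};

        // Two reads of the same one-shot input, as an And/Or filter might do.
        OneShotBytes oneShot = new OneShotBytes(new ByteArrayInputStream(dictionary));
        System.out.println(oneShot.toByteArray().length); // prints 4
        System.out.println(oneShot.toByteArray().length); // prints 0: stream already drained

        // With caching, both reads see the same bytes.
        CachedBytes cachedBytes =
                new CachedBytes(new OneShotBytes(new ByteArrayInputStream(dictionary)));
        System.out.println(Arrays.equals(cachedBytes.toByteArray(), cachedBytes.toByteArray())); // prints true
    }
}
```

Caching trades memory for repeatability, and the cached array is shared between
callers, so this sketch assumes readers do not modify it.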
