After looking into this a bit more, it looks like this happens because
decompression automatically converts the BytesInput to a StreamBytesInput.
The current tests run with the uncompressed codec, so they don't hit this
issue. I've put up a commit that demonstrates the problem along with my
current workaround:
https://github.com/palantir/parquet-mr/pull/10/commits/70cc00cba5c294d4c860bd4cd2c48c2d083a5809
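
To make the failure mode concrete, here is a minimal sketch of the
workaround idea (read once under a lock, then re-use the byte array).
OneShotBytes and CachedBytes are hypothetical stand-ins written for this
example, not the parquet-mr classes. The stand-in simply returns an empty
array on the second read, and the real StreamBytesInput may fail
differently once its stream is exhausted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class CachingBytesSketch {

    /** Hypothetical stand-in for a stream-backed input: the stream can only be drained once. */
    static class OneShotBytes {
        private final InputStream in;

        OneShotBytes(InputStream in) {
            this.in = in;
        }

        byte[] toByteArray() throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            // Drain whatever is left in the stream; nothing is left on a second call.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Workaround sketch: read the source once under a lock, then serve the cached array. */
    static class CachedBytes {
        private final OneShotBytes source;
        private byte[] cached;

        CachedBytes(OneShotBytes source) {
            this.source = source;
        }

        synchronized byte[] toByteArray() throws IOException {
            if (cached == null) {
                cached = source.toByteArray();
            }
            return cached;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "dictionary".getBytes(StandardCharsets.UTF_8);

        OneShotBytes raw = new OneShotBytes(new ByteArrayInputStream(page));
        System.out.println(raw.toByteArray().length); // first read returns all 10 bytes
        System.out.println(raw.toByteArray().length); // second read returns 0 bytes

        CachedBytes cached = new CachedBytes(new OneShotBytes(new ByteArrayInputStream(page)));
        System.out.println(cached.toByteArray().length); // 10
        System.out.println(cached.toByteArray().length); // still 10, served from the cache
    }
}
```

Note that the cached variant hands the same array to every caller, so it
is only safe if the callers treat it as read-only.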

Thanks,
Patrick

On Tue, Oct 4, 2016 at 4:33 PM, Patrick Woody <[email protected]>
wrote:

> Hey all,
>
> Running a parquet-mr build off of master and I'm seeing some interesting
> behavior when using a DictionaryFilter to prune row groups. Basically, if I
> have an And or Or filter the DictionaryPage object gets re-used. This seems
> to be a problem for StreamBytesInput because the stream gets exhausted
> after the first toByteArray call. My current workaround is to synchronize
> and just re-use the byte array after the first read, but I'd be curious as
> to what people think the best approach to solving this is and if we should
> be reusing the BytesInput at all.
>
> Best,
> Patrick
>
