Re: StreamBytesInput with DictionaryFilter

Patrick Woody Fri, 07 Oct 2016 11:28:17 -0700

Hi Ryan,

Apologies for the delay! I've filed it here
https://issues.apache.org/jira/browse/PARQUET-743 with the information from
the thread.


Thanks
Patrick

On Wed, Oct 5, 2016 at 4:39 PM, Ryan Blue <[email protected]> wrote:

> Patrick,
>
> Can you please open an issue for this? I think we should fix this before
> the 1.9.0 release. Thanks!
>
> rb
>
> On Tue, Oct 4, 2016 at 11:58 AM, Patrick Woody <[email protected]>
> wrote:
>
> > Looking a bit more - it looks like this is because decompression converts
> > to a StreamBytesInput automatically. The current tests run with the
> > uncompressed codec, so they don't hit this issue. I've put up a commit here
> > that demonstrates the issue and my current workaround:
> > https://github.com/palantir/parquet-mr/pull/10/commits/70cc00cba5c294d4c860bd4cd2c48c2d083a5809
> >
> > Thanks,
> > Patrick
> >
> > On Tue, Oct 4, 2016 at 4:33 PM, Patrick Woody <[email protected]>
> > wrote:
> >
> > > Hey all,
> > >
> > > I'm running a parquet-mr build off of master and seeing some interesting
> > > behavior when using a DictionaryFilter to prune row groups. Basically, if I
> > > have an And or Or filter, the DictionaryPage object gets re-used. This seems
> > > to be a problem for StreamBytesInput because the stream gets exhausted
> > > after the first toByteArray call. My current workaround is to synchronize
> > > and just re-use the byte array after the first read, but I'd be curious as
> > > to what people think the best approach to solving this is and whether we
> > > should be reusing the BytesInput at all.
> > >
> > > Best,
> > > Patrick
> > >
> >
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
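For readers following the thread, the failure mode and the workaround Patrick
describes can be sketched in plain Java. This is an illustration only, not
parquet-mr code: StreamBytes and CachedBytes are hypothetical stand-ins for a
stream-backed BytesInput and for the "synchronize and re-use the byte array"
workaround, and the real classes may behave differently in detail (for example,
in how a second read of an exhausted stream fails).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class OneShotBytesDemo {

  /** Hypothetical stand-in for a stream-backed bytes source: every call drains the stream. */
  static final class StreamBytes {
    private final InputStream in;

    StreamBytes(InputStream in) {
      this.in = in;
    }

    byte[] toByteArray() {
      try {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        // Copy whatever is still unread; after the first call nothing is left.
        while ((n = in.read(buf)) != -1) {
          out.write(buf, 0, n);
        }
        return out.toByteArray();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Hypothetical workaround: read the stream once under a lock, then re-use the array. */
  static final class CachedBytes {
    private final StreamBytes source;
    private byte[] cached;

    CachedBytes(StreamBytes source) {
      this.source = source;
    }

    synchronized byte[] toByteArray() {
      if (cached == null) {
        cached = source.toByteArray(); // the only read that touches the stream
      }
      return cached; // shared array: callers must treat it as read-only
    }
  }

  public static void main(String[] args) {
    byte[] page = {1, 2, 3, 4};

    StreamBytes raw = new StreamBytes(new ByteArrayInputStream(page));
    System.out.println(raw.toByteArray().length); // 4: first reader sees the page
    System.out.println(raw.toByteArray().length); // 0: second reader finds it drained

    CachedBytes safe = new CachedBytes(new StreamBytes(new ByteArrayInputStream(page)));
    System.out.println(safe.toByteArray().length); // 4
    System.out.println(safe.toByteArray().length); // 4: served from the cached array
  }
}
```

Caching trades memory for repeatability: the page bytes stay on the heap for as
long as the wrapper is reachable. Whether that is preferable to not sharing the
page object between the two sides of an And/Or filter is the open question
raised in the thread.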
