The existing "TypedColumnReader<DType>::Skip(int64_t num_rows_to_skip)"
method can be extended to avoid reading/decompressing the rows it skips.
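
To illustrate the access pattern, here is a rough sketch of how a caller
could turn a sorted list of row indices into alternating skip/read calls.
MockColumnReader, ReadOne and SelectRows are hypothetical names made up for
this example, not parquet-cpp API; the mock only walks an in-memory vector
and mirrors the shape of Skip(int64_t num_rows_to_skip), it does not touch
pages or decompression.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a typed column reader over a flat INT64 column.
// It is NOT the parquet-cpp class; it only imitates a Skip-style interface.
class MockColumnReader {
 public:
  explicit MockColumnReader(std::vector<int64_t> values)
      : values_(std::move(values)) {}

  // Advance past up to num_rows_to_skip rows without returning them.
  // Returns the number of rows actually skipped.
  int64_t Skip(int64_t num_rows_to_skip) {
    assert(num_rows_to_skip >= 0);
    const int64_t remaining = static_cast<int64_t>(values_.size()) - pos_;
    const int64_t n = std::min(num_rows_to_skip, remaining);
    pos_ += n;
    return n;
  }

  // Read a single row into *out. Returns false at the end of the column.
  bool ReadOne(int64_t* out) {
    if (pos_ >= static_cast<int64_t>(values_.size())) return false;
    *out = values_[pos_++];
    return true;
  }

 private:
  std::vector<int64_t> values_;
  int64_t pos_ = 0;
};

// Collect the rows at the given indices. The indices are assumed to be
// sorted ascending; entries that would require moving backwards are ignored.
std::vector<int64_t> SelectRows(MockColumnReader* reader,
                                const std::vector<int64_t>& sorted_indices) {
  std::vector<int64_t> out;
  int64_t next_row = 0;  // index of the row the reader is positioned at
  for (int64_t idx : sorted_indices) {
    const int64_t gap = idx - next_row;
    if (gap < 0) continue;                // duplicate or out-of-order index
    if (reader->Skip(gap) != gap) break;  // ran past the end of the column
    int64_t value;
    if (!reader->ReadOne(&value)) break;
    out.push_back(value);
    next_row = idx + 1;
  }
  return out;
}
```

Only one value is held at a time, so rows between the requested indices are
never materialized. With a real reader, the savings for sparse subsets would
depend on Skip itself not decoding the pages it passes over.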

On Wed, Jul 26, 2017 at 7:11 AM, Katelman, Michael <
[email protected]> wrote:

> Thanks, Wes.
>
> -Mike
>
> -----Original Message-----
> From: Wes McKinney [mailto:[email protected]]
> Sent: Tuesday, July 25, 2017 21:56
> To: [email protected]
> Subject: Re: subselecting rows
>
> This is not easy to do right now while the file is being read (only
> ex post, once the data has been read), but you are welcome to look at
> extending the Parquet read API to support selecting a particular row subset.
>
> - Wes
>
> On Tue, Jul 25, 2017 at 4:10 PM, Katelman, Michael <Michael.Katelman@
> cubistsystematic.com> wrote:
> > Hi,
> >
> > Is there anything in parquet right now that would allow me to
> > efficiently subselect a set of rows from a file given a list of integer row
> > indices? In my particular case, I'm only interested in flat tables and not
> > additional hierarchy. Primarily what I would like to avoid is holding a
> > large number of rows in memory only to immediately discard them; and, also,
> > for very sparse subsets of rows avoid reading/decompressing row groups
> > without any of the rows I need.
> >
> > -Mike
>



-- 
regards,
Deepak Majeti
