The existing "TypedColumnReader<DType>::Skip(int64_t num_rows_to_skip)" method can be extended so that rows outside the requested subset are skipped without being read or decompressed.
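A rough sketch of how a list of row indices could drive a Skip-style reader, assuming the indices are sorted, unique, and relative to the start of one column chunk of a flat (non-repeated) column, so one row corresponds to one level. The index list is turned into alternating skip/read runs, and adjacent indices are merged into a single read. `Run`, `BuildPlan`, `ApplyPlan` and `InMemoryReader` are hypothetical names; the reader interface only models the Skip signature quoted above, and `Read` is a made-up stand-in, not the actual parquet-cpp batch-read call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One step of a read plan: skip `skip` rows, then read `read` consecutive rows.
struct Run {
  int64_t skip;
  int64_t read;
};

// Converts sorted, unique row indices into skip/read runs. Consecutive
// indices extend the current run so each contiguous block is one read.
std::vector<Run> BuildPlan(const std::vector<int64_t>& rows) {
  std::vector<Run> plan;
  int64_t cursor = 0;  // next row the reader would return
  for (int64_t r : rows) {
    if (!plan.empty() && r == cursor) {
      ++plan.back().read;
    } else {
      plan.push_back(Run{r - cursor, 1});
    }
    cursor = r + 1;
  }
  return plan;
}

// `Reader` needs `int64_t Skip(int64_t)` (modeled on the Skip method above)
// and a hypothetical `int64_t Read(int64_t n, std::vector<T>* out)`.
template <typename Reader, typename T>
void ApplyPlan(Reader& reader, const std::vector<Run>& plan,
               std::vector<T>* out) {
  for (const Run& run : plan) {
    if (run.skip > 0) reader.Skip(run.skip);
    reader.Read(run.read, out);
  }
}

// In-memory stand-in used only to exercise the plan; not a Parquet API.
struct InMemoryReader {
  std::vector<int64_t> values;
  int64_t pos = 0;

  int64_t Skip(int64_t n) {
    int64_t k = std::min<int64_t>(n, static_cast<int64_t>(values.size()) - pos);
    pos += k;
    return k;
  }

  int64_t Read(int64_t n, std::vector<int64_t>* out) {
    int64_t k = std::min<int64_t>(n, static_cast<int64_t>(values.size()) - pos);
    out->insert(out->end(), values.begin() + pos, values.begin() + pos + k);
    pos += k;
    return k;
  }
};
```

For rows {2, 3, 4, 10} the plan is "skip 2, read 3" followed by "skip 5, read 1". Whether the skipped spans actually avoid decompression depends on how Skip is implemented underneath (for example, dropping whole pages when the skip count covers them), which is the extension being suggested here.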
On Wed, Jul 26, 2017 at 7:11 AM, Katelman, Michael <[email protected]> wrote:

> Thanks, Wes.
>
> -Mike
>
> -----Original Message-----
> From: Wes McKinney [mailto:[email protected]]
> Sent: Tuesday, July 25, 2017 21:56
> To: [email protected]
> Subject: Re: subselecting rows
>
> This is not easy to do right now while the file is being read (rather,
> ex-post), but you are welcome to look at extending the Parquet read API to
> support selecting a particular row subset.
>
> - Wes
>
> On Tue, Jul 25, 2017 at 4:10 PM, Katelman, Michael <Michael.Katelman@cubistsystematic.com> wrote:
> > Hi,
> >
> > Is there anything in parquet right now that would allow me to
> > efficiently subselect a set of rows from a file given a list of integer row
> > indices? In my particular case, I'm only interested in flat tables and not
> > additional hierarchy. Primarily what I would like to avoid is holding a
> > large number of rows in memory only to immediately discard them; and, also,
> > for very sparse subsets of rows avoid reading/decompressing row groups
> > without any of the rows I need.
> >
> > -Mike

--
regards,
Deepak Majeti
