There is no easy way to do this right now while the file is being read; the selection has to be applied ex post, once the data is in memory. You are welcome to look at extending the Parquet read API to support selecting a particular subset of rows.
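
For illustration, here is a rough sketch of the ex-post approach in Python. It assumes a recent pyarrow in which `ParquetFile.read_row_group` and `Table.take` are available; the helper names `group_indices_by_row_group` and `read_rows` are made up for this example and are not part of any Parquet API. It skips row groups that contain none of the requested rows, but each row group it does touch is still read and decompressed in full before the unwanted rows are dropped.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate


def group_indices_by_row_group(row_indices, rows_per_group):
    """Map global row indices to {row_group: [offsets within that group]}.

    rows_per_group lists the row count of each row group, in file order.
    Indices are deduplicated and returned in ascending order.
    """
    # ends[i] is the global index one past the last row of row group i.
    ends = list(accumulate(rows_per_group))
    total = ends[-1] if ends else 0
    grouped = defaultdict(list)
    for idx in sorted(set(row_indices)):
        if not 0 <= idx < total:
            raise IndexError(f"row index {idx} out of range for {total} rows")
        group = bisect_right(ends, idx)
        start = ends[group] - rows_per_group[group]
        grouped[group].append(idx - start)
    return dict(grouped)


def read_rows(path, row_indices, columns=None):
    """Read only the row groups that contain at least one requested row.

    Hypothetical helper: assumes pyarrow is installed and the file is a
    flat table.
    """
    import pyarrow as pa
    import pyarrow.parquet as pq

    pf = pq.ParquetFile(path)
    meta = pf.metadata
    counts = [meta.row_group(i).num_rows for i in range(meta.num_row_groups)]

    pieces = []
    for group, offsets in group_indices_by_row_group(row_indices, counts).items():
        # The whole row group is decoded here; take() then discards the rest.
        table = pf.read_row_group(group, columns=columns)
        pieces.append(table.take(offsets))

    if not pieces:
        raise ValueError("row_indices is empty")
    return pa.concat_tables(pieces)
```

A call such as `read_rows("data.parquet", [5, 100000, 100001])` (the file name is a placeholder) would return the selected rows in ascending index order. At most one row group is held in memory at a time on top of the rows already kept, so the benefit is largest for very sparse selections.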

- Wes

On Tue, Jul 25, 2017 at 4:10 PM, Katelman, Michael <[email protected]> wrote:
> Hi,
>
> Is there anything in parquet right now that would allow me to efficiently
> subselect a set of rows from a file given a list of integer row indices? In
> my particular case, I'm only interested in flat tables and not additional
> hierarchy. Primarily what I would like to avoid is holding a large number of
> rows in memory only to immediately discard them; and, also, for very sparse
> subsets of rows avoid reading/decompressing row groups without any of the
> rows I need.
>
> -Mike
