Will using a DataPage V2 or DataPage V1 cause any difference for this ticket?

Regards,
Arun Balajiee

________________________________
From: Wes McKinney <[email protected]>
Sent: Friday, February 21, 2020 3:06:58 AM
To: Parquet Dev <[email protected]>
Cc: Deepak Majeti <[email protected]>; Anatoli Shein 
<[email protected]>
Subject: Re: Arrow 1404: Adding index for Page-level Skipping

The data page statistics aren't currently being used during the "scan
to Arrow" procedure. That's likely to change at some point since the
Arrow Datasets project will provide a higher level API to indicate
filter predicates

On Thu, Feb 20, 2020 at 3:25 PM Lekshmi Narayanan, Arun Balajiee
<[email protected]> wrote:
>
> Thanks Wes. I got it now. I am working on that. But I have a general question 
> though, were page indices  which store min/max values implemented in arrow 
> parquet ( not referring to column indices or offset indices, just page 
> indices)
>
> Regards,
> Arun Balajiee
>
> From: Wes McKinney<mailto:[email protected]>
> Sent: Tuesday, February 18, 2020 3:34 AM
> To: Parquet Dev<mailto:[email protected]>
> Cc: Deepak Majeti<mailto:[email protected]>; Anatoli 
> Shein<mailto:[email protected]>
> Subject: Re: Arrow 1404: Adding index for Page-level Skipping
>
> That's helpful, but I think it would be a good idea to have enough
> information in the header files to determine what the new APIs do
> without reading example code.
>
> On Mon, Feb 17, 2020 at 10:59 AM Lekshmi Narayanan, Arun Balajiee
> <[email protected]> wrote:
> >
> > I also made changes in the low-level-api folder, couldn’t capture in that 
> > link I think
> > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Fblob%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp%2Fexamples%2Fparquet%2Flow-level-api%2Freader-writer-with-index.cc&amp;data=02%7C01%7CARL122%40pitt.edu%7C6925e0cbd68348c5df6f08d7b6a520a3%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637178692661797398&amp;sdata=slrrTS3YTiloexbzqsZ6GTy72Ok%2FimFBb%2F8%2Fl2fNDlM%3D&amp;reserved=0
> >
> > Regards,
> > Arun Balajiee
> >
> > ________________________________
> > From: Wes McKinney <[email protected]>
> > Sent: Monday, February 17, 2020 8:11:09 AM
> > To: Parquet Dev <[email protected]>
> > Cc: Deepak Majeti <[email protected]>; Anatoli Shein 
> > <[email protected]>
> > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> >
> > hi Arun,
> >
> > By "public APIs" I was referring to changes in the public header
> > files. I see there are some changes to parquet/file_reader.h and
> > metadata.h
> >
> > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fcompare%2Fmaster...a2un%3APARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp&amp;data=02%7C01%7CARL122%40pitt.edu%7C6925e0cbd68348c5df6f08d7b6a520a3%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637178692661802376&amp;sdata=DmAizgy3EKwENlRFBfxgvNAXE2Pq%2FctKlZaymn5dUxY%3D&amp;reserved=0
> >
> > Can you add some Doxygen comments to the new APIs that explain how
> > these APIs are to be used (and what the parameters mean)? The hope
> > would be that a user could make use of the column index functionality
> > by reading the .h files only.
> >
> > Thanks
> > Wes
> >
> > On Fri, Feb 14, 2020 at 2:57 PM Lekshmi Narayanan, Arun Balajiee
> > <[email protected]> wrote:
> > >
> > > Hi
> > > I have made my changes for api here, does it look good and is this what 
> > > you were seeking from me? The writer- api is still in the works and I 
> > > need to make the reader more generic to support all class data types.
> > >
> > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Fblob%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp%2Fexamples%2Fparquet%2Flow-level-api%2Freader-writer-with-index.cc&amp;data=02%7C01%7CARL122%40pitt.edu%7C6925e0cbd68348c5df6f08d7b6a520a3%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637178692661802376&amp;sdata=hqOXB0h%2FI%2FhLgD6FDFFjw2RH4xAKxqWTPjM7rMJ8llw%3D&amp;reserved=0
> > >
> > >
> > > Regards,
> > > Arun Balajiee
> > >
> > > From: Wes McKinney<mailto:[email protected]>
> > > Sent: Tuesday, February 4, 2020 11:24 PM
> > > To: Parquet Dev<mailto:[email protected]>
> > > Cc: Deepak Majeti<mailto:[email protected]>; Anatoli 
> > > Shein<mailto:[email protected]>
> > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> > >
> > > hi Arun,
> > >
> > > We can keep the discussion going on here and on GitHub when you have a
> > > pull request to discuss. There are a number of different people who
> > > can give advice.
> > >
> > > Thanks
> > >
> > > On Tue, Feb 4, 2020 at 10:11 PM Lekshmi Narayanan, Arun Balajiee
> > > <[email protected]> wrote:
> > > >
> > > > Actually I made some changes after the date on the pull request ( even 
> > > > in this year), which are not getting reflected on this compare link
> > > >
> > > > Regards,
> > > > Arun Balajiee
> > > >
> > > > From: Wes McKinney<mailto:[email protected]>
> > > > Sent: Tuesday, February 4, 2020 6:43 PM
> > > > To: Parquet Dev<mailto:[email protected]>
> > > > Cc: Deepak Majeti<mailto:[email protected]>; Anatoli 
> > > > Shein<mailto:[email protected]>
> > > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> > > >
> > > > Here's a compare link in case others want to have a look
> > > >
> > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fcompare%2Fmaster...a2un%3APARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp&amp;data=02%7C01%7CARL122%40pitt.edu%7C6925e0cbd68348c5df6f08d7b6a520a3%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637178692661807354&amp;sdata=uWSkYO0twW6OJbzM9ZPfWaqUkHa7Mcw0AMzXRHdNPUQ%3D&amp;reserved=0
> > > >
> > > > On Tue, Feb 4, 2020 at 5:41 PM Wes McKinney <[email protected]> wrote:
> > > > >
> > > > > hi Arun,
> > > > >
> > > > > I took a brief look at your branch. One thing that is missing is the
> > > > > proposed public APIs that use the index pages -- that would be very
> > > > > helpful for this discussion.
> > > > >
> > > > > I don't think we have any code for doing random access of a particular
> > > > > data page in a column chunk, so having as an initial matter would also
> > > > > be helpful.
> > > > >
> > > > > - Wes
> > > > >
> > > > > On Tue, Feb 4, 2020 at 2:28 PM Lekshmi Narayanan, Arun Balajiee
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hi Parquet dev
> > > > > >
> > > > > > Deepak Majeti was my dev lead during my summer internship, from 
> > > > > > when I am trying to add a few changes in the Arrow Parquet Project 
> > > > > > for the ticket below
> > > > > >
> > > > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FPARQUET-1404&amp;data=02%7C01%7CARL122%40pitt.edu%7C6925e0cbd68348c5df6f08d7b6a520a3%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637178692661807354&amp;sdata=ePKyArtjSe9qOZ1HyIANj90nij32sDrAs1D8o0e2xE4%3D&amp;reserved=0
> > > > > >  (Assigned to Deepak)
> > > > > >
> > > > > > With this regard, I am making a few changes to 
> > > > > > src/parquet/file_reader.cc ( in a fork on my repository)
> > > > > >
> > > > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Ftree%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp&amp;data=02%7C01%7CARL122%40pitt.edu%7C6925e0cbd68348c5df6f08d7b6a520a3%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637178692661807354&amp;sdata=hkqUceGkt2BmIV4HtJuAac49DsL0Amv8D29sMI%2BbNh8%3D&amp;reserved=0
> > > > > >
> > > > > > I am stuck at trying to read a particular row using the index that 
> > > > > > I get in the page_location array struct of offset index. Could you 
> > > > > > help me with this ? and if there have been discussions on the 
> > > > > > forums for this as well, could you direct me to that link?
> > > > > >
> > > > > > Regards,
> > > > > > Arun Balajiee
> > > > > >
> > > >
> > >
>

Reply via email to