Would this be tracked internally in the class, or would it mean adding a
parameter to the API? What is the use case?
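
To make the "internally in the class" option concrete, here is a rough sketch of what such bookkeeping might look like. PageTracker and its members are hypothetical names, not anything in parquet-cpp. It assumes a non-repeated column (one level per row) and that the first row index of each page is already known, for example from an offset index. The caller would pass in the number of levels consumed by each batch read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not part of parquet-cpp: tracks which data page the
// next value will come from, given the first row index of every page.
// Assumes a non-repeated column, so one level read equals one row.
class PageTracker {
 public:
  explicit PageTracker(std::vector<int64_t> first_row_of_page)
      : first_row_of_page_(std::move(first_row_of_page)) {
    // The first page of a column chunk is expected to start at row 0.
    assert(!first_row_of_page_.empty() && first_row_of_page_[0] == 0);
  }

  // Call after each batch read with the number of levels (rows) consumed.
  void Advance(int64_t rows_read) {
    assert(rows_read >= 0);
    next_row_ += rows_read;
    // Step forward while the following page starts at or before next_row_.
    while (page_ + 1 < first_row_of_page_.size() &&
           first_row_of_page_[page_ + 1] <= next_row_) {
      ++page_;
    }
  }

  // Index of the page that holds the next row to be read.
  std::size_t current_page() const { return page_; }
  // Index of the next row to be read within the row group.
  int64_t next_row() const { return next_row_; }

 private:
  std::vector<int64_t> first_row_of_page_;
  int64_t next_row_ = 0;
  std::size_t page_ = 0;
};
```

With page starts {0, 1000, 2500}, advancing by 999 rows leaves the tracker on page 0, and one more row moves it to page 1. The alternative would be an extra output parameter on the read call, which changes the public signature.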

On Saturday, June 13, 2020, Lekshmi Narayanan, Arun Balajiee <
[email protected]> wrote:

> Hi Dev
>
> Thanks Wes for these comments.
>
> As mentioned in other threads, I have completed most of it. I will try to
> structure it according to the comments.
>
> I had one question regarding an (un)related matter. Whenever we make calls to
>
> ReadBatch(int64_t batch_size, int16_t* def_levels, int16_t* rep_levels,
>           T* values, int64_t* values_read)
>
> Is there a possibility of keeping track of which page we are on when
> retrieving values?
>
> Regards
> Arun Balajiee
> ________________________________
> From: Wes McKinney <[email protected]>
> Sent: 02 April 2020 13:16
> To: Parquet Dev <[email protected]>
> Cc: Deepak Majeti <[email protected]>; Anatoli Shein <
> [email protected]>
> Subject: Re: Arrow 1404: Adding index for Page-level Skipping
>
> I just left comments on the PR. The new APIs (their semantics and what
> should be passed as arguments) are still not adequately documented (in
> other words, I wouldn't know how to use them just from reading the
> header file), so I think we should focus on that for the moment. In
> fairness, documentation for other functions in these headers is poor,
> but they also have the semantics of "read all data in the file from
> start to finish". These new APIs appear to do something different, so
> we need to write that down in detail in Doxygen-style comments.
>
> On Thu, Apr 2, 2020 at 2:23 AM Lekshmi Narayanan, Arun Balajiee
> <[email protected]> wrote:
> >
> > Hi
> > Would my pull request be useful for the discussion from here?
> > https://github.com/apache/arrow/pull/6807
> >
> > Regards,
> > Arun Balajiee
> >
> > From: Wes McKinney<mailto:[email protected]>
> > Sent: Tuesday, February 18, 2020 3:34 AM
> > To: Parquet Dev<mailto:[email protected]>
> > Cc: Deepak Majeti<mailto:[email protected]>; Anatoli
> Shein<mailto:[email protected]>
> > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> >
> > That's helpful, but I think it would be a good idea to have enough
> > information in the header files to determine what the new APIs do
> > without reading example code.
> >
> > On Mon, Feb 17, 2020 at 10:59 AM Lekshmi Narayanan, Arun Balajiee
> > <[email protected]> wrote:
> > >
> > > I also made changes in the low-level-api folder, which I think that
> > > link did not capture:
> > > https://github.com/a2un/arrow/blob/PARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp/cpp/examples/parquet/low-level-api/reader-writer-with-index.cc
> > >
> > > Regards,
> > > Arun Balajiee
> > >
> > > ________________________________
> > > From: Wes McKinney <[email protected]>
> > > Sent: Monday, February 17, 2020 8:11:09 AM
> > > To: Parquet Dev <[email protected]>
> > > Cc: Deepak Majeti <[email protected]>; Anatoli Shein <
> [email protected]>
> > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> > >
> > > hi Arun,
> > >
> > > By "public APIs" I was referring to changes in the public header
> > > files. I see there are some changes to parquet/file_reader.h and
> > > metadata.h
> > >
> > > https://github.com/apache/arrow/compare/master...a2un:PARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp
> > >
> > > Can you add some Doxygen comments to the new APIs that explain how
> > > these APIs are to be used (and what the parameters mean)? The hope
> > > would be that a user could make use of the column index functionality
> > > by reading the .h files only.
> > >
> > > Thanks
> > > Wes
> > >
> > > On Fri, Feb 14, 2020 at 2:57 PM Lekshmi Narayanan, Arun Balajiee
> > > <[email protected]> wrote:
> > > >
> > > > Hi
> > > > I have made my API changes here. Do they look good, and is this what
> > > > you were looking for? The writer API is still in the works, and I need
> > > > to make the reader more generic to support all data types.
> > > >
> > > > https://github.com/a2un/arrow/blob/PARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp/cpp/examples/parquet/low-level-api/reader-writer-with-index.cc
> > > >
> > > >
> > > > Regards,
> > > > Arun Balajiee
> > > >
> > > > From: Wes McKinney<mailto:[email protected]>
> > > > Sent: Tuesday, February 4, 2020 11:24 PM
> > > > To: Parquet Dev<mailto:[email protected]>
> > > > Cc: Deepak Majeti<mailto:[email protected]>; Anatoli
> Shein<mailto:[email protected]>
> > > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> > > >
> > > > hi Arun,
> > > >
> > > > We can keep the discussion going on here and on GitHub when you have a
> > > > pull request to discuss. There are a number of different people who
> > > > can give advice.
> > > >
> > > > Thanks
> > > >
> > > > On Tue, Feb 4, 2020 at 10:11 PM Lekshmi Narayanan, Arun Balajiee
> > > > <[email protected]> wrote:
> > > > >
> > > > > Actually, I made some changes after the date on the pull request
> > > > > (including this year), which are not reflected in this compare link.
> > > > >
> > > > > Regards,
> > > > > Arun Balajiee
> > > > >
> > > > > From: Wes McKinney<mailto:[email protected]>
> > > > > Sent: Tuesday, February 4, 2020 6:43 PM
> > > > > To: Parquet Dev<mailto:[email protected]>
> > > > > Cc: Deepak Majeti<mailto:[email protected]>; Anatoli
> Shein<mailto:[email protected]>
> > > > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> > > > >
> > > > > Here's a compare link in case others want to have a look
> > > > >
> > > > > https://github.com/apache/arrow/compare/master...a2un:PARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp
> > > > >
> > > > > On Tue, Feb 4, 2020 at 5:41 PM Wes McKinney <[email protected]> wrote:
> > > > > >
> > > > > > hi Arun,
> > > > > >
> > > > > > I took a brief look at your branch. One thing that is missing is the
> > > > > > proposed public APIs that use the index pages -- that would be very
> > > > > > helpful for this discussion.
> > > > > >
> > > > > > I don't think we have any code for doing random access of a particular
> > > > > > data page in a column chunk, so having that as an initial matter would
> > > > > > also be helpful.
> > > > > >
> > > > > > - Wes
> > > > > >
> > > > > > On Tue, Feb 4, 2020 at 2:28 PM Lekshmi Narayanan, Arun Balajiee
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Parquet dev
> > > > > > >
> > > > > > > Deepak Majeti was my dev lead during my summer internship, and since
> > > > > > > then I have been trying to add a few changes to the Arrow Parquet
> > > > > > > project for the ticket below.
> > > > > > >
> > > > > > > https://issues.apache.org/jira/browse/PARQUET-1404
> > > > > > > (Assigned to Deepak)
> > > > > > >
> > > > > > > In this regard, I am making a few changes to
> > > > > > > src/parquet/file_reader.cc (in a fork in my repository).
> > > > > > >
> > > > > > > https://github.com/a2un/arrow/tree/PARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp/cpp
> > > > > > >
> > > > > > > I am stuck trying to read a particular row using the index that I
> > > > > > > get from the page_location array struct of the offset index. Could
> > > > > > > you help me with this? If there have been discussions of this on the
> > > > > > > forums, could you direct me to them?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Arun Balajiee
> > > > > > >
> > > > >
> > > >
> >
>
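
On the original question at the bottom of the thread (finding the page that holds a particular row from the offset index's page_location array), one possible approach is a binary search on each page's first row index. The sketch below is only an illustration: the PageLocation struct is a stand-in whose fields mirror the Parquet format definition, and FindPageForRow is a hypothetical name, not an existing parquet-cpp function. The comment on the function follows the Doxygen style requested earlier in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-in for the PageLocation entries of a Parquet offset
// index (fields mirror the format definition); not the parquet-cpp type.
struct PageLocation {
  int64_t offset;                // file offset of the page
  int32_t compressed_page_size;  // page size in bytes, header included
  int64_t first_row_index;       // first row of the page within the row group
};

/// \brief Find the data page that contains a given row.
///
/// Hypothetical helper for illustration only.
///
/// \param[in] pages page locations of one column chunk, sorted by
///   first_row_index in ascending order
/// \param[in] row row index within the row group, must be >= 0
/// \return index into `pages` of the page holding `row`, or -1 if `pages`
///   is empty or `row` precedes the first page. Rows past the end of the
///   row group are not detected and map to the last page.
int64_t FindPageForRow(const std::vector<PageLocation>& pages, int64_t row) {
  assert(row >= 0);
  // Locate the first page whose first_row_index is strictly greater than row.
  auto it = std::upper_bound(
      pages.begin(), pages.end(), row,
      [](int64_t r, const PageLocation& p) { return r < p.first_row_index; });
  // The page just before it is the one that contains row.
  return static_cast<int64_t>(it - pages.begin()) - 1;
}
```

Once the page index is known, the offset and compressed_page_size of that entry say which byte range to read, and row minus first_row_index gives the number of rows to skip inside the page.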
