I just left comments on the PR. The new APIs (their semantics and what
should be passed as arguments) are still not adequately documented (in
other words, I wouldn't know how to use them just from reading the
header file), so I think we should focus on that for the moment. In
fairness documentation for other functions in these headers in poor,
but they also have the semantics of "read all data in the file from
start to finish". These new APIs appear to do something different, so
we need to write that down in detail in Doxygen-style comments

On Thu, Apr 2, 2020 at 2:23 AM Lekshmi Narayanan, Arun Balajiee
<[email protected]> wrote:
>
> Hi
> Would my pull request be useful for the discussion from here?
> https://github.com/apache/arrow/pull/6807
>
> Regards,
> Arun Balajiee
>
> From: Wes McKinney<mailto:[email protected]>
> Sent: Tuesday, February 18, 2020 3:34 AM
> To: Parquet Dev<mailto:[email protected]>
> Cc: Deepak Majeti<mailto:[email protected]>; Anatoli 
> Shein<mailto:[email protected]>
> Subject: Re: Arrow 1404: Adding index for Page-level Skipping
>
> That's helpful, but I think it would be a good idea to have enough
> information in the header files to determine what the new APIs do
> without reading example code.
>
> On Mon, Feb 17, 2020 at 10:59 AM Lekshmi Narayanan, Arun Balajiee
> <[email protected]> wrote:
> >
> > I also made changes in the low-level-api folder, couldn’t capture in that 
> > link I think
> > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Fblob%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp%2Fexamples%2Fparquet%2Flow-level-api%2Freader-writer-with-index.cc&amp;data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627309524&amp;sdata=T%2Fo7CdxHvvN11Eox9JR6mKAWx75s1aGJUqONVBjVK08%3D&amp;reserved=0
> >
> > Regards,
> > Arun Balajiee
> >
> > ________________________________
> > From: Wes McKinney <[email protected]>
> > Sent: Monday, February 17, 2020 8:11:09 AM
> > To: Parquet Dev <[email protected]>
> > Cc: Deepak Majeti <[email protected]>; Anatoli Shein 
> > <[email protected]>
> > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> >
> > hi Arun,
> >
> > By "public APIs" I was referring to changes in the public header
> > files. I see there are some changes to parquet/file_reader.h and
> > metadata.h
> >
> > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fcompare%2Fmaster...a2un%3APARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp&amp;data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627309524&amp;sdata=lBWBkrHBuqWjCzQ5t5JLUAw6NfIHbVFGC990L%2BDjGoA%3D&amp;reserved=0
> >
> > Can you add some Doxygen comments to the new APIs that explain how
> > these APIs are to be used (and what the parameters mean)? The hope
> > would be that a user could make use of the column index functionality
> > by reading the .h files only.
> >
> > Thanks
> > Wes
> >
> > On Fri, Feb 14, 2020 at 2:57 PM Lekshmi Narayanan, Arun Balajiee
> > <[email protected]> wrote:
> > >
> > > Hi
> > > I have made my changes for api here, does it look good and is this what 
> > > you were seeking from me? The writer- api is still in the works and I 
> > > need to make the reader more generic to support all class data types.
> > >
> > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Fblob%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp%2Fexamples%2Fparquet%2Flow-level-api%2Freader-writer-with-index.cc&amp;data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627309524&amp;sdata=T%2Fo7CdxHvvN11Eox9JR6mKAWx75s1aGJUqONVBjVK08%3D&amp;reserved=0
> > >
> > >
> > > Regards,
> > > Arun Balajiee
> > >
> > > From: Wes McKinney<mailto:[email protected]>
> > > Sent: Tuesday, February 4, 2020 11:24 PM
> > > To: Parquet Dev<mailto:[email protected]>
> > > Cc: Deepak Majeti<mailto:[email protected]>; Anatoli 
> > > Shein<mailto:[email protected]>
> > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> > >
> > > hi Arun,
> > >
> > > We can keep the discussion going on here and on GitHub when you have a
> > > pull request to discuss. There are a number of different people who
> > > can give advice.
> > >
> > > Thanks
> > >
> > > On Tue, Feb 4, 2020 at 10:11 PM Lekshmi Narayanan, Arun Balajiee
> > > <[email protected]> wrote:
> > > >
> > > > Actually I made some changes after the date on the pull request ( even 
> > > > in this year), which are not getting reflected on this compare link
> > > >
> > > > Regards,
> > > > Arun Balajiee
> > > >
> > > > From: Wes McKinney<mailto:[email protected]>
> > > > Sent: Tuesday, February 4, 2020 6:43 PM
> > > > To: Parquet Dev<mailto:[email protected]>
> > > > Cc: Deepak Majeti<mailto:[email protected]>; Anatoli 
> > > > Shein<mailto:[email protected]>
> > > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> > > >
> > > > Here's a compare link in case others want to have a look
> > > >
> > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fcompare%2Fmaster...a2un%3APARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp&amp;data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627309524&amp;sdata=lBWBkrHBuqWjCzQ5t5JLUAw6NfIHbVFGC990L%2BDjGoA%3D&amp;reserved=0
> > > >
> > > > On Tue, Feb 4, 2020 at 5:41 PM Wes McKinney <[email protected]> wrote:
> > > > >
> > > > > hi Arun,
> > > > >
> > > > > I took a brief look at your branch. One thing that is missing is the
> > > > > proposed public APIs that use the index pages -- that would be very
> > > > > helpful for this discussion.
> > > > >
> > > > > I don't think we have any code for doing random access of a particular
> > > > > data page in a column chunk, so having as an initial matter would also
> > > > > be helpful.
> > > > >
> > > > > - Wes
> > > > >
> > > > > On Tue, Feb 4, 2020 at 2:28 PM Lekshmi Narayanan, Arun Balajiee
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hi Parquet dev
> > > > > >
> > > > > > Deepak Majeti was my dev lead during my summer internship, from 
> > > > > > when I am trying to add a few changes in the Arrow Parquet Project 
> > > > > > for the ticket below
> > > > > >
> > > > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FPARQUET-1404&amp;data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627309524&amp;sdata=4BxOKIB%2BfmVjq3o5uEdkZKWmr1fmciZ3zJLqRvkKvhs%3D&amp;reserved=0
> > > > > >  (Assigned to Deepak)
> > > > > >
> > > > > > With this regard, I am making a few changes to 
> > > > > > src/parquet/file_reader.cc ( in a fork on my repository)
> > > > > >
> > > > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Ftree%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp&amp;data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627319513&amp;sdata=p%2B0hijhEwfUN7VK%2Fqz0GY0LGx%2BTHqGozDYthpue5JnE%3D&amp;reserved=0
> > > > > >
> > > > > > I am stuck at trying to read a particular row using the index that 
> > > > > > I get in the page_location array struct of offset index. Could you 
> > > > > > help me with this ? and if there have been discussions on the 
> > > > > > forums for this as well, could you direct me to that link?
> > > > > >
> > > > > > Regards,
> > > > > > Arun Balajiee
> > > > > >
> > > >
> > >
>

Reply via email to