Hi Would my pull request be useful for the discussion from here? https://github.com/apache/arrow/pull/6807
Regards, Arun Balajiee From: Wes McKinney<mailto:[email protected]> Sent: Tuesday, February 18, 2020 3:34 AM To: Parquet Dev<mailto:[email protected]> Cc: Deepak Majeti<mailto:[email protected]>; Anatoli Shein<mailto:[email protected]> Subject: Re: Arrow 1404: Adding index for Page-level Skipping That's helpful, but I think it would be a good idea to have enough information in the header files to determine what the new APIs do without reading example code. On Mon, Feb 17, 2020 at 10:59 AM Lekshmi Narayanan, Arun Balajiee <[email protected]> wrote: > > I also made changes in the low-level-api folder, couldn’t capture in that > link I think > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Fblob%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp%2Fexamples%2Fparquet%2Flow-level-api%2Freader-writer-with-index.cc&data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627309524&sdata=T%2Fo7CdxHvvN11Eox9JR6mKAWx75s1aGJUqONVBjVK08%3D&reserved=0 > > Regards, > Arun Balajiee > > ________________________________ > From: Wes McKinney <[email protected]> > Sent: Monday, February 17, 2020 8:11:09 AM > To: Parquet Dev <[email protected]> > Cc: Deepak Majeti <[email protected]>; Anatoli Shein > <[email protected]> > Subject: Re: Arrow 1404: Adding index for Page-level Skipping > > hi Arun, > > By "public APIs" I was referring to changes in the public header > files. I see there are some changes to parquet/file_reader.h and > metadata.h > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fcompare%2Fmaster...a2un%3APARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp&data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627309524&sdata=lBWBkrHBuqWjCzQ5t5JLUAw6NfIHbVFGC990L%2BDjGoA%3D&reserved=0 > > Can you add some Doxygen comments to the new APIs that explain how > these APIs are to be used (and what the parameters mean)? The hope > would be that a user could make use of the column index functionality > by reading the .h files only. > > Thanks > Wes > > On Fri, Feb 14, 2020 at 2:57 PM Lekshmi Narayanan, Arun Balajiee > <[email protected]> wrote: > > > > Hi > > I have made my changes for api here, does it look good and is this what you > > were seeking from me? The writer- api is still in the works and I need to > > make the reader more generic to support all class data types. > > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Fblob%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp%2Fexamples%2Fparquet%2Flow-level-api%2Freader-writer-with-index.cc&data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627309524&sdata=T%2Fo7CdxHvvN11Eox9JR6mKAWx75s1aGJUqONVBjVK08%3D&reserved=0 > > > > > > Regards, > > Arun Balajiee > > > > From: Wes McKinney<mailto:[email protected]> > > Sent: Tuesday, February 4, 2020 11:24 PM > > To: Parquet Dev<mailto:[email protected]> > > Cc: Deepak Majeti<mailto:[email protected]>; Anatoli > > Shein<mailto:[email protected]> > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping > > > > hi Arun, > > > > We can keep the discussion going on here and on GitHub when you have a > > pull request to discuss. There are a number of different people who > > can give advice. > > > > Thanks > > > > On Tue, Feb 4, 2020 at 10:11 PM Lekshmi Narayanan, Arun Balajiee > > <[email protected]> wrote: > > > > > > Actually I made some changes after the date on the pull request ( even in > > > this year), which are not getting reflected on this compare link > > > > > > Regards, > > > Arun Balajiee > > > > > > From: Wes McKinney<mailto:[email protected]> > > > Sent: Tuesday, February 4, 2020 6:43 PM > > > To: Parquet Dev<mailto:[email protected]> > > > Cc: Deepak Majeti<mailto:[email protected]>; Anatoli > > > Shein<mailto:[email protected]> > > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping > > > > > > Here's a compare link in case others want to have a look > > > > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fcompare%2Fmaster...a2un%3APARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp&data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627309524&sdata=lBWBkrHBuqWjCzQ5t5JLUAw6NfIHbVFGC990L%2BDjGoA%3D&reserved=0 > > > > > > On Tue, Feb 4, 2020 at 5:41 PM Wes McKinney <[email protected]> wrote: > > > > > > > > hi Arun, > > > > > > > > I took a brief look at your branch. One thing that is missing is the > > > > proposed public APIs that use the index pages -- that would be very > > > > helpful for this discussion. > > > > > > > > I don't think we have any code for doing random access of a particular > > > > data page in a column chunk, so having as an initial matter would also > > > > be helpful. > > > > > > > > - Wes > > > > > > > > On Tue, Feb 4, 2020 at 2:28 PM Lekshmi Narayanan, Arun Balajiee > > > > <[email protected]> wrote: > > > > > > > > > > Hi Parquet dev > > > > > > > > > > Deepak Majeti was my dev lead during my summer internship, from when > > > > > I am trying to add a few changes in the Arrow Parquet Project for the > > > > > ticket below > > > > > > > > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FPARQUET-1404&data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627309524&sdata=4BxOKIB%2BfmVjq3o5uEdkZKWmr1fmciZ3zJLqRvkKvhs%3D&reserved=0 > > > > > (Assigned to Deepak) > > > > > > > > > > With this regard, I am making a few changes to > > > > > src/parquet/file_reader.cc ( in a fork on my repository) > > > > > > > > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Ftree%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp&data=02%7C01%7CARL122%40pitt.edu%7C9ce829844ee2476da66b08d7b44d598f%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C0%7C637176116627319513&sdata=p%2B0hijhEwfUN7VK%2Fqz0GY0LGx%2BTHqGozDYthpue5JnE%3D&reserved=0 > > > > > > > > > > I am stuck at trying to read a particular row using the index that I > > > > > get in the page_location array struct of offset index. Could you help > > > > > me with this ? and if there have been discussions on the forums for > > > > > this as well, could you direct me to that link? > > > > > > > > > > Regards, > > > > > Arun Balajiee > > > > > > > > > >
