The answer to this is probably no, but I imagine it is not considered
acceptable to modify the statistics information AFTER the parquet
file has been generated, correct?
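
For reference, the offset-distance calculation mentioned below for the
"catalogue" step could be sketched roughly as follows. This is only a
sketch: it assumes the BYTE_ARRAY values have already been decoded into an
offsets array (n + 1 non-decreasing integers for n values), and
max_element_length is a hypothetical helper name, not part of any Parquet
API. It still requires reading the data, so it does not replace a real
entry in the Statistics.

```python
from typing import Sequence


def max_element_length(offsets: Sequence[int]) -> int:
    """Return the longest element length implied by an offsets array.

    Assumption: `offsets` has n + 1 non-decreasing entries for n elements,
    so element i occupies bytes offsets[i] up to (not including) offsets[i + 1].
    """
    if len(offsets) < 2:
        return 0  # no elements at all
    # The length of each element is the distance between consecutive offsets.
    return max(end - start for start, end in zip(offsets, offsets[1:]))


# Three values of 3, 0, and 5 bytes.
print(max_element_length([0, 3, 3, 8]))  # prints 5
```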

On Thu, Jun 8, 2017 at 9:59 AM, Lars Volker <[email protected]> wrote:

> I suppose you would look at the Statistics struct in the parquet.thrift file
> <https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>
> in the parquet-format project. Before spending much time on this, you may
> want to seek more feedback, possibly on this list, and by opening a JIRA.
> Since it likely is a rather small change, you might also go ahead and
> create a pull request and ask for feedback there. Please note that the PR
> will need a corresponding JIRA in its title.
>
> You can find more detailed information on the individual steps here:
> https://parquet.apache.org/contribute/
>
> Cheers, Lars
>
> On Wed, Jun 7, 2017 at 11:40 AM, Felipe Aramburu <[email protected]>
> wrote:
>
> > So I guess it's just calculating the distance between the offsets. For now
> > we might just make that part of our "catalogue" step. If I wanted to add it
> > to statistics, is there somewhere you can point me to where that would be
> > added?
> >
> > Felipe
> >
> > On Wed, Jun 7, 2017 at 1:33 PM, Michael Howard <[email protected]>
> > wrote:
> >
> > > > Could this be a candidate to add to the Statistics?
> > >
> > > Agreed ... this would be good info to have.
> > >
> > > On Wed, Jun 7, 2017 at 2:25 PM, Lars Volker <[email protected]> wrote:
> > >
> > > > Could this be a candidate to add to the Statistics?
> > > >
> > > > On Wed, Jun 7, 2017 at 11:18 AM, Deepak Majeti <[email protected]>
> > > > wrote:
> > > >
> > > > > The parquet metadata does not have such information.
> > > > >
> > > > >
> > > > > On Wed, Jun 7, 2017 at 1:08 PM, Felipe Aramburu <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Is there any metadata available on the maximum length of a
> > > > > > BYTE_ARRAY element in a row group?
> > > > > >
> > > > > > For example, if I have a column of type BYTE_ARRAY with logical
> > > > > > type UTF8, I want to know the length of the longest element in
> > > > > > the row group.
> > > > > >
> > > > > > I am looking for a method that does NOT require going through
> > > > > > the data itself, so I am asking whether this metadata is stored
> > > > > > anywhere.
> > > > > >
> > > > > > Felipe
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > regards,
> > > > > Deepak Majeti
> > > > >
> > > >
> > >
> >
>
