Looking at this, I have one concern: how does it deal with different
logical types? For example, if I have a date column, I will have to take
the value in whatever physical type is used to represent a date and
convert it to a date type. Is this correct? Is there documentation that
describes how each logical type is represented in its physical type, so
that I can be sure I interpret the min and max values from the statistics
properly?

If I have a UINT_64 column whose physical type is INT64 or INT96 (I am
not sure which of these is used in this case), do I need to somehow
convert the physical value, which is what TypedRowGroupStatistics will
return?
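
To make the question concrete, here is a minimal sketch of the conversions
involved. It assumes what the Parquet format's logical type documentation
describes: DATE annotates INT32 and counts days since the Unix epoch, and
UINT_64 annotates INT64, so the stored signed value carries the unsigned
bit pattern. CivilDate, CivilFromDays and AsUnsigned are hypothetical
helper names, not parquet-cpp API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for a decoded DATE value.
struct CivilDate {
  int64_t year;
  int month;  // 1..12
  int day;    // 1..31
};

// Assumption: a DATE statistic is an INT32 holding days since 1970-01-01.
// Converts that day count to a proleptic Gregorian calendar date.
CivilDate CivilFromDays(int32_t days_since_epoch) {
  int64_t z = static_cast<int64_t>(days_since_epoch) + 719468;  // shift epoch to 0000-03-01
  const int64_t era = (z >= 0 ? z : z - 146096) / 146097;       // 400-year cycles
  const int64_t doe = z - era * 146097;                         // day of era, 0..146096
  const int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // 0..399
  const int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  // day of year, March-based
  const int64_t mp = (5 * doy + 2) / 153;                       // month index, March = 0
  const int day = static_cast<int>(doy - (153 * mp + 2) / 5 + 1);
  const int month = static_cast<int>(mp < 10 ? mp + 3 : mp - 9);
  const int64_t year = yoe + era * 400 + (month <= 2 ? 1 : 0);
  return CivilDate{year, month, day};
}

// Assumption: a UINT_64 statistic arrives as the INT64 physical value.
// The cast keeps the bit pattern and reinterprets it as unsigned.
uint64_t AsUnsigned(int64_t physical) {
  return static_cast<uint64_t>(physical);
}
```

One consequence of the unsigned case: a min/max pair computed with signed
comparison may not be the true unsigned min/max when the column holds
values above INT64_MAX, so such statistics deserve some caution.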

Thank you for your response. I was working late and must have had a
brain fart, since the name EncodeMin clearly states what it does :).

Felipe

On Tue, Jul 25, 2017 at 6:25 AM, Artem Tarasov <[email protected]> wrote:

> Hi Felipe,
>
> Encode* functions do the opposite of what you want, encoding values of
> any type to binary format to be stored on disk.
> You need to call
> static_pointer_cast<ByteArrayStatistics/Int32Statistics/etc.> and then
> you'll have min()/max() methods to get the values.
>
>
> Best,
> Artem
>
>
> On 25.07.2017 08:20, Felipe Aramburu wrote:
> > I have included some code below that shows the context of where this is
> > being retrieved but basically I am trying to do the following:
> >
> > std::shared_ptr<parquet::RowGroupStatistics> statistics =
> >     columnMetaData->statistics();
> > if(statistics->HasMinMax()){
> >   minStrings[blazingColumnIndex][rowGroupIndex] = statistics->EncodeMin();
> >   maxStrings[blazingColumnIndex][rowGroupIndex] = statistics->EncodeMax();
> > }
> >
> >
> > I look at the values in statistics->EncodeMin() and I am not exactly sure
> > how to interpret them. What is the proper way for getting this value into
> > an Int or Long or whatever C type represents the underlying data? What is
> > the most concise way of retrieving the min and max values of every column
> > in every row group inside of a parquet file?
> >
> > Any help is greatly appreciated.
> >
> > Felipe Aramburu
> >
> >
> >
> > for(int rowGroupIndex = 0; rowGroupIndex < num_row_groups; rowGroupIndex++){
> >   std::shared_ptr<parquet::RowGroupReader> groupReader =
> >       parquet_reader->RowGroup(rowGroupIndex);
> >   const parquet::RowGroupMetaData* rowGroupMetadata = groupReader->metadata();
> >   for(int blazingColumnIndex = 0;
> >       blazingColumnIndex < blazingColumnToParquetColumn.size();
> >       blazingColumnIndex++){
> >     std::unique_ptr<parquet::ColumnChunkMetaData> columnMetaData =
> >         rowGroupMetadata->ColumnChunk(
> >             blazingColumnToParquetColumn[blazingColumnIndex]);
> >     const parquet::ColumnDescriptor* column =
> >         schema->Column(blazingColumnToParquetColumn[blazingColumnIndex]);
> >
> >     if(columnMetaData->is_stats_set()){
> >       std::shared_ptr<parquet::RowGroupStatistics> statistics =
> >           columnMetaData->statistics();
> >       if(statistics->HasMinMax()){
> >         minStrings[blazingColumnIndex][rowGroupIndex] = statistics->EncodeMin();
> >         maxStrings[blazingColumnIndex][rowGroupIndex] = statistics->EncodeMax();
> >       }else{
> >         // set min and max values
> >         minStrings[blazingColumnIndex][rowGroupIndex] = "min";
> >         maxStrings[blazingColumnIndex][rowGroupIndex] = "max";
> >       }
> >     }else{
> >       // set minData to value min and maxData to value max if no statistics exist
> >       minStrings[blazingColumnIndex][rowGroupIndex] = "min";
> >       maxStrings[blazingColumnIndex][rowGroupIndex] = "max";
> >     }
> >   }
> > }
> >
>
>
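
For the encoded strings stored in the code quoted above, a minimal sketch
of turning one back into a C++ value. It assumes that EncodeMin() and
EncodeMax() return the PLAIN-encoded bytes of a fixed-width physical type
(INT32, INT64, FLOAT, DOUBLE) and that the host is little-endian, as PLAIN
encoding is. DecodePlainFixed is a hypothetical helper, not part of
parquet-cpp. The typed min()/max() accessors Artem mentions avoid this
step entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: copies the raw bytes of a PLAIN-encoded
// fixed-width value out of the string into a value of type T.
// Assumes a little-endian host and encoded.size() == sizeof(T).
template <typename T>
T DecodePlainFixed(const std::string& encoded) {
  assert(encoded.size() == sizeof(T));
  T value;
  std::memcpy(&value, encoded.data(), sizeof(T));  // memcpy avoids aliasing issues
  return value;
}
```

For an INT64 column this would be called as
DecodePlainFixed<int64_t>(minStrings[col][rowGroup]). BYTE_ARRAY
statistics need no decoding of this kind, since the string already holds
the bytes of the value.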
