What are summary files?

Zoltan Ivanfi Thu, 27 Jul 2017 07:04:23 -0700

Hi,

I came across some references to so-called "summary files" in
ParquetFileReader.java
<https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java>.
I wanted to find out what they are, but could find hardly any information
online. From the source code, they appear to replicate a Parquet file's
footer in a separate file, but I could not find them mentioned in any
documentation. I did find this JIRA
<https://issues.apache.org/jira/browse/SPARK-15719> about disabling them in
Spark because they were not considered useful.


Are summary files obsolete or are they still in use? What is their intended
use? Are they documented somewhere?

Thanks,

Zoltan
