Zoltan Ivanfi created PARQUET-899:
-------------------------------------
Summary: Add metadata field describing the application that wrote the file
Key: PARQUET-899
URL: https://issues.apache.org/jira/browse/PARQUET-899
Project: Parquet
Issue Type: Improvement
Reporter: Zoltan Ivanfi
Although the Parquet library should behave the same regardless of which
application uses it, serious interoperability bugs are occasionally introduced
in specific applications. For example, data written by a specific application
may be unnecessarily adjusted, or the calculated statistics may be invalid (both
of these have actually happened).
Unfortunately, it is currently not possible to recognize Parquet files affected
by application-specific problems, because the metadata does not contain any
information about the application using the Parquet library. (The name and
version number of the Parquet library are recorded, but that is of limited use,
because apart from Impala, the most widespread Parquet writers all use the same
Java library.)
To make it possible to work around issues that are discovered in the future, we
should introduce new metadata fields that applications can populate. The
simplest approach is to have one field for the application name and another for
its version number. A more sophisticated approach, suggested by [~julienledem],
would additionally reference a list of earlier issues that are known to be fixed
in the application that wrote the Parquet file.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)