[ https://issues.apache.org/jira/browse/PARQUET-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414829#comment-17414829 ]
Gabor Szadovszky commented on PARQUET-2088:
-------------------------------------------
Ah, I see. So that part of the code is not a feature but a bug fix. This is the
pain of file format implementations: you not only have to fix issues in the
code, you also have to deal with the invalid files written by the faulty code
(if it was released). In this case we had to implement a workaround for the
invalid files written by parquet-mr releases before 1.8.0.
I am not sure how the Impala reader/writer works. I work on parquet-mr, and
Impala is not closely tied to the Parquet community. It is more of an example
that the created_by field has to be filled in by the application that actually
implements the writing of the Parquet files. So Hive, Spark, etc. will never be
listed here, as they use parquet-mr to read and write the files. Impala has its
own writer/reader implementation.
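To illustrate the kind of check discussed above, here is a minimal Python
sketch of gating a workaround on the created_by value. It is not the parquet-mr
code. The names parse_created_by and needs_workaround are hypothetical, the
regular expression assumes a "<application> version <x.y.z> (build <hash>)"
layout, and treating an unparsable value as "apply the workaround" is a
conservative choice made for this sketch only.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed created_by layout: "<application> version <x.y.z> (build <hash>)".
# Anything after the three numeric components (e.g. "-cdh5.12.0") is ignored.
_CREATED_BY = re.compile(
    r"^(?P<app>.+?)\s+version\s+(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
)


class Writer(NamedTuple):
    application: str
    version: Tuple[int, int, int]


def parse_created_by(created_by: Optional[str]) -> Optional[Writer]:
    """Return the writer name and version, or None if the value is unusable."""
    if not created_by:
        return None
    match = _CREATED_BY.match(created_by.strip())
    if match is None:
        return None
    version = (
        int(match.group("major")),
        int(match.group("minor")),
        int(match.group("patch")),
    )
    return Writer(match.group("app").lower(), version)


def needs_workaround(created_by: Optional[str]) -> bool:
    """Decide whether a file may have been written by the faulty writer."""
    writer = parse_created_by(created_by)
    if writer is None:
        # Unknown writer: be conservative (an assumption of this sketch).
        return True
    if writer.application != "parquet-mr":
        # Other implementations (e.g. Impala) have their own writer code.
        return False
    return writer.version < (1, 8, 0)
```

With this sketch, a file stamped by parquet-mr 1.6.0 would take the workaround
path, while one stamped by parquet-mr 1.8.0 or by Impala would not. The check
only works if created_by names the library that actually wrote the bytes, which
is why an application built on parquet-mr should not overwrite it with its own
name.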
> Different created_by field values for application and library
> -------------------------------------------------------------
>
> Key: PARQUET-2088
> URL: https://issues.apache.org/jira/browse/PARQUET-2088
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: format-2.9.0
> Reporter: Joshua Howard
> Priority: Minor
>
> There seems to be a discrepancy in the Parquet format created_by field
> regarding how it should be filled out. The parquet-mr library uses this value
> to enable/disable features based on the parquet-mr version
> [here|https://github.com/apache/parquet-mr/blob/5f403501e9de05b6aa48f028191b4e78bb97fb12/parquet-column/src/main/java/org/apache/parquet/CorruptDeltaByteArrays.java#L64-L68].
> Meanwhile, users are encouraged to make use of the application version
> [here|https://www.javadoc.io/doc/org.apache.parquet/parquet-format/latest/org/apache/parquet/format/FileMetaData.html].
> It seems like there are multiple fields needed for an application and
> library version.