[ 
https://issues.apache.org/jira/browse/PARQUET-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414829#comment-17414829
 ] 

Gabor Szadovszky commented on PARQUET-2088:
-------------------------------------------

Ah, I see. So that part of the code is not a feature but a bug fix. This is the 
painful part of implementing a file format: you not only have to fix the bug in 
the code, you also have to deal with the invalid files that the faulty code 
wrote (if it was released). In this case we had to implement a workaround for 
the invalid files written by parquet-mr releases before 1.8.0.

I am not sure how the Impala reader/writer works. I work on parquet-mr, and 
Impala is not closely tied to the Parquet community. It is mentioned more as an 
example that the created_by field has to be filled in by the application that 
actually implements the writing of the Parquet files. So Hive, Spark, etc. will 
never be listed here, as they use parquet-mr to write and read the files. 
Impala has its own writer/reader implementation.
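
To illustrate the kind of check described above, here is a minimal sketch of a 
reader deciding from created_by whether a file needs the workaround. It is not 
the parquet-mr code (that lives in the Java class CorruptDeltaByteArrays linked 
in the issue). It assumes created_by follows the layout 
"<application> version <version> (build <hash>)"; the function names are 
hypothetical, and treating unparsable values as "no workaround" is a choice 
made only for this sketch.

```python
import re
from typing import Optional, Tuple

# Assumed layout: "<application> version <version> (build <hash>)".
# The "(build ...)" part is treated as optional in this sketch.
_CREATED_BY = re.compile(
    r"^\s*(?P<app>.+?)\s+version\s+(?P<version>\S+)"
    r"(?:\s+\(build\s+[^)]*\))?\s*$",
    re.IGNORECASE,
)


def parse_created_by(created_by: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (application, version) or None if the value does not parse."""
    match = _CREATED_BY.match(created_by or "")
    if match is None:
        return None
    return match.group("app"), match.group("version")


def _numeric_version(version: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch), ignoring suffixes such as '-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        return None
    return tuple(int(part) if part else 0 for part in match.groups())


def needs_workaround(created_by: Optional[str]) -> bool:
    """True if the file was written by parquet-mr older than 1.8.0.

    Sketch-only policy: unknown or unparsable writers return False.
    """
    parsed = parse_created_by(created_by)
    if parsed is None:
        return False
    app, version = parsed
    if app.lower() != "parquet-mr":
        # Other writers (e.g. Impala) have their own implementation.
        return False
    numeric = _numeric_version(version)
    if numeric is None:
        return False
    # Compare as integer tuples so that 1.10.x sorts after 1.8.0.
    return numeric < (1, 8, 0)
```

Because the decision keys on the library name and version, it only works if 
created_by names the component that actually wrote the bytes, which is why an 
application version in that field would defeat the check.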

> Different created_by field values for application and library
> -------------------------------------------------------------
>
>                 Key: PARQUET-2088
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2088
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: format-2.9.0
>            Reporter: Joshua Howard
>            Priority: Minor
>
> There seems to be a discrepancy in the Parquet format created_by field 
> regarding how it should be filled out. The parquet-mr library uses this value 
> to enable/disable features based on the parquet-mr version 
> [here|https://github.com/apache/parquet-mr/blob/5f403501e9de05b6aa48f028191b4e78bb97fb12/parquet-column/src/main/java/org/apache/parquet/CorruptDeltaByteArrays.java#L64-L68].
>  Meanwhile, users are encouraged to make use of the application version 
> [here|https://www.javadoc.io/doc/org.apache.parquet/parquet-format/latest/org/apache/parquet/format/FileMetaData.html].
>  It seems like there are multiple fields needed for an application and 
> library version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
