[ 
https://issues.apache.org/jira/browse/PARQUET-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415378#comment-17415378
 ] 

Gabor Szadovszky commented on PARQUET-2088:
-------------------------------------------

parquet-mr automatically fills the {{created_by}} field using FULL_VERSION. 
The components built on it (Hive/Spark) do not have to populate anything. So 
whenever parquet-mr writes a file, its own full version string is written to 
the field.
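
To make this concrete, here is a minimal sketch of how a reader could split a 
{{created_by}} value into its parts. It assumes the layout described in the 
parquet-format documentation, "<application> version <version> (build <hash>)". 
The names parse_created_by and CreatedBy and the regular expression are 
illustrative assumptions, not the parquet-mr implementation.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout (from the parquet-format docs, not from parquet-mr code):
#   <application> version <version> (build <hash>)
# The "(build ...)" part is treated as optional in this sketch.
_CREATED_BY = re.compile(
    r"^(?P<app>.+?)\s+version\s+(?P<version>[^\s(]+)"
    r"(?:\s*\(build\s+(?P<build>[^)]*)\))?\s*$"
)


class CreatedBy(NamedTuple):
    """Hypothetical container for the three parts of created_by."""
    application: str
    version: str
    build: Optional[str]


def parse_created_by(value: str) -> Optional[CreatedBy]:
    """Return the parsed parts, or None if the string does not match."""
    match = _CREATED_BY.match(value.strip())
    if match is None:
        return None
    return CreatedBy(match.group("app"), match.group("version"), match.group("build"))
```

The string carries exactly one application name. A file written by Spark 
through parquet-mr would therefore report only parquet-mr here, with nothing 
about the Spark version.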

You are right that there is no separate field for the version of the 
"higher level" application. (I remember some discussions about this topic but 
could not find them in the jiras :( ) The issue here is: which application 
version should we store? For example, customer code may use a tool written 
for Spark, which in turn writes the parquet file. A mistake at any of these 
levels may cause invalid values (from a certain point of view). So how should 
we handle this, and how can we formalize it? Also, how can we require client 
code to fill in these fields?
Anyway, if you have a proposal, feel free to write to the dev list.

> Different created_by field values for application and library
> -------------------------------------------------------------
>
>                 Key: PARQUET-2088
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2088
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: format-2.9.0
>            Reporter: Joshua Howard
>            Priority: Minor
>
> There seems to be a discrepancy in the Parquet format created_by field 
> regarding how it should be filled out. The parquet-mr library uses this value 
> to enable/disable features based on the parquet-mr version 
> [here|https://github.com/apache/parquet-mr/blob/5f403501e9de05b6aa48f028191b4e78bb97fb12/parquet-column/src/main/java/org/apache/parquet/CorruptDeltaByteArrays.java#L64-L68].
>  Meanwhile, users are encouraged to make use of the application version 
> [here|https://www.javadoc.io/doc/org.apache.parquet/parquet-format/latest/org/apache/parquet/format/FileMetaData.html].
>  It seems that separate fields are needed for the application version and 
> the library version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
