Xinli Shang created PARQUET-2093:
------------------------------------

             Summary: Add rewriter version to Parquet footer
                 Key: PARQUET-2093
                 URL: https://issues.apache.org/jira/browse/PARQUET-2093
             Project: Parquet
          Issue Type: Improvement
    Affects Versions: 1.13.0
            Reporter: Xinli Shang
            Assignee: Xinli Shang
The Parquet footer records the writer's version in the 'created_by' field. As we introduce several rewriters, a rewritten file is produced partially by the rewriter, so we need to record the rewriter's version as well.

Some questions (about a common rewriter) need to be answered before moving forward:

- Where should the rewriter versions be stored? (A new dedicated field, or key-value metadata? If key-value metadata, which key should we use?)
- Should we also record what the rewriter has done? If so, how?
- At what level should we copy the original created_by field, and at what level should we write the rewriter's version to that field instead? (Which levels are possible?)

Once this rewriter(s) field is introduced, any fix that depends on the writer version will need to check this field as well, not only created_by.
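A minimal sketch of one of the options raised above, assuming the rewriter versions go into the footer's key-value metadata (modeled here as a plain Map<String, String>) under a hypothetical key "parquet.rewriter.created_by"; the issue leaves the key name open. Each rewrite appends its version, and a writer-version-dependent check looks at the original created_by plus every recorded rewriter version. The key, the ';' separator, and all helper names are illustrative assumptions, not parquet-mr API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper; not part of parquet-mr.
class RewriterVersionSketch {
    // Hypothetical key name; PARQUET-2093 does not settle which key to use.
    static final String REWRITER_KEY = "parquet.rewriter.created_by";

    // Append this rewriter's version, keeping versions from earlier rewrites.
    static void recordRewriter(Map<String, String> keyValueMetadata, String rewriterVersion) {
        keyValueMetadata.merge(REWRITER_KEY, rewriterVersion, (existing, added) -> existing + ";" + added);
    }

    // Original writer (created_by) first, then every rewriter in the order applied.
    static List<String> allVersions(String createdBy, Map<String, String> keyValueMetadata) {
        List<String> versions = new ArrayList<>();
        versions.add(createdBy);
        String rewriters = keyValueMetadata.get(REWRITER_KEY);
        if (rewriters != null) {
            for (String version : rewriters.split(";")) {
                versions.add(version);
            }
        }
        return versions;
    }

    // A version-dependent fix must consider the rewriters too, not only created_by.
    static boolean anyVersionMatches(String createdBy, Map<String, String> keyValueMetadata,
                                     Predicate<String> affected) {
        return allVersions(createdBy, keyValueMetadata).stream().anyMatch(affected);
    }
}
```

Appending rather than overwriting keeps the full history when a file is rewritten more than once. The trade-off is that readers must parse a delimited string, which a dedicated footer field would avoid.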