Re: [PR] [GH-3035] [WIP] ParquetRewriter: Add a column renaming feature [parquet-java]

via GitHub Wed, 23 Oct 2024 22:41:30 -0700


MaxNevermind commented on code in PR #3036:
URL: https://github.com/apache/parquet-java/pull/3036#discussion_r1814291131



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java:
##########
@@ -145,15 +145,18 @@ public class ParquetRewriter implements Closeable {
   private final Queue<TransParquetFileReader> inputFiles = new LinkedList<>();
   private final Queue<TransParquetFileReader> inputFilesToJoin = new LinkedList<>();
   private final MessageType outSchema;
+  private final MessageType outSchemaWithRenamedColumns;

Review Comment:
   I tried that originally, but it doesn't work. There are places where we 
still need the old schema. For example, when we read the data we have to 
provide the old path from the original schema; if we use the new schema, the 
renamed column won't be found and can't be read, as it is simply not there. 
Maybe we could have some sort of abstraction, such as a schema holder that 
exposes both the old and the renamed columns through a single entity, but I 
don't think it is possible to avoid having both somewhere in the code.
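
   A minimal sketch of the "schema holder" idea mentioned above, only to 
illustrate the shape of such an abstraction. It is not code from this PR: 
`ColumnRenameView` and its methods are hypothetical names, and plain column 
name strings stand in for the real `MessageType` schemas. The point is that 
even with a single entity, both the old names (for reading) and the new names 
(for writing) are still kept inside it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder: one object that answers both "what do I read?" and
// "what do I write?". Column names are plain strings here (an assumption made
// to keep the sketch self-contained); the real code works with MessageType.
final class ColumnRenameView {
    private final List<String> originalColumns;
    private final Map<String, String> oldToNew;

    ColumnRenameView(List<String> originalColumns, Map<String, String> renames) {
        this.originalColumns = Collections.unmodifiableList(new ArrayList<>(originalColumns));
        this.oldToNew = Collections.unmodifiableMap(new HashMap<>(renames));
    }

    // Names as they exist in the input file; these must be used when reading.
    List<String> readColumns() {
        return originalColumns;
    }

    // Names as they should appear in the output file, in the same order.
    List<String> writeColumns() {
        List<String> result = new ArrayList<>();
        for (String column : originalColumns) {
            result.add(writeNameFor(column));
        }
        return result;
    }

    // Maps one original name to its output name; unrenamed columns pass through.
    String writeNameFor(String originalName) {
        return oldToNew.getOrDefault(originalName, originalName);
    }
}
```

   With something like this, a caller would read using `readColumns()` and 
write using `writeColumns()`, so the two schemas stay in sync through the 
rename map rather than through two independent fields.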



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

