ConeyLiu commented on code in PR #1273:
URL: https://github.com/apache/parquet-mr/pull/1273#discussion_r1558939543
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java:
##########
@@ -137,16 +175,34 @@ public ParquetRewriter(RewriteOptions options) throws IOException {
getPaths(schema, paths, null);
for (String col : pruneColumns) {
if (!paths.contains(col)) {
- LOG.warn("Input column name {} doesn't show up in the schema of file {}", col, reader.getFile());
+ LOG.warn("Input column name {} doesn't show up in the schema", col);
}
}
Set<ColumnPath> prunePaths = convertToColumnPaths(pruneColumns);
schema = pruneColumnsInSchema(schema, prunePaths);
}
- this.descriptorsMap =
- schema.getColumns().stream().collect(Collectors.toMap(x -> ColumnPath.get(x.getPath()), x -> x));
+ if (inputFilesR.isEmpty()) {
+ this.descriptorsMap =
+ schema.getColumns().stream().collect(Collectors.toMap(x -> ColumnPath.get(x.getPath()), x -> x));
+ } else { // TODO: describe in documentation that only top level column can be overwritten
+ this.descriptorsMap = schemaL.getColumns().stream()
+ .filter(x -> x.getPath().length == 0 || !fieldNamesR.containsKey(x.getPath()[0]))
Review Comment:
So the order of the left-side columns could change when an overwrite occurs.
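
To make the concern concrete, here is a minimal sketch of how the order could shift. It assumes the overwritten left columns are filtered out and the right-side columns are then appended at the end; the rest of the hunk is not visible here, so that is an assumption rather than what the PR necessarily does. `ColumnOrderSketch` and `mergeTopLevel` are hypothetical names used only for illustration, with plain strings standing in for top-level column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not code from ParquetRewriter.
class ColumnOrderSketch {

    // Assumed merge strategy: keep the left columns that the right side does
    // not overwrite, then append every right column at the end.
    static List<String> mergeTopLevel(List<String> left, List<String> right) {
        Set<String> rightNames = new HashSet<>(right);
        List<String> merged = new ArrayList<>();
        for (String name : left) {
            if (!rightNames.contains(name)) {
                merged.add(name); // left column that is not overwritten
            }
        }
        merged.addAll(right); // overwriting and new columns land last
        return merged;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("id", "name", "age");
        List<String> right = Arrays.asList("name", "email");
        // "name" moves from position 1 to position 2 in the merged schema.
        System.out.println(mergeTopLevel(left, right)); // prints [id, age, name, email]
    }
}
```

If preserving the left file's order matters, one option would be to replace each overwritten column in place and append only the genuinely new right columns.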
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.