rdblue commented on code in PR #950:
URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842091649
##########
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java:
##########
@@ -170,6 +174,24 @@ public Void visit(Not not) {
private <T extends Comparable<T>> void validateColumn(Column<T> column) {
ColumnPath path = column.getColumnPath();
+ if (path == null) {
+ HashSet<Type.ID> ids = new HashSet<>();
+ Type.ID id = column.getColumnId();
+ List<ColumnDescriptor> columnDescriptors = new
ArrayList<>(columnsAccordingToSchema.values());
+ for (ColumnDescriptor columnDescriptor : columnDescriptors) {
+ Type.ID columnId = columnDescriptor.getId();
+ if (columnId != null) {
+ if (ids.contains(columnId)) {
+ throw new RuntimeException("duplicate id");
+ }
+ ids.add(columnId);
+ if (columnId.intValue() == id.intValue()) {
+ column.setColumnPath(ColumnPath.get(columnDescriptor.getPath()));
Review Comment:
I appreciate not wanting to change the validation logic, but making the
filter `Column` mutable just to be able to add the path seems like a bad idea
in general. What happens if the filter is reused and the path changes? It looks
like this would use the first path for a column that was encountered.
I think this should validate directly using IDs instead.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]