[
https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517081#comment-17517081
]
ASF GitHub Bot commented on PARQUET-2006:
-----------------------------------------
rdblue commented on code in PR #950:
URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842091649
##########
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java:
##########
@@ -170,6 +174,24 @@ public Void visit(Not not) {
private <T extends Comparable<T>> void validateColumn(Column<T> column) {
ColumnPath path = column.getColumnPath();
+ if (path == null) {
+ HashSet<Type.ID> ids = new HashSet<>();
+ Type.ID id = column.getColumnId();
+ List<ColumnDescriptor> columnDescriptors = new
ArrayList<>(columnsAccordingToSchema.values());
+ for (ColumnDescriptor columnDescriptor : columnDescriptors) {
+ Type.ID columnId = columnDescriptor.getId();
+ if (columnId != null) {
+ if (ids.contains(columnId)) {
+ throw new RuntimeException("duplicate id");
+ }
+ ids.add(columnId);
+ if (columnId.intValue() == id.intValue()) {
+ column.setColumnPath(ColumnPath.get(columnDescriptor.getPath()));
Review Comment:
I appreciate not wanting to change the validation logic, but making the
filter `Column` mutable just to be able to add the path seems like a bad idea
in general. What happens if the filter is reused and the path changes? It looks
like this would use the first path for a column that was encountered.
I think this should validate directly using IDs instead.
> Column resolution by ID
> -----------------------
>
> Key: PARQUET-2006
> URL: https://issues.apache.org/jira/browse/PARQUET-2006
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Reporter: Xinli Shang
> Assignee: Xinli Shang
> Priority: Major
>
> Parquet relies on the name. In a lot of usages e.g. schema resolution, this
> would be a problem. Iceberg uses ID and stored Id/name mappings.
> This Jira is to add column ID resolution support.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)