[ 
https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517079#comment-17517079
 ] 

ASF GitHub Bot commented on PARQUET-2006:
-----------------------------------------

rdblue commented on code in PR #950:
URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842090063


##########
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java:
##########
@@ -170,6 +174,24 @@ public Void visit(Not not) {
 
   private <T extends Comparable<T>> void validateColumn(Column<T> column) {
     ColumnPath path = column.getColumnPath();
+    if (path == null) {
+      HashSet<Type.ID> ids = new HashSet<>();
+      Type.ID id = column.getColumnId();
+      List<ColumnDescriptor> columnDescriptors = new 
ArrayList<>(columnsAccordingToSchema.values());
+      for (ColumnDescriptor columnDescriptor : columnDescriptors) {
+        Type.ID columnId = columnDescriptor.getId();
+        if (columnId != null) {
+          if (ids.contains(columnId)) {
+            throw new RuntimeException("duplicate id");

Review Comment:
   I doubt that this is the right place to catch duplicate column IDs. Also, I 
think it should probably throw an exception more specific than 
`RuntimeException`.





> Column resolution by ID
> -----------------------
>
>                 Key: PARQUET-2006
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2006
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>
> Parquet relies on the name. In a lot of usages e.g. schema resolution, this 
> would be a problem. Iceberg uses ID and stored Id/name mappings. 
> This Jira is to add column ID resolution support. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to