[
https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517079#comment-17517079
]
ASF GitHub Bot commented on PARQUET-2006:
-----------------------------------------
rdblue commented on code in PR #950:
URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842090063
##########
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java:
##########
@@ -170,6 +174,24 @@ public Void visit(Not not) {
private <T extends Comparable<T>> void validateColumn(Column<T> column) {
ColumnPath path = column.getColumnPath();
+ if (path == null) {
+ HashSet<Type.ID> ids = new HashSet<>();
+ Type.ID id = column.getColumnId();
+ List<ColumnDescriptor> columnDescriptors = new ArrayList<>(columnsAccordingToSchema.values());
+ for (ColumnDescriptor columnDescriptor : columnDescriptors) {
+ Type.ID columnId = columnDescriptor.getId();
+ if (columnId != null) {
+ if (ids.contains(columnId)) {
+ throw new RuntimeException("duplicate id");
Review Comment:
I doubt that this is the right place to catch duplicate column IDs. Also, I
think it should probably throw an exception more specific than
`RuntimeException`.
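
To illustrate the suggestion, here is a minimal sketch of a standalone uniqueness check that raises a more specific exception. It is not parquet-mr code: `ColumnIdCheck` and `requireUniqueIds` are hypothetical names, plain `Integer` values stand in for `Type.ID`, and `IllegalArgumentException` is only a stand-in for whatever exception type the project would prefer. Where the check should actually live (for example, at schema construction rather than in the predicate validator) is left open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of parquet-mr.
class ColumnIdCheck {
  /**
   * Verifies that no two columns share an ID.
   * Keys are dotted column paths; values are column IDs (null means no ID assigned).
   */
  static void requireUniqueIds(Map<String, Integer> idsByPath) {
    // Maps each ID seen so far to the first column path that used it.
    Map<Integer, String> seen = new HashMap<>();
    for (Map.Entry<String, Integer> entry : idsByPath.entrySet()) {
      Integer id = entry.getValue();
      if (id == null) {
        continue; // columns without an ID cannot collide
      }
      String previous = seen.putIfAbsent(id, entry.getKey());
      if (previous != null) {
        // Assumption: IllegalArgumentException stands in for a project-specific type.
        throw new IllegalArgumentException(
            "Duplicate column id " + id + " for columns '" + previous
                + "' and '" + entry.getKey() + "'");
      }
    }
  }
}
```

Naming both colliding columns in the message makes the failure actionable, which a bare "duplicate id" does not.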
> Column resolution by ID
> -----------------------
>
> Key: PARQUET-2006
> URL: https://issues.apache.org/jira/browse/PARQUET-2006
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Reporter: Xinli Shang
> Assignee: Xinli Shang
> Priority: Major
>
> Parquet resolves columns by name. In many use cases, e.g. schema resolution,
> this is a problem. Iceberg uses IDs and stores ID/name mappings.
> This Jira adds support for column resolution by ID.