[
https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517084#comment-17517084
]
ASF GitHub Bot commented on PARQUET-2006:
-----------------------------------------
rdblue commented on code in PR #950:
URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842096174
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java:
##########
@@ -878,11 +880,97 @@ public String getFile() {
     return blocks;
   }
-  public void setRequestedSchema(MessageType projection) {
+  private boolean uniqueId(GroupType schema, HashSet<Type.ID> ids) {
Review Comment:
I don't think it is a good practice to modify a set that's passed in. I
would expect this to produce a set. If you want to throw an exception because
this finds a duplicate ID, then I think it should just throw an exception in
this method.
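
A minimal sketch of the shape suggested above: the method builds and returns
its own set of IDs, and throws from inside the method when it sees a duplicate.
Node is a hypothetical stand-in for the schema types, and collectIds is an
illustrative name; neither is taken from the PR or from the parquet-mr API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueIdSketch {
  /** Hypothetical stand-in for a schema node (assumption, not the parquet-mr Type API). */
  static final class Node {
    final String name;
    final Integer id; // null when the field carries no ID
    final List<Node> children = new ArrayList<>();

    Node(String name, Integer id, Node... kids) {
      this.name = name;
      this.id = id;
      for (Node kid : kids) {
        children.add(kid);
      }
    }
  }

  /** Returns the set of IDs found under root; throws if any ID occurs twice. */
  static Set<Integer> collectIds(Node root) {
    Set<Integer> ids = new HashSet<>(); // owned by this method, never passed in
    Deque<Node> pending = new ArrayDeque<>();
    pending.push(root);
    while (!pending.isEmpty()) {
      Node node = pending.pop();
      // Set.add returns false when the element is already present.
      if (node.id != null && !ids.add(node.id)) {
        throw new IllegalArgumentException(
            "Duplicate field ID " + node.id + " on field " + node.name);
      }
      for (Node child : node.children) {
        pending.push(child);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    Node schema = new Node("root", null,
        new Node("a", 1),
        new Node("b", 2, new Node("c", 3)));
    System.out.println(collectIds(schema));
  }
}
```

The caller gets a fresh set and never has to inspect a boolean result; a
duplicate ID surfaces as an exception at the point where it is detected.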
> Column resolution by ID
> -----------------------
>
> Key: PARQUET-2006
> URL: https://issues.apache.org/jira/browse/PARQUET-2006
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Reporter: Xinli Shang
> Assignee: Xinli Shang
> Priority: Major
>
> Parquet relies on column names. In many use cases, e.g. schema resolution,
> this is a problem. Iceberg uses IDs and stores ID/name mappings.
> This Jira is to add support for resolving columns by ID.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)