[jira] [Commented] (PARQUET-2006) Column resolution by ID

ASF GitHub Bot (Jira) Mon, 04 Apr 2022 13:05:22 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517084#comment-17517084
 ]


ASF GitHub Bot commented on PARQUET-2006:
-----------------------------------------

rdblue commented on code in PR #950:
URL: https://github.com/apache/parquet-mr/pull/950#discussion_r842096174


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java:
##########
@@ -878,11 +880,97 @@ public String getFile() {
     return blocks;
   }
 
-  public void setRequestedSchema(MessageType projection) {
+  private boolean uniqueId(GroupType schema, HashSet<Type.ID> ids) {

Review Comment:
   I don't think it is good practice to modify a set that is passed in. I 
would expect this method to produce the set itself. If the goal is to throw an 
exception when a duplicate ID is found, then the exception should be thrown 
directly in this method.
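
   A minimal sketch of the shape suggested above, under stated assumptions: 
`Node` is a hypothetical stand-in for a schema node and is not the parquet-mr 
`Type`/`GroupType` API, and `collectUniqueIds` is an illustrative name rather 
than a method from the PR. The method builds and returns the set of IDs 
itself and throws as soon as it sees a duplicate, so the caller never hands 
in a set to be mutated.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueIdSketch {

  // Hypothetical schema node (assumption for illustration only):
  // a name, an optional integer ID, and zero or more child fields.
  public static final class Node {
    final String name;
    final Integer id;          // null when the field carries no ID
    final List<Node> children; // empty for primitive fields

    public Node(String name, Integer id, Node... children) {
      this.name = name;
      this.id = id;
      this.children = Arrays.asList(children);
    }
  }

  // Walks the schema and returns the set of IDs it contains.
  // Throws IllegalArgumentException on the first duplicate ID.
  public static Set<Integer> collectUniqueIds(Node root) {
    Set<Integer> ids = new HashSet<>();
    Deque<Node> pending = new ArrayDeque<>();
    pending.push(root);
    while (!pending.isEmpty()) {
      Node node = pending.pop();
      // Set.add returns false when the element is already present.
      if (node.id != null && !ids.add(node.id)) {
        throw new IllegalArgumentException(
            "Duplicate column ID " + node.id + " on field '" + node.name + "'");
      }
      for (Node child : node.children) {
        pending.push(child);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    Node schema = new Node("root", null,
        new Node("a", 1),
        new Node("b", 2, new Node("c", 3)));
    System.out.println(collectUniqueIds(schema));
  }
}
```

   With this shape the duplicate check and the error reporting live in one 
place, and the returned set can still be reused by the caller if it is needed 
later.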





> Column resolution by ID
> -----------------------
>
>                 Key: PARQUET-2006
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2006
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>
> Parquet currently resolves columns by name. In many use cases, e.g. schema 
> resolution, this is a problem. Iceberg uses IDs and stores ID/name mappings. 
> This Jira is to add support for resolving columns by ID. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
