[GitHub] [iceberg] jackye1995 opened a new pull request #2465: Core: add row identifier to schema

GitBox Mon, 12 Apr 2021 16:08:44 -0700


jackye1995 opened a new pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465



   
   Continuation of #2354
   
   @yyanyy @rdblue @openinx @aokolnychyi 
   
   Schema JSON with the new `row-identifiers` field:
   
   ```
   {
     "type": "struct",
     "schema-id": 1,
     "row-identifiers": [
       1,
       2
     ],
     "fields": [
       {
         "id": 1,
         "name": "x",
         "required": true,
         "type": "long"
       },
       {
         "id": 2,
         "name": "y",
         "required": true,
         "type": "long",
         "doc": "comment"
       },
       {
         "id": 3,
         "name": "z",
         "required": true,
         "type": "long"
       }
     ]
   }
   ```
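To illustrate how a reader might consume the new field, here is a minimal Python sketch (not Iceberg's Java parser) that resolves the `row-identifiers` field IDs to column names. The function `row_identifier_names` is hypothetical, and the sketch assumes that only top-level fields can be row identifiers, that a missing `row-identifiers` key means there are none, and that duplicate IDs are invalid; the PR does not state any of these.

```python
from __future__ import annotations

import json

SCHEMA_JSON = """
{
  "type": "struct",
  "schema-id": 1,
  "row-identifiers": [1, 2],
  "fields": [
    {"id": 1, "name": "x", "required": true, "type": "long"},
    {"id": 2, "name": "y", "required": true, "type": "long", "doc": "comment"},
    {"id": 3, "name": "z", "required": true, "type": "long"}
  ]
}
"""


def row_identifier_names(schema_json: str) -> list[str]:
    """Resolve row identifier field IDs to column names (hypothetical helper).

    Assumption: only top-level fields are considered; nested fields are ignored.
    """
    schema = json.loads(schema_json)
    names_by_id = {field["id"]: field["name"] for field in schema["fields"]}
    # Assumption: an absent "row-identifiers" key means no row identifiers.
    ids = schema.get("row-identifiers", [])
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate row identifier IDs: {ids}")
    missing = [i for i in ids if i not in names_by_id]
    if missing:
        raise ValueError(f"row identifiers reference unknown field IDs: {missing}")
    return [names_by_id[i] for i in ids]


print(row_identifier_names(SCHEMA_JSON))  # ['x', 'y']
```

Because the identifiers are stored as field IDs rather than names, the lookup goes through the `fields` list, which is also why renaming a column cannot invalidate them.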
   
   New `Schema.toString()` output:
   
   ```
   table {
     fields {
       1: x: required long
       2: y: required long (comment)
       3: z: required long
     }
     row identifiers { 1,2 }
   }
   ```
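The string format above can be reproduced with a short sketch. `Field` and `schema_to_string` are hypothetical Python stand-ins for the Java schema classes, and the sketch assumes the `row identifiers` line is omitted when the list is empty, a case the example above does not show.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a top-level schema field."""
    id: int
    name: str
    required: bool
    type: str
    doc: Optional[str] = None

    def __str__(self) -> str:
        text = f"{self.id}: {self.name}: {'required' if self.required else 'optional'} {self.type}"
        # The doc string, if present, is appended in parentheses.
        return f"{text} ({self.doc})" if self.doc else text


def schema_to_string(fields: list[Field], row_ids: list[int]) -> str:
    lines = ["table {", "  fields {"]
    lines += [f"    {field}" for field in fields]
    lines.append("  }")
    # Assumption: the line is omitted when there are no row identifiers.
    if row_ids:
        lines.append("  row identifiers { " + ",".join(map(str, row_ids)) + " }")
    lines.append("}")
    return "\n".join(lines)


FIELDS = [
    Field(1, "x", True, "long"),
    Field(2, "y", True, "long", "comment"),
    Field(3, "z", True, "long"),
]
print(schema_to_string(FIELDS, [1, 2]))
```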
   
   
   Rules for updating row identifiers:
   1. A row identifier is added through `UpdateSchema.addRowIdentifier(columnName)`.
   2. The column must already exist in the schema or be one of the columns added in the same update, so that adding a new primary key column is a single atomic update.
   3. Renaming or moving a column does not affect the row identifiers, because they reference field IDs.
   4. A row identifier is removed through `UpdateSchema.deleteRowIdentifier(columnName)`.
   5. Only a column that is currently a row identifier can be removed from the row identifiers.
   6. A row identifier column cannot be dropped from the schema until it has been removed from the row identifiers. This supports both use cases: (1) a user who really wants to drop that column can still do so, and (2) a user cannot drop the column directly without being aware of the implications.
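The six rules above can be modeled with a small in-memory sketch. `SchemaUpdate` and its methods are hypothetical Python analogues of `UpdateSchema.addRowIdentifier` and `deleteRowIdentifier`, not the actual API. The sketch tracks only top-level column names and field IDs, validates each call immediately rather than at commit time, and leaves out column moves, which, like renames, do not change field IDs.

```python
from __future__ import annotations


class SchemaUpdate:
    """Hypothetical model of the row identifier rules; not Iceberg's UpdateSchema."""

    def __init__(self, columns: dict[str, int], row_ids: set[int]):
        self._ids_by_name = dict(columns)  # column name -> field ID
        self._row_ids = set(row_ids)       # row identifiers are stored as field IDs
        self._next_id = max(columns.values(), default=0) + 1

    def add_column(self, name: str) -> SchemaUpdate:
        if name in self._ids_by_name:
            raise ValueError(f"column already exists: {name}")
        self._ids_by_name[name] = self._next_id
        self._next_id += 1
        return self

    def rename_column(self, old: str, new: str) -> SchemaUpdate:
        # Rule 3: the field ID stays the same, so row identifiers are unaffected.
        if old not in self._ids_by_name or new in self._ids_by_name:
            raise ValueError(f"cannot rename {old} to {new}")
        self._ids_by_name[new] = self._ids_by_name.pop(old)
        return self

    def add_row_identifier(self, name: str) -> SchemaUpdate:
        # Rules 1-2: the column may already exist or be added earlier in this update.
        if name not in self._ids_by_name:
            raise ValueError(f"cannot add unknown column as row identifier: {name}")
        self._row_ids.add(self._ids_by_name[name])
        return self

    def delete_row_identifier(self, name: str) -> SchemaUpdate:
        # Rules 4-5: only a current row identifier column can be removed.
        field_id = self._ids_by_name.get(name)
        if field_id not in self._row_ids:
            raise ValueError(f"not a row identifier column: {name}")
        self._row_ids.remove(field_id)
        return self

    def delete_column(self, name: str) -> SchemaUpdate:
        # Rule 6: a row identifier column must leave the row identifiers first.
        if name not in self._ids_by_name:
            raise ValueError(f"unknown column: {name}")
        if self._ids_by_name[name] in self._row_ids:
            raise ValueError(f"remove {name} from the row identifiers before deleting it")
        del self._ids_by_name[name]
        return self

    def apply(self) -> tuple[dict[str, int], list[int]]:
        return dict(self._ids_by_name), sorted(self._row_ids)


update = SchemaUpdate({"x": 1, "y": 2, "z": 3}, {1, 2})
# Rule 2: add a column and make it a row identifier in the same update.
update.add_column("w").add_row_identifier("w").rename_column("x", "x2")
print(update.apply())  # ({'y': 2, 'z': 3, 'w': 4, 'x2': 1}, [1, 2, 4])
```

Storing identifiers as field IDs is what lets the rename leave them untouched, and calling `delete_column("x")` on a fresh update fails until `delete_row_identifier("x")` has been called.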
   
   


