jackye1995 commented on pull request #3048:
URL: https://github.com/apache/iceberg/pull/3048#issuecomment-909964634


   @rdblue thanks for the quick review! 
   
   After experimenting overnight with a few different approaches, I think it is likely better to
   1. directly map Iceberg partition fields to Glue partition keys; in that sense, the Iceberg partition fields are basically "virtual columns" in Glue
   2. add the full Iceberg field string as the comment of each column, so that users can see additional information such as optional/required, nested field column IDs, etc.
   
   Here is an example result stored in Glue:
   
   ```json
   {
       "Table": {
           "Name": "test",
           "DatabaseName": "jack",
           "CreateTime": "2021-08-31T22:56:30-07:00",
           "UpdateTime": "2021-08-31T22:56:30-07:00",
           "Retention": 0,
           "StorageDescriptor": {
               "Columns": [
                   {
                       "Name": "i",
                       "Type": "int",
                       "Comment": "Iceberg column: { 1: i: required int }"
                   },
                   {
                       "Name": "l",
                       "Type": "bigint",
                       "Comment": "Iceberg column: { 2: l: required long }"
                   },
                   {
                       "Name": "d",
                       "Type": "date",
                       "Comment": "Iceberg column: { 3: d: required date }"
                   },
                   {
                       "Name": "t",
                       "Type": "string",
                       "Comment": "Iceberg column: { 4: t: required time }"
                   },
                   {
                       "Name": "ts",
                       "Type": "timestamp",
                       "Comment": "Iceberg column: { 5: ts: required timestamp }"
                   },
                   {
                       "Name": "tstz",
                       "Type": "timestamp",
                       "Comment": "Iceberg column: { 6: tstz: required timestamptz }"
                   },
                   {
                       "Name": "dec",
                       "Type": "decimal(9,2)",
                       "Comment": "Iceberg column: { 7: dec: required decimal(9, 2) }"
                   },
                   {
                       "Name": "s",
                       "Type": "string",
                       "Comment": "Iceberg column: { 8: s: required string }"
                   },
                   {
                       "Name": "u",
                       "Type": "string",
                       "Comment": "Iceberg column: { 9: u: required uuid }"
                   },
                   {
                       "Name": "f",
                       "Type": "binary",
                       "Comment": "Iceberg column: { 10: f: required fixed[3] }"
                   },
                   {
                       "Name": "b",
                       "Type": "binary",
                       "Comment": "Iceberg column: { 11: b: required binary }"
                   },
                   {
                       "Name": "struct",
                       "Type": "struct<i2:int,l2:bigint,d2:date>",
                       "Comment": "Iceberg column: { 12: struct: required struct<15: i2: required int, 16: l2: required long, 17: d2: required date> }"
                   },
                   {
                       "Name": "list",
                       "Type": "array<struct<i3:int,l3:bigint,d3:date>>",
                       "Comment": "Iceberg column: { 13: list: required list<struct<19: i3: required int, 20: l3: required long, 21: d3: required date>> }"
                   },
                   {
                       "Name": "map",
                       "Type": "map<string,struct<i4:int,l5:bigint,d6:date>>",
                       "Comment": "Iceberg column: { 14: map: required map<string, struct<24: i4: required int, 25: l5: required long, 26: d6: required date>> }"
                   }
               ],
               "Location": "s3://bucket/path",
               "Compressed": false,
               "NumberOfBuckets": 0,
               "SortColumns": [],
               "StoredAsSubDirectories": false
           },
           "PartitionKeys": [
               {
                   "Name": "s_bucket",
                   "Type": "int",
                   "Comment": "Iceberg partition field: {1000: s_bucket: bucket[16](8)}"
               },
               {
                   "Name": "map.i4_trunc",
                   "Type": "int",
                   "Comment": "Iceberg partition field: {1001: map.i4_trunc: truncate[2](24)}"
               },
               {
                   "Name": "ts_day",
                   "Type": "date",
                   "Comment": "Iceberg partition field: {1002: ts_day: day(5)}"
               },
               {
                   "Name": "i",
                   "Type": "int",
                   "Comment": "Iceberg partition field: {1003: i: identity(1)}"
               },
               {
                   "Name": "struct.i2",
                   "Type": "int",
                   "Comment": "Iceberg partition field: {1004: struct.i2: identity(15)}"
               }
           ],
           "TableType": "EXTERNAL_TABLE",
           "Parameters": {
               "metadata_location": "s3://bucket/path",
               "table_type": "ICEBERG"
           }
       }
   }
   ```
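
To make the column mapping above concrete, here is a minimal Python sketch that rebuilds one `Columns` entry from an Iceberg field description. It is only an illustration: the type table is read off the example output above rather than taken from the PR's Java code, the names `OBSERVED_PRIMITIVES`, `to_glue_type`, `column_comment`, and `glue_column` are hypothetical, and the `optional` wording for non-required fields is an assumption since the example only shows `required` fields. Nested types (struct, list, map) are left out.

```python
# Hypothetical helpers, not Iceberg or Glue API. The mapping is read off the
# example table above: one entry per primitive type that appears there.
OBSERVED_PRIMITIVES = {
    "int": "int",
    "long": "bigint",
    "date": "date",
    "time": "string",
    "timestamp": "timestamp",
    "timestamptz": "timestamp",
    "string": "string",
    "uuid": "string",
    "binary": "binary",
}


def to_glue_type(iceberg_type: str) -> str:
    """Map a primitive Iceberg type string to the Glue type seen in the example."""
    t = iceberg_type.strip()
    if t.startswith("fixed["):
        # fixed[3] is stored as "binary" in the example.
        return "binary"
    if t.startswith("decimal("):
        # The Glue type string in the example drops the space after the comma.
        return t.replace(" ", "")
    return OBSERVED_PRIMITIVES[t]


def column_comment(field_id: int, name: str, required: bool, iceberg_type: str) -> str:
    """Build a comment in the column format shown in the example."""
    # "optional" is assumed; the example only contains required fields.
    nullability = "required" if required else "optional"
    return f"Iceberg column: {{ {field_id}: {name}: {nullability} {iceberg_type} }}"


def glue_column(field_id: int, name: str, required: bool, iceberg_type: str) -> dict:
    """Assemble one entry of StorageDescriptor.Columns as a plain dict."""
    return {
        "Name": name,
        "Type": to_glue_type(iceberg_type),
        "Comment": column_comment(field_id, name, required, iceberg_type),
    }


if __name__ == "__main__":
    print(glue_column(7, "dec", True, "decimal(9, 2)"))
```

Calling `glue_column(7, "dec", True, "decimal(9, 2)")` produces a dict with the same `Name`, `Type`, and `Comment` as the `dec` column in the JSON above.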

