jackye1995 commented on pull request #3048:
URL: https://github.com/apache/iceberg/pull/3048#issuecomment-909964634
@rdblue thanks for the quick review!
After experimenting overnight with a few different approaches, I think it is
likely better to:
1. directly map Iceberg partition fields to Glue partition keys; in that
sense, the Iceberg partition fields are basically "virtual columns" in Glue
2. add the full Iceberg field string as the comment of each column, so that
users can see additional information such as optional/required, nested field
IDs, etc.
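
To make the two points concrete, here is a rough Python sketch of the mapping. It is not the code in this PR: `Column`, `PartitionField`, `to_glue_column` and `to_glue_partition_key` are hypothetical names used only for illustration, and the Glue type is passed in by hand rather than derived. The output dicts just mirror the `Name`/`Type`/`Comment` shape of the JSON example in this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical stand-in for a top-level Iceberg schema field.
    field_id: int
    name: str
    iceberg_type: str  # e.g. "long"
    glue_type: str     # e.g. "bigint", supplied by the caller in this sketch
    required: bool = True


@dataclass(frozen=True)
class PartitionField:
    # Hypothetical stand-in for an Iceberg partition field.
    field_id: int   # e.g. 1000
    name: str       # e.g. "s_bucket"
    transform: str  # e.g. "bucket[16]"
    source_id: int  # ID of the source column, e.g. 8
    glue_type: str  # Glue type of the transform result, e.g. "int"


def to_glue_column(col: Column) -> dict:
    """Point 2: keep the full Iceberg field string in the column comment."""
    nullability = "required" if col.required else "optional"
    comment = (
        f"Iceberg column: {{ {col.field_id}: {col.name}: "
        f"{nullability} {col.iceberg_type} }}"
    )
    return {"Name": col.name, "Type": col.glue_type, "Comment": comment}


def to_glue_partition_key(pf: PartitionField) -> dict:
    """Point 1: each partition field becomes one Glue partition key."""
    comment = (
        f"Iceberg partition field: "
        f"{{{pf.field_id}: {pf.name}: {pf.transform}({pf.source_id})}}"
    )
    return {"Name": pf.name, "Type": pf.glue_type, "Comment": comment}
```

For example, `to_glue_partition_key(PartitionField(1000, "s_bucket", "bucket[16]", 8, "int"))` gives an entry of the same form as the `s_bucket` partition key in the example.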
Here is an example result stored in Glue:
```json
{
"Table": {
"Name": "test",
"DatabaseName": "jack",
"CreateTime": "2021-08-31T22:56:30-07:00",
"UpdateTime": "2021-08-31T22:56:30-07:00",
"Retention": 0,
"StorageDescriptor": {
"Columns": [
{
"Name": "i",
"Type": "int",
"Comment": "Iceberg column: { 1: i: required int }"
},
{
"Name": "l",
"Type": "bigint",
"Comment": "Iceberg column: { 2: l: required long }"
},
{
"Name": "d",
"Type": "date",
"Comment": "Iceberg column: { 3: d: required date }"
},
{
"Name": "t",
"Type": "string",
"Comment": "Iceberg column: { 4: t: required time }"
},
{
"Name": "ts",
"Type": "timestamp",
"Comment": "Iceberg column: { 5: ts: required timestamp }"
},
{
"Name": "tstz",
"Type": "timestamp",
"Comment": "Iceberg column: { 6: tstz: required timestamptz }"
},
{
"Name": "dec",
"Type": "decimal(9,2)",
"Comment": "Iceberg column: { 7: dec: required decimal(9, 2) }"
},
{
"Name": "s",
"Type": "string",
"Comment": "Iceberg column: { 8: s: required string }"
},
{
"Name": "u",
"Type": "string",
"Comment": "Iceberg column: { 9: u: required uuid }"
},
{
"Name": "f",
"Type": "binary",
"Comment": "Iceberg column: { 10: f: required fixed[3] }"
},
{
"Name": "b",
"Type": "binary",
"Comment": "Iceberg column: { 11: b: required binary }"
},
{
"Name": "struct",
"Type": "struct<i2:int,l2:bigint,d2:date>",
"Comment": "Iceberg column: { 12: struct: required struct<15: i2: required int, 16: l2: required long, 17: d2: required date> }"
},
{
"Name": "list",
"Type": "array<struct<i3:int,l3:bigint,d3:date>>",
"Comment": "Iceberg column: { 13: list: required list<struct<19: i3: required int, 20: l3: required long, 21: d3: required date>> }"
},
{
"Name": "map",
"Type": "map<string,struct<i4:int,l5:bigint,d6:date>>",
"Comment": "Iceberg column: { 14: map: required map<string, struct<24: i4: required int, 25: l5: required long, 26: d6: required date>> }"
}
],
"Location": "s3://bucket/path",
"Compressed": false,
"NumberOfBuckets": 0,
"SortColumns": [],
"StoredAsSubDirectories": false
},
"PartitionKeys": [
{
"Name": "s_bucket",
"Type": "int",
"Comment": "Iceberg partition field: {1000: s_bucket: bucket[16](8)}"
},
{
"Name": "map.i4_trunc",
"Type": "int",
"Comment": "Iceberg partition field: {1001: map.i4_trunc: truncate[2](24)}"
},
{
"Name": "ts_day",
"Type": "date",
"Comment": "Iceberg partition field: {1002: ts_day: day(5)}"
},
{
"Name": "i",
"Type": "int",
"Comment": "Iceberg partition field: {1003: i: identity(1)}"
},
{
"Name": "struct.i2",
"Type": "int",
"Comment": "Iceberg partition field: {1004: struct.i2: identity(15)}"
}
],
"TableType": "EXTERNAL_TABLE",
"Parameters": {
"metadata_location": "s3://bucket/path",
"table_type": "ICEBERG"
}
}
}
```
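
For reference, the primitive type conversions visible in the example above can be summarized with a small Python sketch. The table below is only read off this example (for instance `time` and `uuid` show up as `string`, and `timestamptz` as `timestamp`); it is not an exhaustive or authoritative mapping, and `to_glue_type` is a hypothetical helper name, not something in the PR. Nested types (struct, list, map) are left out on purpose.

```python
import re

# Primitive mappings as they appear in the example above.
# Assumption: anything not listed here is simply out of scope for this sketch.
PRIMITIVES = {
    "int": "int",
    "long": "bigint",
    "date": "date",
    "time": "string",
    "timestamp": "timestamp",
    "timestamptz": "timestamp",
    "string": "string",
    "uuid": "string",
    "binary": "binary",
}


def to_glue_type(iceberg_type: str) -> str:
    """Translate an Iceberg primitive type string to the Glue type shown above."""
    t = iceberg_type.strip()
    if t in PRIMITIVES:
        return PRIMITIVES[t]
    # fixed[N] is stored as binary in the example.
    if re.fullmatch(r"fixed\[\d+\]", t):
        return "binary"
    # decimal(P, S) keeps precision and scale but drops the space.
    m = re.fullmatch(r"decimal\((\d+),\s*(\d+)\)", t)
    if m:
        return f"decimal({m.group(1)},{m.group(2)})"
    raise ValueError(f"type not covered by this sketch: {t}")
```

So `to_glue_type("decimal(9, 2)")` yields `decimal(9,2)`, matching the `dec` column, and anything outside the table raises `ValueError` rather than guessing.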
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]