kbendick commented on issue #2962:
URL: https://github.com/apache/iceberg/issues/2962#issuecomment-925389129
I looked at the Parquet metadata of one of the underlying files from both
Iceberg 0.9.0 and Iceberg 0.10.0.
The difference does seem to be in the MAP type definition: the repeated group
inside `mapCol` is named `key_value` in the 0.10.0 file and `map` in the 0.9.0 file.
```
File path: file_iceberg_010.parquet
Created by: parquet-mr version 1.11.1 (build 765bd5cd7fdef2af1cecd0755000694b992bfadd)
Properties:
  iceberg.schema: {"type":"struct","fields":[{"id":1,"name":"mapCol","required":true,"type":{"type":"map","key-id":2,"key":"string","value-id":3,"value":{"type":"struct","fields":[{"id":4,"name":"payload","required":true,"type":{"type":"struct","fields":[{"id":6,"name":"bool","required":true,"type":"boolean"},{"id":7,"name":"dbl","required":true,"type":"double"},{"id":8,"name":"str","required":true,"type":"string"}]}},{"id":5,"name":"str","required":true,"type":"string"}]},"value-required":true}}]}
Schema:
message table {
  required group mapCol (MAP) = 1 {
    repeated group key_value {
      required binary key (STRING) = 2;
      required group value = 3 {
        required group payload = 4 {
          required boolean bool = 6;
          required double dbl = 7;
          required binary str (STRING) = 8;
        }
        required binary str (STRING) = 5;
      }
    }
  }
}
Row group 0:  count: 12  32.33 B records  start: 4  total: 388 B
--------------------------------------------------------------------------------
                                     type      encodings count     avg size   nulls   min / max
mapCol.key_value.key                 BINARY    G   _     12        6.92 B     0       "0" / "9"
mapCol.key_value.value.payload.bool  BOOLEAN   G   _     12        4.50 B     0       "false" / "true"
mapCol.key_value.value.payload.dbl   DOUBLE    G   _     12        7.08 B     0       "-0.0" / "11.0"
mapCol.key_value.value.payload.str   BINARY    G   _     12        6.92 B     0       "0" / "9"
mapCol.key_value.value.str           BINARY    G   _     12        6.92 B     0       "0" / "9"
```
Here's the meta from one of the files in the Iceberg 0.9.0 table:
```
File path: file_090.parquet
Created by: parquet-mr version 1.11.0 (build 18519eb8e059865652eee3ff0e8593f126701da4)
Properties:
  iceberg.schema: {"type":"struct","fields":[{"id":1,"name":"mapCol","required":true,"type":{"type":"map","key-id":2,"key":"string","value-id":3,"value":{"type":"struct","fields":[{"id":4,"name":"payload","required":true,"type":{"type":"struct","fields":[{"id":6,"name":"bool","required":true,"type":"boolean"},{"id":7,"name":"dbl","required":true,"type":"double"},{"id":8,"name":"str","required":true,"type":"string"}]}},{"id":5,"name":"str","required":true,"type":"string"}]},"value-required":true}}]}
Schema:
message table {
  required group mapCol (MAP) = 1 {
    repeated group map {
      required binary key (STRING) = 2;
      required group value = 3 {
        required group payload = 4 {
          required boolean bool = 6;
          required double dbl = 7;
          required binary str (STRING) = 8;
        }
        required binary str (STRING) = 5;
      }
    }
  }
}
Row group 0:  count: 12  32.33 B records  start: 4  total: 388 B
--------------------------------------------------------------------------------
                               type      encodings count     avg size   nulls   min / max
mapCol.map.key                 BINARY    G   _     12        6.92 B     0       "0" / "9"
mapCol.map.value.payload.bool  BOOLEAN   G   _     12        4.50 B     0       "false" / "true"
mapCol.map.value.payload.dbl   DOUBLE    G   _     12        7.08 B     0       "-0.0" / "11.0"
mapCol.map.value.payload.str   BINARY    G   _     12        6.92 B     0       "0" / "9"
mapCol.map.value.str           BINARY    G   _     12        6.92 B     0       "0" / "9"
```
However, as mentioned, I _am_ able to read both tables with Iceberg 0.12.0.
@hankfanchiu You said your internal fork doesn't work with Iceberg 0.12.0? Could
you try again with the OSS build?
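
To make the path difference between the two dumps concrete, here is a minimal Python sketch. It only compares the column paths copied from the row-group output above and does not read any Parquet file. The helper names `differing_segments` and `normalize` are made up for illustration and are not part of Iceberg or Parquet.

```python
# Column paths copied verbatim from the two metadata dumps above.
PATHS_010 = [
    "mapCol.key_value.key",
    "mapCol.key_value.value.payload.bool",
    "mapCol.key_value.value.payload.dbl",
    "mapCol.key_value.value.payload.str",
    "mapCol.key_value.value.str",
]
PATHS_090 = [
    "mapCol.map.key",
    "mapCol.map.value.payload.bool",
    "mapCol.map.value.payload.dbl",
    "mapCol.map.value.payload.str",
    "mapCol.map.value.str",
]


def differing_segments(paths_a, paths_b):
    """Return {(depth, segment_a, segment_b)} for every position where
    two aligned dotted paths disagree. Hypothetical helper."""
    diffs = set()
    for a, b in zip(paths_a, paths_b):
        parts_a, parts_b = a.split("."), b.split(".")
        if len(parts_a) != len(parts_b):
            raise ValueError(f"paths have different depth: {a!r} vs {b!r}")
        for depth, (seg_a, seg_b) in enumerate(zip(parts_a, parts_b)):
            if seg_a != seg_b:
                diffs.add((depth, seg_a, seg_b))
    return diffs


def normalize(path, old="map", new="key_value", depth=1):
    """Rename the segment at `depth` from `old` to `new`, leaving other
    segments untouched. Hypothetical helper, for comparison only."""
    parts = path.split(".")
    if len(parts) > depth and parts[depth] == old:
        parts[depth] = new
    return ".".join(parts)


print(differing_segments(PATHS_010, PATHS_090))  # {(1, 'key_value', 'map')}
```

The only disagreement is the name of the repeated group at depth 1. The field IDs (`= 1` through `= 8`) and the embedded `iceberg.schema` JSON are identical in both dumps.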
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]