alamb commented on code in PR #123:
URL: https://github.com/apache/parquet-site/pull/123#discussion_r2451683604
##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -27,6 +28,11 @@ Implementations:
 ### Physical types
+Physical types are defined by the [`enum Type` parquet.thrift]

Review Comment:
   I purposely chose not to make this a permalink so it always reflected the current version of the thrift file

   This means over time the link may not point to the exact correct line, but I think people can find the relevant section using `union LogicalType` and we can update the links if needed

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -43,30 +49,43 @@ Implementations:
 ### Logical types
-| Data type | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
-| ----------------------------------------- | ----- | ------------- | -------- | -------- | ----- | --------- | ------ |
-| STRING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| ENUM | ❌ | ✅ | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
-| UUID | ❌ | ✅ | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
-| 8, 16, 32, 64 bit signed and unsigned INT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| DECIMAL (INT32) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| DECIMAL (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| DECIMAL (BYTE_ARRAY) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | (R) |
-| DECIMAL (FIXED_LEN_BYTE_ARRAY) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| DATE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| TIME (INT32) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| TIME (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| TIMESTAMP (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| INTERVAL | ✅ | ✅ (1) | ✅ | ✅ | ❌ | ✅ | ✅ |
-| JSON | ✅ | ✅ (1) | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
-| BSON | ❌ | ✅ (1) | ✅ | ✅ (1) | ❌ | ❌ | ❌ |
-| LIST | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
-| MAP | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
-| UNKNOWN (always null) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| FLOAT16 | ✅ | ✅ (1) | ✅ | ✅ | ✅ | ✅ | ✅ |
+Logical types are defined by the [`union LogicalType` in parquet.thrift] and described in [LogicalTypes.md]
+
+[`union LogicalType` in parquet.thrift]: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L471
+[LogicalTypes.md]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
+
+| Data type | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
+|-----------------------------------------|------| ------- | ------- | ------- | ---- | -------- |--------|
+| STRING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| ENUM | ❌ | ✅ | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
+| UUID | ❌ | ✅ | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
+| 8, 16, 32, 64 bit signed and unsigned INT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| DECIMAL (INT32) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| DECIMAL (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| DECIMAL (BYTE_ARRAY) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | (R) |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| FLOAT16 | ✅ | ✅ (1) | ✅ | ✅ | ✅ | ✅ | ✅ |
+| DATE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| TIME (INT32) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| TIME (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| TIMESTAMP (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| INTERVAL | ✅ | ✅ (1) | ✅ | ✅ | ❌ | ✅ | ✅ |
+| JSON | ✅ | ✅ (1) | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
+| BSON | ❌ | ✅ (1) | ✅ | ✅ (1) | ❌ | ❌ | ❌ |
+| [VARIANT] | | | | | | | |

Review Comment:
   this table has three new rows for Variant, Geometry, Geography, and I moved FLOAT16 up with the rest of the numeric types, to be consistent with the list in https://github.com/apache/parquet-format/blob/master/LogicalTypes.md

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
