This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch production
in repository https://gitbox.apache.org/repos/asf/parquet-site.git
The following commit(s) were added to refs/heads/production by this push:
new 2f7d238 Add Variant, Geometry, Geography types to implementation status page (#123)
2f7d238 is described below
commit 2f7d238ff5eda628c33079679c38709929d5aa8a
Author: Andrew Lamb
AuthorDate: Sun Nov 2 06:45:16 2025 -0500
Add Variant, Geometry, Geography types to implementation status page (#123)
* Add Variant, Geometry, Geography types to implementation status page
* Add more links
* Apply suggestions from code review
Co-authored-by: Sylvain Lesage
* Note arrow-go and arrow-rs now support Variant
* Note that hyparquet.js supports geo, not variant
* Update duckdb to note it supports Variant/Ge
* check GEOMETRY and GEOGRAPHY for cpp and java
* Note update arrow-go and cudf do not support geo
---------
Co-authored-by: Sylvain Lesage
---
.../en/docs/File Format/implementationstatus.md | 65 ++++++++++++++--------
1 file changed, 42 insertions(+), 23 deletions(-)
diff --git a/content/en/docs/File Format/implementationstatus.md b/content/en/docs/File Format/implementationstatus.md
index 36b07cd..110679e 100644
--- a/content/en/docs/File Format/implementationstatus.md
+++ b/content/en/docs/File Format/implementationstatus.md
@@ -7,14 +7,15 @@ weight: 8
This page summarizes the features supported by different Parquet
implementations.
-*Note*: This is a work in progress and we would welcome help expanding its scope.
+*Note*: If you find out of date information, please help us improve the accuracy
+of this page by opening an issue or submitting a pull request.
### Legend
The value in each box means:
* ✅: supported
* ❌: not supported
* (R/W): partial reader/writer only support
-* (blank) no data
+* (blank): no data
Implementations:
* [arrow](https://github.com/apache/arrow/tree/main/cpp/src/parquet) (C++)
@@ -27,6 +28,11 @@ Implementations:
### Physical types
+Physical types are defined by the [`enum Type` in parquet.thrift]
+
+[`enum Type` in parquet.thrift]: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L32
+
+
| Data type | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
| ----------------------------------------- | ----- | ------------- | -------- | -------- | ----- | --------- | ------ |
| BOOLEAN | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
@@ -43,30 +49,43 @@ Implementations:
### Logical types
-| Data type | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
-| ----------------------------------------- | ----- | ------------- | -------- | -------- | ----- | --------- | ------ |
-| STRING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| ENUM | ❌ | ✅ | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
-| UUID | ❌ | ✅ | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
-| 8, 16, 32, 64 bit signed and unsigned INT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| DECIMAL (INT32) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| DECIMAL (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| DECIMAL (BYTE_ARRAY) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | (R) |
-| DECIMAL (FIXED_LEN_BYTE_ARRAY) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| DATE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| TIME (INT32) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| TIME (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| TIMESTAMP (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| INTERVAL | ✅ | ✅ (1) | ✅ | ✅ | ❌ | ✅ | ✅ |
-| JSON | ✅ | ✅ (1) | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
-| BSON | ❌ | ✅ (1) | ✅ | ✅ (1) | ❌ | ❌ | ❌ |
-| LIST | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
-| MAP | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
-| UNKNOWN (always null) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| FLOAT16 | ✅ | ✅ (1) | ✅ | ✅ | ✅ | ✅ | ✅ |
+Logical types are defined by the [`union LogicalType` in parquet.thrift] and described in [LogicalTypes.md]
+
+[`union LogicalType` in parquet.thrift]: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L471
+[LogicalTypes.md]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
+
+| Data type | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
+|-----------------------------------------|------| ------------- | ------- | --------- | ---- | -------- |--------|
+| STRING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| ENUM | ❌ | ✅ | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
+| UUID | ❌ | ✅ | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
+| 8, 16, 32, 64 bit signed and unsigned INT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| DECIMAL (INT32) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| DECIMAL (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| DECIMAL (BYTE_ARRAY) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | (R) |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| FLOAT16 | ✅ | ✅ (1) | ✅ | ✅ | ✅ | ✅ | ✅ |
+| DATE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| TIME (INT32) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| TIME (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| TIMESTAMP (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| INTERVAL | ✅ | ✅ (1) | ✅ | ✅ | ❌ | ✅ | ✅ |
+| JSON | ✅ | ✅ (1) | ✅ | ✅ (1) | ❌ | ✅ | ✅ |
+| BSON | ❌ | ✅ (1) | ✅ | ✅ (1) | ❌ | ❌ | ❌ |
+| [VARIANT] | | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
+| [GEOMETRY] | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
+| [GEOGRAPHY] | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
+| LIST | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
+| MAP | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
+| UNKNOWN (always null) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
* \(1) Only supported to use its annotated physical type
+[VARIANT]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
+[GEOMETRY]: https://github.com/apache/parquet-format/blob/master/Geospatial.md#logical-types
+[GEOGRAPHY]: https://github.com/apache/parquet-format/blob/master/Geospatial.md#logical-types
+
+
### Encodings
Encodings are defined by the [`enum Encoding` in parquet.thrift] and described
in [Encodings.md]