wjones127 commented on code in PR #33694:
URL: https://github.com/apache/arrow/pull/33694#discussion_r1081960529
##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
return this->disable_statistics(path->ToDotString());
}
- /// Enable integer type to annotate decimal type as below:
- /// int32: 1 <= precision <= 9
- /// int64: 10 <= precision <= 18
- /// Default disabled.
- Builder* enable_integer_annotate_decimal() {
- integer_annotate_decimal_ = true;
+ /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+ /// integer physical type.
+ ///
+ /// According to the specs, DECIMAL can be used to annotate the following
types:
Review Comment:
```suggestion
/// In Parquet, DECIMAL can be stored in any of the following physical
types:
```
##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
return this->disable_statistics(path->ToDotString());
}
- /// Enable integer type to annotate decimal type as below:
- /// int32: 1 <= precision <= 9
- /// int64: 10 <= precision <= 18
- /// Default disabled.
- Builder* enable_integer_annotate_decimal() {
- integer_annotate_decimal_ = true;
+ /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+ /// integer physical type.
+ ///
+ /// According to the specs, DECIMAL can be used to annotate the following
types:
+ /// - int32: for 1 <= precision <= 9.
+ /// - int64: for 1 <= precision <= 18; precision < 10 will produce a
warning.
+ /// - fixed_len_byte_array: precision is limited by the array size.
+ /// Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits.
+ /// - binary: precision is not limited, but is required. precision is not
limited,
+ /// but is required. The minimum number of bytes to store the unscaled
value
+ /// should be used.
Review Comment:
```suggestion
/// - binary: precision is unlimited. The minimum number of bytes to
store the unscaled value
/// is used.
```
##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
return this->disable_statistics(path->ToDotString());
}
- /// Enable integer type to annotate decimal type as below:
- /// int32: 1 <= precision <= 9
- /// int64: 10 <= precision <= 18
- /// Default disabled.
- Builder* enable_integer_annotate_decimal() {
- integer_annotate_decimal_ = true;
+ /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+ /// integer physical type.
Review Comment:
We can omit `1 <= precision` since that's obvious.
##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
return this->disable_statistics(path->ToDotString());
}
- /// Enable integer type to annotate decimal type as below:
- /// int32: 1 <= precision <= 9
- /// int64: 10 <= precision <= 18
- /// Default disabled.
- Builder* enable_integer_annotate_decimal() {
- integer_annotate_decimal_ = true;
+ /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+ /// integer physical type.
+ ///
+ /// According to the specs, DECIMAL can be used to annotate the following
types:
+ /// - int32: for 1 <= precision <= 9.
+ /// - int64: for 1 <= precision <= 18; precision < 10 will produce a
warning.
Review Comment:
"precision < 10 will produce a warning." is this true of our implementation?
Maybe we should skip saying that.
##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3443,8 +3443,8 @@ TEST(ArrowReadWrite, Decimal256AsInt) {
auto table = ::arrow::Table::Make(::arrow::schema({field("root", type)}),
{array});
parquet::WriterProperties::Builder builder;
- // Enforce integer type to annotate decimal type
- auto writer_properties = builder.enable_integer_annotate_decimal()->build();
+ // Enforce integer type to store short decimal type
+ auto writer_properties =
builder.enable_short_decimal_stored_as_integer()->build();
Review Comment:
I think this slightly shorter name is sufficient.
##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3443,8 +3443,8 @@ TEST(ArrowReadWrite, Decimal256AsInt) {
auto table = ::arrow::Table::Make(::arrow::schema({field("root", type)}),
{array});
parquet::WriterProperties::Builder builder;
- // Enforce integer type to annotate decimal type
- auto writer_properties = builder.enable_integer_annotate_decimal()->build();
+ // Enforce integer type to store short decimal type
Review Comment:
```suggestion
// Allow small decimals to be stored as int32 or int64.
```
##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
return this->disable_statistics(path->ToDotString());
}
- /// Enable integer type to annotate decimal type as below:
- /// int32: 1 <= precision <= 9
- /// int64: 10 <= precision <= 18
- /// Default disabled.
- Builder* enable_integer_annotate_decimal() {
- integer_annotate_decimal_ = true;
+ /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+ /// integer physical type.
Review Comment:
```suggestion
/// Allow decimals with precision <= 18 to be stored as integers.
```
##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -4821,8 +4821,8 @@ class TestIntegerAnnotateDecimalTypeParquetIO : public
TestParquetIO<TestType> {
auto arrow_schema = ::arrow::schema({::arrow::field("a", values->type())});
parquet::WriterProperties::Builder builder;
- // Enforce integer type to annotate decimal type
- auto writer_properties =
builder.enable_integer_annotate_decimal()->build();
+ // Enforce integer type to store short decimal type
Review Comment:
```suggestion
// Allow small decimals to be stored as int32 or int64.
```
##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3443,8 +3443,8 @@ TEST(ArrowReadWrite, Decimal256AsInt) {
auto table = ::arrow::Table::Make(::arrow::schema({field("root", type)}),
{array});
parquet::WriterProperties::Builder builder;
- // Enforce integer type to annotate decimal type
- auto writer_properties = builder.enable_integer_annotate_decimal()->build();
+ // Enforce integer type to store short decimal type
+ auto writer_properties =
builder.enable_short_decimal_stored_as_integer()->build();
Review Comment:
```suggestion
auto writer_properties =
builder.enable_store_decimal_as_integer()->build();
```
##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
return this->disable_statistics(path->ToDotString());
}
- /// Enable integer type to annotate decimal type as below:
- /// int32: 1 <= precision <= 9
- /// int64: 10 <= precision <= 18
- /// Default disabled.
- Builder* enable_integer_annotate_decimal() {
- integer_annotate_decimal_ = true;
+ /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+ /// integer physical type.
+ ///
+ /// According to the specs, DECIMAL can be used to annotate the following
types:
+ /// - int32: for 1 <= precision <= 9.
+ /// - int64: for 1 <= precision <= 18; precision < 10 will produce a
warning.
+ /// - fixed_len_byte_array: precision is limited by the array size.
+ /// Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits.
+ /// - binary: precision is not limited, but is required. precision is not
limited,
+ /// but is required. The minimum number of bytes to store the unscaled
value
+ /// should be used.
+ ///
+ /// By default, this is DISABLED and all decimal types annotate
fixed_len_byte_array.
+ ///
+ /// When enabled, the C++ writer will use following physical types to
store decimals:
+ /// - int32: for 1 <= precision <= 9.
+ /// - int64: for 10 <= precision <= 18.
+ /// - fixed_len_byte_array: for precision > 18.
+ ///
+ /// As a consequence, decimal columns stored in integer types are more
compact
+ /// but in a risk that the parquet file may not be readable by previous
Arrow C++
+ /// versions or other implementations.
Review Comment:
```suggestion
/// As a consequence, decimal columns stored in integer types are more
compact.
```
I'm not sure it's worth saying that older versions don't support them. In
Arrow C++, read support for these was added at the same time for the other
physical types for decimals ([original
pr](https://github.com/apache/parquet-cpp/pull/403)). And that was a long time
ago, before version 0.10.0. I'm sure it was similarly well supported in
parquet-mr.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]