wjones127 commented on code in PR #33694:
URL: https://github.com/apache/arrow/pull/33694#discussion_r1081960529


##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
       return this->disable_statistics(path->ToDotString());
     }
 
-    /// Enable integer type to annotate decimal type as below:
-    ///   int32: 1 <= precision <= 9
-    ///   int64: 10 <= precision <= 18
-    /// Default disabled.
-    Builder* enable_integer_annotate_decimal() {
-      integer_annotate_decimal_ = true;
+    /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+    /// integer physical type.
+    ///
+    /// According to the specs, DECIMAL can be used to annotate the following 
types:

Review Comment:
   ```suggestion
       /// In Parquet, DECIMAL can be stored in any of the following physical 
types:
   ```



##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
       return this->disable_statistics(path->ToDotString());
     }
 
-    /// Enable integer type to annotate decimal type as below:
-    ///   int32: 1 <= precision <= 9
-    ///   int64: 10 <= precision <= 18
-    /// Default disabled.
-    Builder* enable_integer_annotate_decimal() {
-      integer_annotate_decimal_ = true;
+    /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+    /// integer physical type.
+    ///
+    /// According to the specs, DECIMAL can be used to annotate the following 
types:
+    /// - int32: for 1 <= precision <= 9.
+    /// - int64: for 1 <= precision <= 18; precision < 10 will produce a 
warning.
+    /// - fixed_len_byte_array: precision is limited by the array size.
+    ///   Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits.
+    /// - binary: precision is not limited, but is required. precision is not 
limited,
+    ///   but is required. The minimum number of bytes to store the unscaled 
value
+    ///   should be used.

Review Comment:
   ```suggestion
       /// - binary: precision is unlimited. The minimum number of bytes to 
store the unscaled value
       ///   is used.
   ```



##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
       return this->disable_statistics(path->ToDotString());
     }
 
-    /// Enable integer type to annotate decimal type as below:
-    ///   int32: 1 <= precision <= 9
-    ///   int64: 10 <= precision <= 18
-    /// Default disabled.
-    Builder* enable_integer_annotate_decimal() {
-      integer_annotate_decimal_ = true;
+    /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+    /// integer physical type.

Review Comment:
   We can omit `1 <= precision` since that's obvious.



##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
       return this->disable_statistics(path->ToDotString());
     }
 
-    /// Enable integer type to annotate decimal type as below:
-    ///   int32: 1 <= precision <= 9
-    ///   int64: 10 <= precision <= 18
-    /// Default disabled.
-    Builder* enable_integer_annotate_decimal() {
-      integer_annotate_decimal_ = true;
+    /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+    /// integer physical type.
+    ///
+    /// According to the specs, DECIMAL can be used to annotate the following 
types:
+    /// - int32: for 1 <= precision <= 9.
+    /// - int64: for 1 <= precision <= 18; precision < 10 will produce a 
warning.

Review Comment:
   "precision < 10 will produce a warning." is this true of our implementation? 
Maybe we should skip saying that.



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3443,8 +3443,8 @@ TEST(ArrowReadWrite, Decimal256AsInt) {
   auto table = ::arrow::Table::Make(::arrow::schema({field("root", type)}), 
{array});
 
   parquet::WriterProperties::Builder builder;
-  // Enforce integer type to annotate decimal type
-  auto writer_properties = builder.enable_integer_annotate_decimal()->build();
+  // Enforce integer type to store short decimal type
+  auto writer_properties = 
builder.enable_short_decimal_stored_as_integer()->build();

Review Comment:
   I think this slightly shorter name is sufficient.



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3443,8 +3443,8 @@ TEST(ArrowReadWrite, Decimal256AsInt) {
   auto table = ::arrow::Table::Make(::arrow::schema({field("root", type)}), 
{array});
 
   parquet::WriterProperties::Builder builder;
-  // Enforce integer type to annotate decimal type
-  auto writer_properties = builder.enable_integer_annotate_decimal()->build();
+  // Enforce integer type to store short decimal type

Review Comment:
   ```suggestion
     // Allow small decimals to be stored as int32 or int64.
   ```



##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
       return this->disable_statistics(path->ToDotString());
     }
 
-    /// Enable integer type to annotate decimal type as below:
-    ///   int32: 1 <= precision <= 9
-    ///   int64: 10 <= precision <= 18
-    /// Default disabled.
-    Builder* enable_integer_annotate_decimal() {
-      integer_annotate_decimal_ = true;
+    /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+    /// integer physical type.

Review Comment:
   ```suggestion
       /// Allow decimals with precision <= 18 to be stored as integers.
   ```



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -4821,8 +4821,8 @@ class TestIntegerAnnotateDecimalTypeParquetIO : public 
TestParquetIO<TestType> {
     auto arrow_schema = ::arrow::schema({::arrow::field("a", values->type())});
 
     parquet::WriterProperties::Builder builder;
-    // Enforce integer type to annotate decimal type
-    auto writer_properties = 
builder.enable_integer_annotate_decimal()->build();
+    // Enforce integer type to store short decimal type

Review Comment:
   ```suggestion
       // Allow small decimals to be stored as int32 or int64.
   ```



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3443,8 +3443,8 @@ TEST(ArrowReadWrite, Decimal256AsInt) {
   auto table = ::arrow::Table::Make(::arrow::schema({field("root", type)}), 
{array});
 
   parquet::WriterProperties::Builder builder;
-  // Enforce integer type to annotate decimal type
-  auto writer_properties = builder.enable_integer_annotate_decimal()->build();
+  // Enforce integer type to store short decimal type
+  auto writer_properties = 
builder.enable_short_decimal_stored_as_integer()->build();

Review Comment:
   ```suggestion
     auto writer_properties = 
builder.enable_store_decimal_as_integer()->build();
   ```



##########
cpp/src/parquet/properties.h:
##########
@@ -452,19 +452,39 @@ class PARQUET_EXPORT WriterProperties {
       return this->disable_statistics(path->ToDotString());
     }
 
-    /// Enable integer type to annotate decimal type as below:
-    ///   int32: 1 <= precision <= 9
-    ///   int64: 10 <= precision <= 18
-    /// Default disabled.
-    Builder* enable_integer_annotate_decimal() {
-      integer_annotate_decimal_ = true;
+    /// Enable decimal logical type with 1 <= precision <= 18 to be stored as
+    /// integer physical type.
+    ///
+    /// According to the specs, DECIMAL can be used to annotate the following 
types:
+    /// - int32: for 1 <= precision <= 9.
+    /// - int64: for 1 <= precision <= 18; precision < 10 will produce a 
warning.
+    /// - fixed_len_byte_array: precision is limited by the array size.
+    ///   Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits.
+    /// - binary: precision is not limited, but is required. precision is not 
limited,
+    ///   but is required. The minimum number of bytes to store the unscaled 
value
+    ///   should be used.
+    ///
+    /// By default, this is DISABLED and all decimal types annotate 
fixed_len_byte_array.
+    ///
+    /// When enabled, the C++ writer will use following physical types to 
store decimals:
+    /// - int32: for 1 <= precision <= 9.
+    /// - int64: for 10 <= precision <= 18.
+    /// - fixed_len_byte_array: for precision > 18.
+    ///
+    /// As a consequence, decimal columns stored in integer types are more 
compact
+    /// but in a risk that the parquet file may not be readable by previous 
Arrow C++
+    /// versions or other implementations.

Review Comment:
   ```suggestion
       /// As a consequence, decimal columns stored in integer types are more 
compact.
   ```
   
   I'm not sure it's worth saying that older versions don't support them. In 
Arrow C++, read support for these was added at the same time for the other 
physical types for decimals ([original 
pr](https://github.com/apache/parquet-cpp/pull/403)). And that was a long time 
ago, before version 0.10.0. I'm sure it was similarly well supported in 
parquet-mr.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to