vdiravka edited a comment on pull request #2254:
URL: https://github.com/apache/drill/pull/2254#issuecomment-860389483


   To me this looks like some sort of bug in the Parquet library. There is a 
workaround, though: if you remove `"  optional int96 _INT96_RAW  ; \n"` from 
the schema, dictionary encoding is used for `_INTERVAL_fixed_len_byte_array_12`, 
the column you are interested in.
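
A minimal sketch of the workaround, assuming the test schema is assembled as a plain Java string (as the quoted fragment suggests). The class name `SchemaWithoutInt96` and the helper `dropInt96Fields` are hypothetical, and the schema is abbreviated to a few fields that appear in this comment. In practice you would simply delete the line from the source.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class SchemaWithoutInt96 {

    // Hypothetical helper: drops every field declaration whose physical type
    // is int96 from a Parquet message-type string, one declaration per line.
    static String dropInt96Fields(String schema) {
        return Arrays.stream(schema.split("\n"))
                .filter(line -> !line.trim()
                        .matches("(required|optional|repeated)\\s+int96\\s+.*"))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Abbreviated schema string, for illustration only.
        String schema =
                "message ParquetLogicalDataTypes { \n"
              + "  required int32 rowKey; \n"
              + "  required fixed_len_byte_array(12) _INTERVAL_fixed_len_byte_array_12 (INTERVAL); \n"
              + "  optional int96 _INT96_RAW  ; \n"
              + "} \n";

        // Prints the schema with the INT96 field removed.
        System.out.println(dropInt96Fields(schema));
    }
}
```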
   
   ```
   vitalii@vitalii-UX331UN:~/IdeaProjects/parquet-mr/parquet-cli$ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main meta /tmp/parquet/drill/parquet_test_file_simple
   WARNING: An illegal reflective access operation has occurred
   WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/vitalii/IdeaProjects/parquet-mr/parquet-cli/target/dependency/hadoop-auth-2.10.1.jar) to method sun.security.krb5.Config.getInstance()
   WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
   WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
   WARNING: All illegal access operations will be denied in a future release
   
   File path:  /tmp/parquet/drill/parquet_test_file_simple
   Created by: parquet-mr version 1.12.0 (build db75a6815f2ba1d1ee89d1a90aeb296f1f3a8f20)
   Properties:
     writer.model.name: example
   Schema:
   message ParquetLogicalDataTypes {
     required int32 rowKey;
     required binary _UTF8 (STRING);
     required binary _Enum (ENUM);
     required fixed_len_byte_array(16) _UUID (UUID);
     required int32 _INT32_RAW;
     required int32 _INT_8 (INTEGER(8,true));
     required int32 _INT_16 (INTEGER(16,true));
     required int32 _INT_32 (INTEGER(32,true));
     required int32 _UINT_8 (INTEGER(8,false));
     required int32 _UINT_16 (INTEGER(16,false));
     required int32 _UINT_32 (INTEGER(32,false));
     required int32 _DECIMAL_decimal9 (DECIMAL(9,2));
     required int64 _INT64_RAW;
     required int64 _INT_64 (INTEGER(64,true));
     required int64 _UINT_64 (INTEGER(64,false));
     required int64 _DECIMAL_decimal18 (DECIMAL(18,2));
     required fixed_len_byte_array(20) _DECIMAL_fixed_n (DECIMAL(20,2));
     required binary _DECIMAL_unlimited (DECIMAL(30,2));
     required int32 _DATE_int32 (DATE);
     required int32 _TIME_MILLIS_int32 (TIME(MILLIS,true));
     required int64 _TIMESTAMP_MILLIS_int64 (TIMESTAMP(MILLIS,true));
     required int64 _TIMESTAMP_MICROS_int64 (TIMESTAMP(MICROS,true));
     required fixed_len_byte_array(12) _INTERVAL_fixed_len_byte_array_12 (INTERVAL);
   }
   
   
   Row group 0:  count: 3  435.00 B records  start: 4  total: 1.274 kB
   
   --------------------------------------------------------------------------------
                                      type      encodings count     avg size   nulls   min / max
   rowKey                             INT32     S   D     3         11.00 B    0       "1" / "3"
   _UTF8                              BINARY    S   D     3         22.33 B    0       "UTF8 string1" / "UTF8 string3"
   _Enum                              BINARY    S   D     3         26.33 B    0       "MAX_VALUE" / "RANDOM_VALUE"
   _UUID                              FIXED[16] S _ R     3         20.67 B    0       "01010101-0101-0101-0101-0..." / "01010101-0101-0101-0101-0..."
   _INT32_RAW                         INT32     S   D     3         16.33 B    0       "-2147483648" / "2147483647"
   _INT_8                             INT32     S   D     3         13.67 B    0       "-128" / "127"
   _INT_16                            INT32     S   D     3         15.00 B    0       "-32768" / "32767"
   _INT_32                            INT32     S   D     3         16.33 B    0       "-2147483648" / "2147483647"
   _UINT_8                            INT32     S   D     3         13.67 B    0       "0" / "255"
   _UINT_16                           INT32     S   D     3         14.67 B    0       "0" / "65535"
   _UINT_32                           INT32     S   D     3         17.33 B    0       "0" / "4294967295"
   _DECIMAL_decimal9                  INT32     S   D     3         17.33 B    0       "-0.01" / "12345.67"
   _INT64_RAW                         INT64     S   D     3         21.00 B    0       "-9223372036854775808" / "9223372036854775807"
   _INT_64                            INT64     S   D     3         21.00 B    0       "-9223372036854775808" / "9223372036854775807"
   _UINT_64                           INT64     S   D     3         21.33 B    0       "0" / "18446744073709551615"
   _DECIMAL_decimal18                 INT64     S   D     3         21.33 B    0       "-0.01" / "12345678901234.56"
   _DECIMAL_fixed_n                   FIXED[20] S _ R     3         23.00 B    0       "0.00" / "2808600455222908552998455..."
   _DECIMAL_unlimited                 BINARY    S   D     3         20.33 B    0       "0.00" / "3395389607300375329868809..."
   _DATE_int32                        INT32     S   D     3         17.33 B    0       "1969-12-31" / "5350-02-17"
   _TIME_MILLIS_int32                 INT32     S   D     3         17.33 B    0       "00:00:00.001+0000" / "00:20:34.567+0000"
   _TIMESTAMP_MILLIS_int64            INT64     S   D     3         20.33 B    0       "1970-01-01T00:00:00.000+0000" / "2038-01-19T03:14:07.999+0000"
   _TIMESTAMP_MICROS_int64            INT64     S   D     3         22.00 B    0       "1970-01-01T00:00:00.00000..." / "+294247-01-10T04:00:54.77..."
   _INTERVAL_fixed_len_byte_array_12  FIXED[12] S _ R     3         25.33 B    0
   ```
   Here, `R` in the encodings column means `RLE_DICTIONARY` or `PLAIN_DICTIONARY`.
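
As a rough sketch of how this flag could be checked programmatically, the snippet below scans one row of the `meta` table for `R`. The class and method names are hypothetical, and the row layout (column name, physical type, then the flags up to the numeric count) is assumed from the output quoted in this comment, not taken from the parquet-cli source.

```java
import java.util.Arrays;
import java.util.List;

public class DictionaryFlagCheck {

    // Returns true if the flags of a `meta` table row contain 'R', which this
    // comment reads as RLE_DICTIONARY or PLAIN_DICTIONARY.
    // Assumed row layout: <name> <type> <flags...> <count> <avg size> ...
    static boolean usesDictionary(String row) {
        String[] tokens = row.trim().split("\\s+");
        // tokens[0] is the column name and tokens[1] the physical type, so
        // the flags start at index 2 and end at the first numeric token.
        for (int i = 2; i < tokens.length; i++) {
            if (tokens[i].matches("\\d+")) {
                break; // reached the value count; no more flags
            }
            if (tokens[i].contains("R")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two rows copied from the output without the INT96 column.
        List<String> rows = Arrays.asList(
                "rowKey                             INT32     S   D     3         11.00 B    0",
                "_INTERVAL_fixed_len_byte_array_12  FIXED[12] S _ R     3         25.33 B    0");

        for (String row : rows) {
            String name = row.trim().split("\\s+")[0];
            System.out.println(name + " dictionary-encoded: " + usesDictionary(row));
        }
    }
}
```

Skipping the first two tokens matters because names such as `_INT32_RAW` and types such as `BINARY` contain the letter `R` themselves.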
   
   Initially, with `_INT96_RAW` still in the schema, the metadata for this file looked like this for me:
   ```
   vitalii@vitalii-UX331UN:~/IdeaProjects/parquet-mr/parquet-cli$ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main meta /tmp/parquet/drill/parquet_test_file_simple
   WARNING: An illegal reflective access operation has occurred
   WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/vitalii/IdeaProjects/parquet-mr/parquet-cli/target/dependency/hadoop-auth-2.10.1.jar) to method sun.security.krb5.Config.getInstance()
   WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
   WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
   WARNING: All illegal access operations will be denied in a future release
   
   File path:  /tmp/parquet/drill/parquet_test_file_simple
   Created by: parquet-mr version 1.12.0 (build db75a6815f2ba1d1ee89d1a90aeb296f1f3a8f20)
   Properties:
     writer.model.name: example
   Schema:
   message ParquetLogicalDataTypes {
     required int32 rowKey;
     required binary _UTF8 (STRING);
     required binary _Enum (ENUM);
     required fixed_len_byte_array(16) _UUID (UUID);
     required int32 _INT32_RAW;
     required int32 _INT_8 (INTEGER(8,true));
     required int32 _INT_16 (INTEGER(16,true));
     required int32 _INT_32 (INTEGER(32,true));
     required int32 _UINT_8 (INTEGER(8,false));
     required int32 _UINT_16 (INTEGER(16,false));
     required int32 _UINT_32 (INTEGER(32,false));
     required int32 _DECIMAL_decimal9 (DECIMAL(9,2));
     required int64 _INT64_RAW;
     required int64 _INT_64 (INTEGER(64,true));
     required int64 _UINT_64 (INTEGER(64,false));
     required int64 _DECIMAL_decimal18 (DECIMAL(18,2));
     required fixed_len_byte_array(20) _DECIMAL_fixed_n (DECIMAL(20,2));
     required binary _DECIMAL_unlimited (DECIMAL(30,2));
     required int32 _DATE_int32 (DATE);
     required int32 _TIME_MILLIS_int32 (TIME(MILLIS,true));
     required int64 _TIMESTAMP_MILLIS_int64 (TIMESTAMP(MILLIS,true));
     required int64 _TIMESTAMP_MICROS_int64 (TIMESTAMP(MICROS,true));
     required fixed_len_byte_array(12) _INTERVAL_fixed_len_byte_array_12 (INTERVAL);
     required int96 _INT96_RAW;
   }
   
   
   Row group 0:  count: 3  361.00 B records  start: 4  total: 1.058 kB
   
   --------------------------------------------------------------------------------
                                      type      encodings count     avg size   nulls   min / max
   rowKey                             INT32     S   _     3         12.33 B    0       "1" / "3"
   _UTF8                              BINARY    S   _     3         20.67 B    0       "UTF8 string1" / "UTF8 string3"
   _Enum                              BINARY    S   _     3         19.67 B    0       "MAX_VALUE" / "RANDOM_VALUE"
   _UUID                              FIXED[16] S   _     3         9.67 B     0       "01010101-0101-0101-0101-0..." / "01010101-0101-0101-0101-0..."
   _INT32_RAW                         INT32     S   _     3         12.33 B    0       "-2147483648" / "2147483647"
   _INT_8                             INT32     S   _     3         12.33 B    0       "-128" / "127"
   _INT_16                            INT32     S   _     3         12.33 B    0       "-32768" / "32767"
   _INT_32                            INT32     S   _     3         12.33 B    0       "-2147483648" / "2147483647"
   _UINT_8                            INT32     S   _     3         12.33 B    0       "0" / "255"
   _UINT_16                           INT32     S   _     3         12.33 B    0       "0" / "65535"
   _UINT_32                           INT32     S   _     3         12.33 B    0       "0" / "4294967295"
   _DECIMAL_decimal9                  INT32     S   _     3         12.33 B    0       "-0.01" / "12345.67"
   _INT64_RAW                         INT64     S   _     3         16.33 B    0       "-9223372036854775808" / "9223372036854775807"
   _INT_64                            INT64     S   _     3         16.33 B    0       "-9223372036854775808" / "9223372036854775807"
   _UINT_64                           INT64     S   _     3         16.33 B    0       "0" / "18446744073709551615"
   _DECIMAL_decimal18                 INT64     S   _     3         16.33 B    0       "-0.01" / "12345678901234.56"
   _DECIMAL_fixed_n                   FIXED[20] S   _     3         15.33 B    0       "0.00" / "2808600455222908552998455..."
   _DECIMAL_unlimited                 BINARY    S   _     3         18.33 B    0       "0.00" / "3395389607300375329868809..."
   _DATE_int32                        INT32     S   _     3         12.33 B    0       "1969-12-31" / "5350-02-17"
   _TIME_MILLIS_int32                 INT32     S   _     3         12.33 B    0       "00:00:00.001+0000" / "00:20:34.567+0000"
   _TIMESTAMP_MILLIS_int64            INT64     S   _     3         16.33 B    0       "1970-01-01T00:00:00.000+0000" / "2038-01-19T03:14:07.999+0000"
   _TIMESTAMP_MICROS_int64            INT64     S   _     3         16.33 B    0       "1970-01-01T00:00:00.00000..." / "+294247-01-10T04:00:54.77..."
   _INTERVAL_fixed_len_byte_array_12  FIXED[12] S   _     3         17.67 B    0
   _INT96_RAW                         INT96     S _ R     3         26.00 B    0
   ```
   

