vdiravka edited a comment on pull request #2254:
URL: https://github.com/apache/drill/pull/2254#issuecomment-860389483
To me this looks like a bug in the Parquet library. Either way, there seems to be a
workaround: if you remove `" optional int96 _INT96_RAW ; \n"` from the schema,
dictionary encoding is used for `_INTERVAL_fixed_len_byte_array_12`, the column
you are interested in.
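In code, the workaround amounts to leaving one fragment out of the schema text before it is parsed. A minimal sketch, assuming the test assembles its schema string from one fragment per field (the quoted `" optional int96 _INT96_RAW ; \n"` suggests this); the class, method and constant names are hypothetical, and only three of the fields are reproduced:

```java
// Hypothetical sketch: the schema text is assumed to be built from one string
// fragment per field. Only three of the test's fields are reproduced here.
final class SchemaWorkaround {
  // The fragment quoted in the comment. Leaving it out is what made the
  // INTERVAL column dictionary-encoded in the observed output.
  static final String INT96_FIELD = " optional int96 _INT96_RAW ; \n";

  // Builds the schema text with or without the INT96 field.
  static String schemaText(boolean includeInt96) {
    return "message ParquetLogicalDataTypes { \n"
        + " required int32 rowKey; \n"
        + " required fixed_len_byte_array(12) _INTERVAL_fixed_len_byte_array_12 (INTERVAL); \n"
        + (includeInt96 ? INT96_FIELD : "")
        + "} \n";
  }

  public static void main(String[] args) {
    // Workaround: build the schema without the INT96 field.
    System.out.print(schemaText(false));
  }
}
```

The resulting string would then be parsed as before, for example with parquet-mr's `MessageTypeParser.parseMessageType`.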
```
vitalii@vitalii-UX331UN:~/IdeaProjects/parquet-mr/parquet-cli$ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main meta /tmp/parquet/drill/parquet_test_file_simple
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/vitalii/IdeaProjects/parquet-mr/parquet-cli/target/dependency/hadoop-auth-2.10.1.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
File path: /tmp/parquet/drill/parquet_test_file_simple
Created by: parquet-mr version 1.12.0 (build db75a6815f2ba1d1ee89d1a90aeb296f1f3a8f20)
Properties:
  writer.model.name: example
Schema:
message ParquetLogicalDataTypes {
  required int32 rowKey;
  required binary _UTF8 (STRING);
  required binary _Enum (ENUM);
  required fixed_len_byte_array(16) _UUID (UUID);
  required int32 _INT32_RAW;
  required int32 _INT_8 (INTEGER(8,true));
  required int32 _INT_16 (INTEGER(16,true));
  required int32 _INT_32 (INTEGER(32,true));
  required int32 _UINT_8 (INTEGER(8,false));
  required int32 _UINT_16 (INTEGER(16,false));
  required int32 _UINT_32 (INTEGER(32,false));
  required int32 _DECIMAL_decimal9 (DECIMAL(9,2));
  required int64 _INT64_RAW;
  required int64 _INT_64 (INTEGER(64,true));
  required int64 _UINT_64 (INTEGER(64,false));
  required int64 _DECIMAL_decimal18 (DECIMAL(18,2));
  required fixed_len_byte_array(20) _DECIMAL_fixed_n (DECIMAL(20,2));
  required binary _DECIMAL_unlimited (DECIMAL(30,2));
  required int32 _DATE_int32 (DATE);
  required int32 _TIME_MILLIS_int32 (TIME(MILLIS,true));
  required int64 _TIMESTAMP_MILLIS_int64 (TIMESTAMP(MILLIS,true));
  required int64 _TIMESTAMP_MICROS_int64 (TIMESTAMP(MICROS,true));
  required fixed_len_byte_array(12) _INTERVAL_fixed_len_byte_array_12 (INTERVAL);
}
Row group 0:  count: 3  435.00 B records  start: 4  total: 1.274 kB
--------------------------------------------------------------------------------
                                   type       encodings  count  avg size  nulls  min / max
rowKey                             INT32      S D        3      11.00 B   0      "1" / "3"
_UTF8                              BINARY     S D        3      22.33 B   0      "UTF8 string1" / "UTF8 string3"
_Enum                              BINARY     S D        3      26.33 B   0      "MAX_VALUE" / "RANDOM_VALUE"
_UUID                              FIXED[16]  S _ R      3      20.67 B   0      "01010101-0101-0101-0101-0..." / "01010101-0101-0101-0101-0..."
_INT32_RAW                         INT32      S D        3      16.33 B   0      "-2147483648" / "2147483647"
_INT_8                             INT32      S D        3      13.67 B   0      "-128" / "127"
_INT_16                            INT32      S D        3      15.00 B   0      "-32768" / "32767"
_INT_32                            INT32      S D        3      16.33 B   0      "-2147483648" / "2147483647"
_UINT_8                            INT32      S D        3      13.67 B   0      "0" / "255"
_UINT_16                           INT32      S D        3      14.67 B   0      "0" / "65535"
_UINT_32                           INT32      S D        3      17.33 B   0      "0" / "4294967295"
_DECIMAL_decimal9                  INT32      S D        3      17.33 B   0      "-0.01" / "12345.67"
_INT64_RAW                         INT64      S D        3      21.00 B   0      "-9223372036854775808" / "9223372036854775807"
_INT_64                            INT64      S D        3      21.00 B   0      "-9223372036854775808" / "9223372036854775807"
_UINT_64                           INT64      S D        3      21.33 B   0      "0" / "18446744073709551615"
_DECIMAL_decimal18                 INT64      S D        3      21.33 B   0      "-0.01" / "12345678901234.56"
_DECIMAL_fixed_n                   FIXED[20]  S _ R      3      23.00 B   0      "0.00" / "2808600455222908552998455..."
_DECIMAL_unlimited                 BINARY     S D        3      20.33 B   0      "0.00" / "3395389607300375329868809..."
_DATE_int32                        INT32      S D        3      17.33 B   0      "1969-12-31" / "5350-02-17"
_TIME_MILLIS_int32                 INT32      S D        3      17.33 B   0      "00:00:00.001+0000" / "00:20:34.567+0000"
_TIMESTAMP_MILLIS_int64            INT64      S D        3      20.33 B   0      "1970-01-01T00:00:00.000+0000" / "2038-01-19T03:14:07.999+0000"
_TIMESTAMP_MICROS_int64            INT64      S D        3      22.00 B   0      "1970-01-01T00:00:00.00000..." / "+294247-01-10T04:00:54.77..."
_INTERVAL_fixed_len_byte_array_12  FIXED[12]  S _ R      3      25.33 B   0
```
Here `R` in the encodings column means `RLE_DICTIONARY` or `PLAIN_DICTIONARY`.
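To compare runs column by column, each row's encoding flags can be checked for that `R`. A minimal sketch, assuming every row is a whitespace-separated line laid out as column name, physical type, a one-letter codec field, the encoding flags, and then the value count (as in the output above); the class and method names are hypothetical and not part of parquet-cli:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading rows of `parquet-cli meta` column output.
final class EncodingSummary {
  // Returns the encoding flags of one row: the tokens between the codec
  // letter (assumed to be tokens[2]) and the first bare integer (the count).
  static List<String> encodingFlags(String row) {
    String[] tokens = row.trim().split("\\s+");
    List<String> flags = new ArrayList<>();
    // tokens[0] = column name, tokens[1] = physical type, tokens[2] = codec
    for (int i = 3; i < tokens.length; i++) {
      if (tokens[i].matches("\\d+")) {
        break; // reached the value count
      }
      flags.add(tokens[i]);
    }
    return flags;
  }

  // `R` marks RLE_DICTIONARY or PLAIN_DICTIONARY data pages.
  static boolean isDictionaryEncoded(String row) {
    return encodingFlags(row).contains("R");
  }

  public static void main(String[] args) {
    // Rows for the INTERVAL column copied from the two `meta` outputs.
    String withoutInt96 = "_INTERVAL_fixed_len_byte_array_12 FIXED[12] S _ R 3 25.33 B 0";
    String withInt96 = "_INTERVAL_fixed_len_byte_array_12 FIXED[12] S _ 3 17.67 B 0";
    System.out.println("without _INT96_RAW: " + isDictionaryEncoded(withoutInt96)); // true
    System.out.println("with _INT96_RAW:    " + isDictionaryEncoded(withInt96));    // false
  }
}
```

Splitting on any run of whitespace keeps the check independent of how the columns are padded.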
For comparison, this is the metadata I initially got for the file, with `_INT96_RAW` still in the schema; there, only `_INT96_RAW` is dictionary-encoded:
```
vitalii@vitalii-UX331UN:~/IdeaProjects/parquet-mr/parquet-cli$ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main meta /tmp/parquet/drill/parquet_test_file_simple
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/vitalii/IdeaProjects/parquet-mr/parquet-cli/target/dependency/hadoop-auth-2.10.1.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
File path: /tmp/parquet/drill/parquet_test_file_simple
Created by: parquet-mr version 1.12.0 (build db75a6815f2ba1d1ee89d1a90aeb296f1f3a8f20)
Properties:
  writer.model.name: example
Schema:
message ParquetLogicalDataTypes {
  required int32 rowKey;
  required binary _UTF8 (STRING);
  required binary _Enum (ENUM);
  required fixed_len_byte_array(16) _UUID (UUID);
  required int32 _INT32_RAW;
  required int32 _INT_8 (INTEGER(8,true));
  required int32 _INT_16 (INTEGER(16,true));
  required int32 _INT_32 (INTEGER(32,true));
  required int32 _UINT_8 (INTEGER(8,false));
  required int32 _UINT_16 (INTEGER(16,false));
  required int32 _UINT_32 (INTEGER(32,false));
  required int32 _DECIMAL_decimal9 (DECIMAL(9,2));
  required int64 _INT64_RAW;
  required int64 _INT_64 (INTEGER(64,true));
  required int64 _UINT_64 (INTEGER(64,false));
  required int64 _DECIMAL_decimal18 (DECIMAL(18,2));
  required fixed_len_byte_array(20) _DECIMAL_fixed_n (DECIMAL(20,2));
  required binary _DECIMAL_unlimited (DECIMAL(30,2));
  required int32 _DATE_int32 (DATE);
  required int32 _TIME_MILLIS_int32 (TIME(MILLIS,true));
  required int64 _TIMESTAMP_MILLIS_int64 (TIMESTAMP(MILLIS,true));
  required int64 _TIMESTAMP_MICROS_int64 (TIMESTAMP(MICROS,true));
  required fixed_len_byte_array(12) _INTERVAL_fixed_len_byte_array_12 (INTERVAL);
  required int96 _INT96_RAW;
}
Row group 0:  count: 3  361.00 B records  start: 4  total: 1.058 kB
--------------------------------------------------------------------------------
                                   type       encodings  count  avg size  nulls  min / max
rowKey                             INT32      S _        3      12.33 B   0      "1" / "3"
_UTF8                              BINARY     S _        3      20.67 B   0      "UTF8 string1" / "UTF8 string3"
_Enum                              BINARY     S _        3      19.67 B   0      "MAX_VALUE" / "RANDOM_VALUE"
_UUID                              FIXED[16]  S _        3      9.67 B    0      "01010101-0101-0101-0101-0..." / "01010101-0101-0101-0101-0..."
_INT32_RAW                         INT32      S _        3      12.33 B   0      "-2147483648" / "2147483647"
_INT_8                             INT32      S _        3      12.33 B   0      "-128" / "127"
_INT_16                            INT32      S _        3      12.33 B   0      "-32768" / "32767"
_INT_32                            INT32      S _        3      12.33 B   0      "-2147483648" / "2147483647"
_UINT_8                            INT32      S _        3      12.33 B   0      "0" / "255"
_UINT_16                           INT32      S _        3      12.33 B   0      "0" / "65535"
_UINT_32                           INT32      S _        3      12.33 B   0      "0" / "4294967295"
_DECIMAL_decimal9                  INT32      S _        3      12.33 B   0      "-0.01" / "12345.67"
_INT64_RAW                         INT64      S _        3      16.33 B   0      "-9223372036854775808" / "9223372036854775807"
_INT_64                            INT64      S _        3      16.33 B   0      "-9223372036854775808" / "9223372036854775807"
_UINT_64                           INT64      S _        3      16.33 B   0      "0" / "18446744073709551615"
_DECIMAL_decimal18                 INT64      S _        3      16.33 B   0      "-0.01" / "12345678901234.56"
_DECIMAL_fixed_n                   FIXED[20]  S _        3      15.33 B   0      "0.00" / "2808600455222908552998455..."
_DECIMAL_unlimited                 BINARY     S _        3      18.33 B   0      "0.00" / "3395389607300375329868809..."
_DATE_int32                        INT32      S _        3      12.33 B   0      "1969-12-31" / "5350-02-17"
_TIME_MILLIS_int32                 INT32      S _        3      12.33 B   0      "00:00:00.001+0000" / "00:20:34.567+0000"
_TIMESTAMP_MILLIS_int64            INT64      S _        3      16.33 B   0      "1970-01-01T00:00:00.000+0000" / "2038-01-19T03:14:07.999+0000"
_TIMESTAMP_MICROS_int64            INT64      S _        3      16.33 B   0      "1970-01-01T00:00:00.00000..." / "+294247-01-10T04:00:54.77..."
_INTERVAL_fixed_len_byte_array_12  FIXED[12]  S _        3      17.67 B   0
_INT96_RAW                         INT96      S _ R      3      26.00 B   0
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]