zeropc commented on issue #6009:
URL: https://github.com/apache/incubator-doris/issues/6009#issuecomment-864511062
> There are two problems here:
> 1. DppUtils.getHashValue() didn't handle the date type, so no bytes are added for it and the hash result is wrong;
> 2. I created a table with integer-type columns and used `Spark Load` to load data, but I couldn't reproduce it; the case is as below:
>
> ```
> table:
> CREATE TABLE `test_int_bucket` (
> `tinyint_col` tinyint(4) NULL COMMENT "",
> `smallint_col` smallint(6) NULL COMMENT "",
> `int_col` int(11) NULL COMMENT "",
> `bigint_col` bigint(20) NULL COMMENT "",
> `pv_sum` int(11) SUM NULL COMMENT ""
> ) ENGINE=OLAP
> AGGREGATE KEY(`tinyint_col`, `smallint_col`, `int_col`, `bigint_col`)
> COMMENT "OLAP"
> DISTRIBUTED BY HASH(`tinyint_col`,`smallint_col`,`int_col`,`bigint_col`) BUCKETS 3
> PROPERTIES (
> "replication_num" = "1",
> "in_memory" = "false",
> "storage_format" = "DEFAULT"
> );
>
> data:
> mysql> select * from test_int_bucket;
> +-------------+--------------+---------+------------+--------+
> | tinyint_col | smallint_col | int_col | bigint_col | pv_sum |
> +-------------+--------------+---------+------------+--------+
> | 1 | 1 | 1 | 1 | 1 |
> | 4 | 4 | 4 | 4 | 4 |
> | 2 | 2 | 2 | 2 | 2 |
> | 3 | 3 | 3 | 3 | 3 |
> +-------------+--------------+---------+------------+--------+
> 4 rows in set (0.01 sec)
>
>
> query:
> mysql> select count(1) from test_int_bucket where bigint_col=1;
> +----------+
> | count(1) |
> +----------+
> | 1 |
> +----------+
> 1 row in set (0.02 sec)
> ```
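For context, the first problem quoted above (DppUtils.getHashValue() skipping the date type) can be sketched as follows. This is a hypothetical reconstruction in plain Java, not the actual Doris code: a helper serializes each distribution-column value to bytes and feeds them into a hash, but any unhandled type (here `java.sql.Date`) falls through and contributes no bytes, so rows that differ only in the date column hash identically.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class HashSketch {
    // Simplified stand-in for a getHashValue()-style helper: serialize a
    // column value to bytes, then feed it into a CRC. If a type (e.g. DATE)
    // falls through, NO bytes are appended and the hash ignores that column.
    static void appendValue(CRC32 crc, Object val) {
        if (val instanceof Integer) {
            crc.update(ByteBuffer.allocate(4).putInt((Integer) val).array());
        } else if (val instanceof String) {
            crc.update(((String) val).getBytes(StandardCharsets.UTF_8));
        }
        // BUG pattern: java.sql.Date (or any other unhandled type) is
        // silently skipped, so it never influences the hash.
    }

    static long hashRow(Object... cols) {
        CRC32 crc = new CRC32();
        for (Object c : cols) {
            appendValue(crc, c);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        long h1 = hashRow(1, java.sql.Date.valueOf("2021-06-01"));
        long h2 = hashRow(1, java.sql.Date.valueOf("2021-06-02"));
        // Different dates, same hash: both rows land in the same bucket.
        System.out.println(h1 == h2); // prints "true"
    }
}
```

Because the backend does include the date column when it hashes, the loader and the backend can disagree on which bucket such a row belongs to.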
Sorry for the wrong case. In the procedure you provided, the distribution key should be only one column, like below:
```
CREATE TABLE `test_int_bucket` (
  `int_col` int(11) NULL COMMENT "",
  `pv_sum` int(11) SUM NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`int_col`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`int_col`) BUCKETS 10
PROPERTIES (
  "replication_num" = "1",
  "in_memory" = "false",
  "storage_format" = "DEFAULT"
);
```
It is also recommended to increase the bucket number, which makes the problem easier to reproduce.
Another option is to change the query to the one below:
```
select count(1) from test_int_bucket where tinyint_col=1 and smallint_col=1 and int_col=1 and bigint_col=1;
```
The key is to make the query plan scan ONLY ONE BUCKET, the one that should contain the target data. The problem here is that the query goes to the wrong tablet, not that the data itself is wrong (i.e., if all tablets are scanned, the result is still correct).
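Why a single-bucket plan exposes the bug can be illustrated with a minimal sketch of hash-bucket routing. The modulo scheme below is the general idea behind hash distribution; the two hash values are made up for illustration and are not real Doris hashes. A row is written to the bucket `hash % bucket_num` computed by the loader, while the planner prunes to the bucket computed by the backend; if the two hashes differ, the row is invisible to the pruned plan even though a full scan still finds it.

```java
public class BucketRouting {
    // Hypothetical bucket router: hash the distribution column(s) and take
    // the result modulo the bucket count, as hash distribution generally does.
    static int bucketFor(long hash, int bucketNum) {
        return (int) Long.remainderUnsigned(hash, bucketNum);
    }

    public static void main(String[] args) {
        int bucketNum = 10;
        // Illustrative values only: suppose the backend hashes the key to
        // 0x9BC33E77 while a buggy loader computes 0x1A2B3C4E instead.
        long backendHash = 0x9BC33E77L;
        long loaderHash  = 0x1A2B3C4EL;

        int expectedBucket = bucketFor(backendHash, bucketNum); // where the planner looks
        int actualBucket   = bucketFor(loaderHash, bucketNum);  // where the row was written

        // A single-bucket scan misses the row; a plan that scans all
        // buckets (no pruning on the distribution key) still finds it.
        System.out.println(expectedBucket != actualBucket); // prints "true"
    }
}
```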