lsee9 commented on issue #11544:
URL: https://github.com/apache/druid/issues/11544#issuecomment-893250902
☝️ The comment above shows the result from the Druid table.
Those values come from data that was already rolled up with
quantilesDoublesSketch at ingestion time.
The number of rows in the original table is as follows.
query:
```sql
SELECT
country,
SUM("count") AS total_num_rows_original
FROM "mytable"
WHERE __time >= '2021-06-01' AND __time <= '2021-06-01' AND service_code = 'top'
GROUP BY 1
ORDER BY 2 DESC
```
query result:
```json
{"country":"kr","total_num_rows_original":1082227280}
{"country":"us","total_num_rows_original":10978845}
{"country":"jp","total_num_rows_original":2896190}
{"country":"ca","total_num_rows_original":2767109}
{"country":"au","total_num_rows_original":1862148}
{"country":"vn","total_num_rows_original":1718031}
{"country":"nz","total_num_rows_original":575751}
{"country":"de","total_num_rows_original":556492}
{"country":"sg","total_num_rows_original":536305}
{"country":"id","total_num_rows_original":425479}
{"country":"hk","total_num_rows_original":373920}
{"country":"ph","total_num_rows_original":364786}
{"country":"","total_num_rows_original":361175}
{"country":"th","total_num_rows_original":360037}
{"country":"my","total_num_rows_original":333746}
{"country":"gb","total_num_rows_original":324027}
{"country":"mx","total_num_rows_original":240169}
{"country":"ae","total_num_rows_original":237182}
...
{"country":"ad","total_num_rows_original":3}
{"country":"gw","total_num_rows_original":3}
{"country":"so","total_num_rows_original":3}
{"country":"mq","total_num_rows_original":1}
{"country":"sy","total_num_rows_original":1}
```
If the aggregation is run over the whole table, the number of original rows
is about 81 billion (between 2^36 and 2^37), which is up to about 20 times
the N values listed in the table at
https://datasketches.apache.org/docs/Quantiles/OrigQuantilesSketch.html.
However, the number of bytes required grows only logarithmically with N,
increasing by about 1 KB each time N doubles.
Based on this calculation, 30 KB to 32 KB seems to be sufficient!
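
The estimate above can be checked with a rough back-of-the-envelope sketch. It assumes k = 128 (the value is not stated in this comment), 8 bytes per retained double, a base buffer of 2k items plus one k-item buffer per level, and a 32-byte header. The function name `approx_sketch_bytes` and the header size are illustrative assumptions, not the DataSketches API, so the result should be read as an approximate upper bound rather than the exact serialized size.

```python
def approx_sketch_bytes(n: int, k: int = 128, header_bytes: int = 32) -> int:
    """Rough upper bound on the size of a doubles quantiles sketch.

    Assumptions (not taken from the library source):
      - the sketch keeps a base buffer of 2*k doubles,
      - plus one buffer of k doubles per level,
      - the number of levels is the bit length of n // (2*k),
      - header_bytes is a guessed fixed overhead.
    """
    if n <= 0:
        return header_bytes
    levels = (n // (2 * k)).bit_length()
    retained_items = 2 * k + levels * k
    return header_bytes + 8 * retained_items


# Size at the row count discussed above (about 81 billion rows).
total_rows = 81_000_000_000
print(total_rows, approx_sketch_bytes(total_rows))

# Each doubling of N adds one level, i.e. k * 8 = 1024 bytes when k = 128.
for exp in (32, 36, 37):
    print(f"2^{exp}", approx_sketch_bytes(2 ** exp))
```

Under these assumptions, 81 billion rows comes out at roughly 31 KB, which is consistent with the 30 KB to 32 KB range given above. A compact serialization would typically be smaller, since it only stores the levels that are actually populated.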