[ https://issues.apache.org/jira/browse/KUDU-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke updated KUDU-2258:
------------------------------
Component/s: test
> Create timeseries workload integration test
> -------------------------------------------
>
> Key: KUDU-2258
> URL: https://issues.apache.org/jira/browse/KUDU-2258
> Project: Kudu
> Issue Type: Test
> Components: test
> Reporter: Dan Burkert
> Priority: Major
>
> A common use case for Kudu is storing timeseries data sets. Right now we
> don't have a good integration test simulating these workloads. Ideally such
> an integration test would serve as a good starting point for investigating
> and reproducing performance issues with timeseries workloads.
> The timeseries workloads we've seen usually have these characteristics:
> - Hash partitioning over one or two series ID columns, which are often a UUID or
> similar pseudo-random ID.
> - Very high cardinality over the ID column(s), in the ballpark of tens or
> hundreds of millions of distinct values.
> - Range partitioning over a timestamp column, although it may be sufficient
> to only simulate a single time range for an integration test.
> - The test should probably be flexible with the data column types and count;
> there is no 'common' case here.
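
As a rough illustration of the characteristics listed above, here is a minimal Python sketch of a synthetic row generator. It only produces data and does not use any Kudu client API. Every name in it (`make_series_ids`, `hash_bucket`, `generate_rows`, `DATA_COLUMNS`, the bucket count, the time range) is hypothetical, and the CRC32-based bucketing is just a stand-in to show how pseudo-random IDs spread over buckets; it is not Kudu's hash function. The cardinality is scaled far below the tens or hundreds of millions mentioned in the description so the sketch runs quickly.

```python
import random
import uuid
import zlib
from datetime import datetime, timedelta, timezone

# Hypothetical parameters; none of these come from the issue or from Kudu.
NUM_HASH_BUCKETS = 8
RANGE_START = datetime(2018, 1, 1, tzinfo=timezone.utc)
RANGE_END = RANGE_START + timedelta(days=1)  # a single simulated time range

# Data columns are configurable because there is no 'common' case.
DATA_COLUMNS = {"value": float, "status": int, "label": str}


def make_series_ids(count, rng):
    """Return `count` pseudo-random UUID strings to use as series IDs."""
    return [str(uuid.UUID(int=rng.getrandbits(128), version=4))
            for _ in range(count)]


def hash_bucket(series_id, num_buckets=NUM_HASH_BUCKETS):
    """Map a series ID to a bucket (CRC32 stand-in, not Kudu's hash)."""
    return zlib.crc32(series_id.encode("utf-8")) % num_buckets


def random_value(col_type, rng):
    """Produce a random value for one of the supported column types."""
    if col_type is float:
        return rng.random()
    if col_type is int:
        return rng.randrange(1000)
    if col_type is str:
        return "%08x" % rng.getrandbits(32)
    raise TypeError("unsupported column type: %r" % (col_type,))


def generate_rows(series_ids, rows_per_series, rng, data_columns=DATA_COLUMNS):
    """Yield dict rows with a series ID, an in-range timestamp, and data columns."""
    span_us = int((RANGE_END - RANGE_START).total_seconds() * 1_000_000)
    for series_id in series_ids:
        for _ in range(rows_per_series):
            ts = RANGE_START + timedelta(microseconds=rng.randrange(span_us))
            row = {"series_id": series_id, "ts": ts}
            for name, col_type in data_columns.items():
                row[name] = random_value(col_type, rng)
            yield row


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run can be reproduced
    ids = make_series_ids(1000, rng)
    for row in generate_rows(ids[:2], 2, rng):
        print(hash_bucket(row["series_id"]), row)
```

A real integration test would feed rows like these into a table that is hash partitioned on the series ID column(s) and range partitioned on the timestamp; passing a different `data_columns` mapping changes the column types and count.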