ccaominh commented on a change in pull request #8925: Parallel indexing single dim partitions
URL: https://github.com/apache/incubator-druid/pull/8925#discussion_r354664450
##########
File path: docs/ingestion/native-batch.md
##########
@@ -241,18 +241,37 @@ Currently only one splitHintSpec, i.e., `segments`, is available.
 ### `partitionsSpec`
-PartitionsSpec is to describe the secondary partitioning method.
+PartitionsSpec is used to describe the secondary partitioning method.
 You should use different partitionsSpec depending on the [rollup mode](../ingestion/index.md#rollup) you want.
-For perfect rollup, you should use `hashed`.
+For perfect rollup, you should use either `hashed` (partitioning based on the hash of dimensions in each row) or
+`single_dim` (based on ranges of a single dimension. For best-effort rollup, you should use `dynamic`.
+
+For perfect rollup, `hashed` partitioning is recommended in most cases, as it will improve indexing
+performance and create more uniformly sized data segments relative to single-dimension partitioning.
+
+#### Hash-based partitioning
 |property|description|default|required?|
 |--------|-----------|-------|---------|
 |type|This should always be `hashed`|none|yes|
-|targetRowsPerSegment|Target number of rows to include in a partition, should be a number that targets segments of 500MB\~1GB.|5000000 (if `numShards` is not set)|either this or `numShards`|
-|numShards|Directly specify the number of shards to create. If this is specified and `intervals` is specified in the `granularitySpec`, the index task can skip the determine intervals/partitions pass through the data. `numShards` cannot be specified if `targetRowsPerSegment` is set.|null|no|
-|partitionDimensions|The dimensions to partition on. Leave blank to select all dimensions. Only used with `numShards`, will be ignored when `targetRowsPerSegment` is set.|null|no|
+|numShards|Directly specify the number of shards to create. If this is specified and `intervals` is specified in the `granularitySpec`, the index task can skip the determine intervals/partitions pass through the data. `numShards` cannot be specified if `targetRowsPerSegment` is set.|null|yes|
+|partitionDimensions|The dimensions to partition on. Leave blank to select all dimensions.|null|no|
-For best-effort rollup, you should use `dynamic`.
+#### Single-dimension range partitioning
+
+> Single-dimension range partitioning currently requires the
+> [druid-datasketches](../development/extensions-core/datasketches-extension.md)
+> extension to be added to the classpath.

Review comment:
Added a link to loading the extension from the classpath: https://druid.apache.org/docs/latest/development/extensions.html#loading-extensions-from-the-classpath
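
For readers following the quoted table, a `hashed` partitionsSpec limited to the three properties listed there might look like the sketch below. The shard count and the dimension name `country` are illustrative assumptions, not values taken from the pull request.

```json
{
  "type": "hashed",
  "numShards": 4,
  "partitionDimensions": ["country"]
}
```

Per the table, leaving `partitionDimensions` blank selects all dimensions for partitioning.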
