ccaominh commented on a change in pull request #8925: Parallel indexing single dim partitions
URL: https://github.com/apache/incubator-druid/pull/8925#discussion_r354664450
 
 

 ##########
 File path: docs/ingestion/native-batch.md
 ##########
 @@ -241,18 +241,37 @@ Currently only one splitHintSpec, i.e., `segments`, is available.
 
 ### `partitionsSpec`
 
-PartitionsSpec is to describe the secondary partitioning method.
+PartitionsSpec is used to describe the secondary partitioning method.
 You should use different partitionsSpec depending on the [rollup mode](../ingestion/index.md#rollup) you want.
-For perfect rollup, you should use `hashed`.
+For perfect rollup, you should use either `hashed` (partitioning based on the hash of dimensions in each row) or
+`single_dim` (based on ranges of a single dimension. For best-effort rollup, you should use `dynamic`.
+
+For perfect rollup, `hashed` partitioning is recommended in most cases, as it will improve indexing
+performance and create more uniformly sized data segments relative to single-dimension partitioning.
+
+#### Hash-based partitioning
 
 |property|description|default|required?|
 |--------|-----------|-------|---------|
 |type|This should always be `hashed`|none|yes|
-|targetRowsPerSegment|Target number of rows to include in a partition, should be a number that targets segments of 500MB\~1GB.|5000000 (if `numShards` is not set)|either this or `numShards`|
-|numShards|Directly specify the number of shards to create. If this is specified and `intervals` is specified in the `granularitySpec`, the index task can skip the determine intervals/partitions pass through the data. `numShards` cannot be specified if `targetRowsPerSegment` is set.|null|no|
-|partitionDimensions|The dimensions to partition on. Leave blank to select all dimensions. Only used with `numShards`, will be ignored when `targetRowsPerSegment` is set.|null|no|
+|numShards|Directly specify the number of shards to create. If this is specified and `intervals` is specified in the `granularitySpec`, the index task can skip the determine intervals/partitions pass through the data. `numShards` cannot be specified if `targetRowsPerSegment` is set.|null|yes|
+|partitionDimensions|The dimensions to partition on. Leave blank to select all dimensions.|null|no|
 
-For best-effort rollup, you should use `dynamic`.
+#### Single-dimension range partitioning
+
+> Single-dimension range partitioning currently requires the
+> [druid-datasketches](../development/extensions-core/datasketches-extension.md)
+> extension to be added to the classpath.
 
 Review comment:
   Added a link to loading the extension from the classpath: 
https://druid.apache.org/docs/latest/development/extensions.html#loading-extensions-from-the-classpath
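
   For context, a `hashed` partitionsSpec matching the updated table in the hunk above could look like the following. This is only a minimal sketch: the `numShards` value and the `country` dimension name are made-up placeholders, not values taken from the PR.

```json
{
  "type": "hashed",
  "numShards": 4,
  "partitionDimensions": ["country"]
}
```

   Per the table, `partitionDimensions` can be left out to partition on all dimensions.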
