dclim commented on a change in pull request #6326: Add support hash partitioning by a subset of dimensions to indexTask
URL: https://github.com/apache/incubator-druid/pull/6326#discussion_r221719109
##########
File path: docs/content/ingestion/native_tasks.md
##########
@@ -475,6 +475,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon
 |maxBytesInMemory|Used in determining when intermediate persists to disk should occur. Normally this is computed internally and user does not need to set it. This value represents number of bytes to aggregate in heap memory before persisting. This is based on a rough estimate of memory usage and not actual usage. The maximum heap memory usage for indexing is maxBytesInMemory * (2 + maxPendingPersists)|1/6 of max JVM memory|no|
 |maxTotalRows|Total number of rows in segments waiting for being pushed. Used in determining when intermediate pushing should occur.|20000000|no|
 |numShards|Directly specify the number of shards to create. If this is specified and 'intervals' is specified in the granularitySpec, the index task can skip the determine intervals/partitions pass through the data. numShards cannot be specified if targetPartitionSize is set.|null|no|
+|partitionDimensions|The dimensions to partition on. Leave blank to select all dimensions. Only used with numShards > 1, will be ignored when targetPartitionSize or maxTotalRows is set.|null|no|

Review comment:
Why does this get ignored if targetPartitionSize/maxTotalRows is set? That's also a bit weird, since those parameters have non-zero default values when the user doesn't provide them. Wouldn't it get ignored if forceGuaranteedRollup is false?

I also agree that more documentation would be helpful on why you would want to use this and how it would give you better data locality.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
