[GitHub] jihoonson commented on a change in pull request #6129: Add support for 'maxTotalRows' to incremental publishing kafka indexing task and appenderator based realtime task

GitBox Wed, 22 Aug 2018 18:34:41 -0700

jihoonson commented on a change in pull request #6129: Add support for 
'maxTotalRows' to incremental publishing kafka indexing task and appenderator 
based realtime task
URL: https://github.com/apache/incubator-druid/pull/6129#discussion_r212118920


 ##########
 File path: docs/content/development/extensions-core/kafka-ingestion.md
 ##########
 @@ -117,7 +117,8 @@ The tuningConfig is optional and default parameters will 
be used if no tuningCon
 |`type`|String|The indexing task type, this should always be `kafka`.|yes|
 |`maxRowsInMemory`|Integer|The number of rows to aggregate before persisting. 
This number is the post-aggregation rows, so it is not equivalent to the number 
of input events, but the number of aggregated rows that those events result in. 
This is used to manage the required JVM heap size. Maximum heap memory usage 
for indexing scales with maxRowsInMemory * (2 + maxPendingPersists). Normally 
user does not need to set this, but depending on the nature of data, if rows 
are short in terms of bytes, user may not want to store a million rows in 
memory and this value should be set.|no (default == 1000000)|
 |`maxBytesInMemory`|Long|The number of bytes to aggregate in heap memory 
before persisting. This is based on a rough estimate of memory usage and not 
actual usage. Normally this is computed internally and user does not need to 
set it. The maximum heap memory usage for indexing is maxBytesInMemory * (2 + 
maxPendingPersists).  |no (default == One-sixth of max JVM memory)|
-|`maxRowsPerSegment`|Integer|The number of rows to aggregate into a segment; 
this number is post-aggregation rows. Handoff will happen either if 
`maxRowsPerSegment` is hit or every `intermediateHandoffPeriod`, whichever 
happens earlier.|no (default == 5000000)|
+|`maxRowsPerSegment`|Integer|The number of rows to aggregate into a segment; 
this number is post-aggregation rows. Handoff will happen either if 
`maxRowsPerSegment` or `maxTotalRows` is hit or every 
`intermediateHandoffPeriod`, whichever happens earlier.|no (default == 5000000)|
+|`maxTotalRows`|Integer|The number of rows to aggregate across all segments; 
this number is post-aggregation rows. Handoff will happen either if 
`maxRowsPerSegment` or `maxTotalRows` is hit or every 
`intermediateHandoffPeriod`, whichever happens earlier.|no (default == 
unlimited)|
 
 Review comment:
   The type of `maxTotalRows` looks like `Long` in the code, not `Integer`: 
https://github.com/apache/incubator-druid/pull/6129/files#diff-1b4ea965daf44294fc7cc44870d9df06R65.
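
   To illustrate why the type matters, here is a minimal sketch of the handoff 
condition the table describes. It is not Druid's implementation: the class 
`HandoffSketch`, the method `shouldHandoff`, and its parameter names are 
hypothetical. It also assumes that a limit counts as "hit" once the count 
reaches it, and that the "unlimited" default can be modeled as 
`Long.MAX_VALUE`. The total is a `long` because a row count summed across all 
segments can exceed `Integer.MAX_VALUE`.

```java
import java.time.Duration;

// Hypothetical illustration only; these names are not Druid's actual classes.
final class HandoffSketch {
    private HandoffSketch() {}

    // Returns true when any of the three documented triggers fires:
    // the per-segment row limit, the total row limit, or the handoff period.
    // Assumption: a limit is "hit" when the count reaches it (>=).
    static boolean shouldHandoff(
            int rowsInSegment, int maxRowsPerSegment,
            long totalRows, long maxTotalRows,
            Duration sinceLastHandoff, Duration intermediateHandoffPeriod) {
        return rowsInSegment >= maxRowsPerSegment
                || totalRows >= maxTotalRows
                || sinceLastHandoff.compareTo(intermediateHandoffPeriod) >= 0;
    }
}
```

   With 3,000,000,000 total rows, a value that does not fit in an `int`, a call 
such as `shouldHandoff(1_000, 5_000_000, 3_000_000_000L, Long.MAX_VALUE, 
Duration.ofMinutes(1), Duration.ofHours(1))` would not trigger a handoff under 
these assumptions, while lowering the hypothetical `maxTotalRows` argument to 
`3_000_000_000L` would.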

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
