sthetland commented on a change in pull request #12143:
URL: https://github.com/apache/druid/pull/12143#discussion_r784389835
##########
File path: docs/ingestion/ingestion-spec.md
##########
@@ -463,7 +463,7 @@ is:
|-----|-----------|-------|
|type|Each ingestion method has its own tuning type code. You must specify the
type code that matches your ingestion method. Common options are `index`,
`hadoop`, `kafka`, and `kinesis`.||
|maxRowsInMemory|The maximum number of records to store in memory before
persisting to disk. Note that this is the number of rows post-rollup, and so it
may not be equal to the number of input records. Ingested records will be
persisted to disk when either `maxRowsInMemory` or `maxBytesInMemory` are
reached (whichever happens first).|`1000000`|
-|maxBytesInMemory|The maximum aggregate size of records, in bytes, to store in
the JVM heap before persisting. This is based on a rough estimate of memory
usage. Ingested records will be persisted to disk when either `maxRowsInMemory`
or `maxBytesInMemory` are reached (whichever happens first). `maxBytesInMemory`
also includes heap usage of artifacts created from intermediary persists. This
means that after every persist, the amount of `maxBytesInMemory` until next
persist will decreases, and task will fail when the sum of bytes of all
intermediary persisted artifacts exceeds `maxBytesInMemory`.<br /><br />Setting
maxBytesInMemory to -1 disables this check, meaning Druid will rely entirely on
maxRowsInMemory to control memory usage. Setting it to zero means the default
value will be used (one-sixth of JVM heap size).<br /><br />Note that the
estimate of memory usage is designed to be an overestimate, and can be
especially high when using complex ingest-time aggregators, including
sketches. If this causes your indexing workloads to persist to disk too often,
you can set maxBytesInMemory to -1 and rely on maxRowsInMemory
instead.|One-sixth of max JVM heap size|
+|maxBytesInMemory|The maximum aggregate size of records, in bytes, to store in
the JVM heap before persisting. This is based on a rough estimate of memory
usage. Ingested records will be persisted to disk when either `maxRowsInMemory`
or `maxBytesInMemory` are reached (whichever happens first). `maxBytesInMemory`
also includes heap usage of artifacts created from intermediary persists. This
means that after every persist, the amount of `maxBytesInMemory` until the next
persist will decrease. If the sum of bytes of all intermediary persisted
artifacts exceeds `maxBytesInMemory` the task fails<br /><br />Setting
maxBytesInMemory to -1 disables this check, meaning Druid will rely entirely on
maxRowsInMemory to control memory usage. Setting it to zero means the default
value will be used (one-sixth of JVM heap size).<br /><br />Note that the
estimate of memory usage is designed to be an overestimate, and can be
especially high when using complex ingest-time aggregators, including
sketches. If this causes your indexing workloads to persist to disk too often, you
can set maxBytesInMemory to -1 and rely on maxRowsInMemory instead.|One-sixth
of max JVM heap size|
Review comment:
Looks better. Just a suggestion to add the missing period after "the task
fails" and to put the property names in backticks for consistency.
```suggestion
|maxBytesInMemory|The maximum aggregate size of records, in bytes, to store
in the JVM heap before persisting. This is based on a rough estimate of memory
usage. Ingested records will be persisted to disk when either `maxRowsInMemory`
or `maxBytesInMemory` are reached (whichever happens first). `maxBytesInMemory`
also includes heap usage of artifacts created from intermediary persists. This
means that after every persist, the amount of `maxBytesInMemory` until the next
persist will decrease. If the sum of bytes of all intermediary persisted
artifacts exceeds `maxBytesInMemory` the task fails.<br /><br />Setting
`maxBytesInMemory` to -1 disables this check, meaning Druid will rely entirely
on `maxRowsInMemory` to control memory usage. Setting it to zero means the
default value will be used (one-sixth of JVM heap size).<br /><br />Note that
the estimate of memory usage is designed to be an overestimate, and can be
especially high when using complex ingest-time aggregators, including
sketches. If this causes your indexing workloads to persist to disk too
often, you can set `maxBytesInMemory` to -1 and rely on `maxRowsInMemory`
instead.|One-sixth of max JVM heap size|
```
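For anyone reading along, the semantics this row describes can be sketched roughly as below. This is only an illustration of the documented behavior, not Druid's actual implementation; the function names `effective_max_bytes` and `should_persist` and their parameters are hypothetical and exist only for this example.

```python
from typing import Optional


def effective_max_bytes(max_bytes_in_memory: int, max_heap_bytes: int) -> Optional[int]:
    """Resolve the byte limit the way the table row describes it.

    Hypothetical helper, not part of Druid:
      -1 -> the byte check is disabled (returns None)
       0 -> the default applies, one-sixth of the max JVM heap
      any other value is taken as the configured limit
    """
    if max_bytes_in_memory == -1:
        return None
    if max_bytes_in_memory == 0:
        return max_heap_bytes // 6
    return max_bytes_in_memory


def should_persist(rows_in_memory: int, estimated_bytes: int,
                   max_rows_in_memory: int, byte_limit: Optional[int]) -> bool:
    """Persist when either limit is reached, whichever happens first."""
    if rows_in_memory >= max_rows_in_memory:
        return True
    # With the byte check disabled, only the row limit matters.
    return byte_limit is not None and estimated_bytes >= byte_limit
```

With a 6 GiB heap, for example, a setting of 0 resolves to a 1 GiB limit, while -1 leaves `maxRowsInMemory` as the only trigger.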
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]