[
https://issues.apache.org/jira/browse/CASSANDRA-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235004#comment-17235004
]
Scott Carey commented on CASSANDRA-16071:
-----------------------------------------
I still have to do a rolling re-index, which is not very nice. If it just
interpreted implausibly large values as bytes it would be fine. Sure, log a
loud warning or something. I don't see why interpreting such values as bytes
is problematic. Maybe set a floor of some sort if 100k is too small. But 1GB
is useless for anyone who deliberately set this lower than the 1GB default.
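To make that suggestion concrete, here is a minimal sketch of the interpretation I have in mind (this is not Cassandra's actual code; the class, method, threshold, and floor values are all hypothetical):
{code:java}
// Hypothetical sketch of the suggested interpretation; not actual Cassandra code.
// Plausible values are read as megabytes; implausibly large values are assumed
// to already be bytes, and a floor is applied either way.
public final class FlushMemoryOption
{
    private static final long MB = 1L << 20;
    private static final long MAX_PLAUSIBLE_MB = 1L << 20; // beyond 1TB-as-MB, assume it is really bytes
    private static final long FLOOR_BYTES = 100L * 1024;   // hypothetical 100k floor

    static long toBytes(long configuredValue)
    {
        long bytes = configuredValue <= MAX_PLAUSIBLE_MB
                   ? configuredValue * MB   // looks like megabytes
                   : configuredValue;       // huge: treat the legacy value as bytes
        return Math.max(bytes, FLOOR_BYTES);
    }
}
{code}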
The default 1GB isn't safe either, due to the bugs I listed in the other
ticket. Large compactions with multiple output files can hold 1GB _per output
file_ per index in the worst case, so a compaction that outputs 40 files from
LCS is DOA in my environment at 1GB, no different from setting it to 1TB.
Anyone who set the value smaller than the default most likely did so to
avoid going OOM.
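Back-of-the-envelope, with the worst case as I understand it (the 40-file compaction is from my environment; the index count is illustrative):
{code:java}
public class WorstCaseFlush
{
    public static void main(String[] args)
    {
        long limitBytes  = 1L << 30; // the 1GB default
        int  outputFiles = 40;       // one LCS compaction producing 40 SSTables
        int  sasiIndexes = 1;        // SASI indexes on the table

        // Worst case: the flush limit is held per output file, per index.
        long worstCaseBytes = limitBytes * outputFiles * sasiIndexes;
        System.out.printf("worst case ~%d GB of flush memory%n", worstCaseBytes >> 30); // ~40 GB
    }
}
{code}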
I suppose the patch here will help some people, but it is not helpful for me.
It does highlight the issue in the logs, which is a big improvement.
To compound matters, the Cassandra yum repo does not keep older versions, so
rolling back to 3.11.7 is non-trivial.
RE: the upgrade process
In most environments using SASI it is simply not acceptable to drop the old
index and only then build the new one; most likely there are queries that
will not function without the index. I have to build a new index with the new
settings (under a different name) first, and only then drop the old one.
I also have to build the indexes in the correct order, because the query
planner depends on their order of creation.
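For anyone stuck with the same workaround, the order looks roughly like this, sketched with the DataStax Java driver (the keyspace, table, column, and index names are placeholders, and the option value assumes the pre-fix behaviour of reading the setting in bytes, i.e. 536870912 for 512MB):
{code:java}
import com.datastax.oss.driver.api.core.CqlSession;

public class SasiReindex
{
    public static void main(String[] args)
    {
        try (CqlSession session = CqlSession.builder().build())
        {
            // 1. Build the replacement index first so dependent queries keep working.
            session.execute(
                "CREATE CUSTOM INDEX my_table_col_idx_v2 ON my_ks.my_table (col) " +
                "USING 'org.apache.cassandra.index.sasi.SASIIndex' " +
                "WITH OPTIONS = {'max_compaction_flush_memory_in_mb': '536870912'}");

            // 2. Wait for the new index build to finish (e.g. watch
            //    `nodetool compactionstats`) before touching the old one -- omitted here.

            // 3. Only then drop the old index. With several indexes on the table,
            //    recreate them in their original creation order, since the query
            //    planner depends on it.
            session.execute("DROP INDEX my_ks.my_table_col_idx");
        }
    }
}
{code}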
If the code interpreted the value as bytes when it is huge, I wouldn't have to
create a new index at all. If addressing the error in the log were as simple as
dividing the value by 2^20 and nothing else, it would probably even be
reasonable to halt start-up and make the operator correct it. But as long as
the fix means an index rebuild, which can take a LONG time on a large table,
it should be more sensitive to the operational cost it incurs. After all, this
is a minor patch release, and it seems unusual to require data rebuilds in one.
> max_compaction_flush_memory_in_mb is interpreted as bytes
> ---------------------------------------------------------
>
> Key: CASSANDRA-16071
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16071
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/SASI
> Reporter: Michael Semb Wever
> Assignee: Michael Semb Wever
> Priority: Normal
> Fix For: 4.0, 3.11.8, 4.0-beta2, 4.0-beta4, 3.11.10
>
>
> In CASSANDRA-12662, [~scottcarey]
> [reported|https://issues.apache.org/jira/browse/CASSANDRA-12662?focusedCommentId=17070055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17070055]
> that the {{max_compaction_flush_memory_in_mb}} setting gets incorrectly
> interpreted as bytes rather than megabytes, as its name implies.
> {quote}
> 1. the setting 'max_compaction_flush_memory_in_mb' is a misnomer; it is
> actually memory in BYTES. If you take it at face value and set it to, say,
> '512', thinking that means 512MB, you will produce a million temp files
> rather quickly in a large compaction, which will exhaust even large values of
> max_map_count rapidly, trigger the OOM: Map Error issue above, and possibly
> leave you in a very difficult situation where it is hard to get the cluster
> back to a state where nodes aren't crashing while initializing or soon after.
> This issue is minor if you know about it in advance and set the value IN BYTES.
> {quote}