[
https://issues.apache.org/jira/browse/CASSANDRA-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290583#comment-15290583
]
Stefania commented on CASSANDRA-9530:
-------------------------------------
The patch is good but there is a little bit more work required:
*cassandra.yaml*
* {{Maxium size of the value in SSTable.}} -> {{Maximum size of values in
sstables.}} or {{Maxium size of any value in sstables.}}
* Let's mention in the description of {{native_transport_max_frame_size_in_mb}}
that if this parameter is increased then {{max_value_size_in_mb}} may need
increasing too.
* It's extremely unlikely but just in case someone happens to have some huge
value in some ancient sstable, let's mention a brief description of this
parameter in NEWS.txt.
*Protecting from OOM exceptions*
* I think it should be safe to cap all invocations of
{{AbstractType.readValue()}} and not just the call from {{Cell.deserialize()}}.
This way we also cover clustering values and partition keys.
*SStableCorruptionDetectionTest.java*
* The test is currently
[failing|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-trunk-testall/lastCompletedBuild/testReport/org.apache.cassandra.io.sstable/SStableCorruptionDetectionTest/testFinishRandomized_compression/]
on Jenkins because it tries to create the keyspace when it already exists.
* It also failed with an OOM
[here|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-trunk-testall/lastCompletedBuild/testReport/org.apache.cassandra.io.sstable/SStableCorruptionDetectionTest/testFinish_compression]
and
[here|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-trunk-testall/lastCompletedBuild/testReport/org.apache.cassandra.io.sstable/SStableCorruptionDetectionTest/testFinish/].
* I had a look at the exceptions captured:
** Most of them are in the clustering deserializers. I think we need to extend
the randomization so that we seek to any position in the sstable provided we
don't exceed the corruption size, seeking at position zero is not sufficient.
** Most of the inner exceptions were EOF exceptions, had the table been bigger
they could have been OOM. See the section above on protecting from OOM
exceptions.
* When using randomized tests it is a good idea to print the seed. This way if
a test fails on Jenkins we can reproduce the failure locally as well.
* Nit: at line 144 we can use a {{try}}.
*BlackListingCompactionTest.java*
* We should randomize the values created in {{testBlacklisting()}}.
* As above, we should print the seed.
* To ensure early opening has occurred, we should set
{{conf.sstable_preemptive_open_interval_in_mb}} to 1, then create an sstable of
at least 2 MB and corrupt it after 1.5 MB, is this correct [~slebresne] or is
there more to it?
> SSTable corruption can trigger OOM
> ----------------------------------
>
> Key: CASSANDRA-9530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9530
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Alex Petrov
>
> If a sstable is corrupted so that the length of a given is bogus, we'll still
> happily try to allocate a buffer of that bogus size to read the value, which
> can easily lead to an OOM.
> We should probably protect against this. In practice, a given value can be so
> big since it's limited by the protocol frame size in the first place. Maybe
> we could add a max_value_size_in_mb setting and we'd considered a sstable
> corrupted if it was containing a value bigger than that.
> I'll note that this ticket would be a good occasion to improve
> {{BlacklistingCompactionsTest}}. Typically, it currently generate empty
> values which makes it pretty much impossible to get the problem described
> here. And as described in CASSANDRA-9478, it also doesn't test properly for
> thing like early opening of compaction results. We could try to randomize as
> much of the parameters of this test as possible to make it more likely to
> catch any type of corruption that could happen.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)