[
https://issues.apache.org/jira/browse/CASSANDRA-14092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357234#comment-16357234
]
Paulo Motta edited comment on CASSANDRA-14092 at 2/8/18 5:15 PM:
-----------------------------------------------------------------
Thanks for the review! This took a bit longer than expected, as I found an edge
case that required a change from the previous approach, described later.
First, the follow-ups to the previous round of review:
{quote}In the 3.0+ branches,
ExpirationDateOverflowHandling::maybeApplyExpirationDateOverflowPolicy can use
Cell.NO_TTL rather than 0 in the first check.
{quote}
Agreed, fixed
[here|https://github.com/apache/cassandra/commit/4bcea858f62b619b83a5db83f0cd93e192fea80c].
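To illustrate the change, a minimal sketch of the guard in question (simplified, with assumed names rather than the exact Cassandra signature):
{code:java}
// Illustrative sketch only: using the named constant instead of the magic
// literal 0 makes the "no TTL was set" intent explicit at the call site.
public class NoTtlGuardSketch
{
    static final int NO_TTL = 0; // mirrors Cell.NO_TTL

    static boolean shouldApplyOverflowPolicy(int ttl)
    {
        return ttl != NO_TTL; // previously compared against the literal 0
    }
}
{code}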
{quote}In 3.0+ you renamed the static policy field to have a shorter name, but
missed that in the 2.1 & 2.2 branches.
{quote}
Good catch, fixed
[here|https://github.com/apache/cassandra/commit/df59f2de8042e7ea67e520d509d675da1d53bbce].
{quote}In 2.1 I saw some (very) intermittent test failures in TTLTest. I
instrumented checkTTLIsCapped to print out the (min | actual | max) TTLs to
sysout and eventually managed to repro it. You can see from the output that in
the first line, the min and actual are actually > the max, which caused the
test to fail (this happened around 10% of the time).
{quote}
Oh, that's funny! Perhaps it's platform-related, as I cannot reproduce this
locally. In any case, I updated the check
[here|https://github.com/apache/cassandra/commit/18be7f140c3c2159f69d50b1bfb068927c734a13]
to be more deterministic.
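The deterministic version brackets the write with two clock samples, so the asserted window always contains the instant the TTL was computed; roughly like this (an illustrative sketch with assumed helper names, not the actual TTLTest code):
{code:java}
import java.util.concurrent.TimeUnit;

// Illustrative sketch: sample the clock before and after the write so the
// asserted (min | actual | max) window cannot be invalidated by the time
// elapsed between the write and the check.
public class TtlCheckSketch
{
    static final int DEFAULT_TTL = 10; // seconds

    public static void main(String[] args)
    {
        long before = System.currentTimeMillis();
        int actualTtl = writeRowAndReadTtl(); // hypothetical helper: INSERT, then SELECT ttl(col)
        long after = System.currentTimeMillis();

        // The row was written somewhere in [before, after], so the observed TTL
        // must lie between DEFAULT_TTL minus the elapsed time (with 1s slack
        // for second-boundary rounding) and DEFAULT_TTL itself.
        long minTtl = DEFAULT_TTL - TimeUnit.MILLISECONDS.toSeconds(after - before) - 1;
        assert minTtl <= actualTtl && actualTtl <= DEFAULT_TTL
            : String.format("(%d | %d | %d)", minTtl, actualTtl, DEFAULT_TTL);
    }

    static int writeRowAndReadTtl() { return DEFAULT_TTL; } // stub for the sketch
}
{code}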
I also noticed that we weren't applying the expiration overflow policy in the
CQLv2 interface, so I updated it
[here|https://github.com/apache/cassandra/commit/4230a5b4126e972827990dc33c1a9140af07afe1].
I find it hard to believe anyone is still using this interface, but I wouldn't be totally surprised.
{quote}I think we should have the scrub fix in 2.1 & 2.2 as some users have not
yet moved to 3.x and they should probably get the chance to repair (maybe)
their data if they want/are able to.
{quote}
Agreed. As noted in NEWS.txt, 2.1/2.2 are only affected if users have
assertions disabled, but given that we ship this
[comment|https://github.com/apache/cassandra/blob/cassandra-2.1/conf/cassandra-env.sh#L173]
in 2.1, I wouldn't be surprised if some users with assertions disabled hit this.
While writing the 2.1 scrubber (which is slightly different from 3.0 due to
CASSANDRA-8099), I found a nasty edge case with the recovery process. The
problem is that, if a cell with a negative/overflowed {{localExpirationTime}}
[is converted to a tombstone during
compaction|https://github.com/apache/cassandra/blob/30ed83d9266a03debad98ffac5610dcb3ae30934/src/java/org/apache/cassandra/db/rows/AbstractCell.java#L95],
we can't convert the tombstone back into an expired cell because the TTL value
is lost and we can't differentiate this from an ordinary tombstone.
Furthermore, even if the original SSTable is scrubbed and the negative
{{localExpirationTime}} is converted to {{MAX_DELETION_TIME}}, if the tombstone
was already generated, it will shadow/delete the fixed expired cell, because
[deleted cells have precedence over live cells when the timestamp is the
same|https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/db/Conflicts.java#L43].
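To make that reconciliation rule concrete, here is a condensed sketch of the tie-breaking logic from the linked {{Conflicts}} code (the value comparison on full ties is elided):
{code:java}
// Condensed sketch of cell reconciliation: the higher timestamp wins; on a
// timestamp tie, the tombstone (non-live cell) beats the live cell.
enum Resolution { LEFT_WINS, RIGHT_WINS }

public class ReconcileSketch
{
    static Resolution resolve(long leftTs, boolean leftLive, long rightTs, boolean rightLive)
    {
        if (leftTs != rightTs)
            return leftTs < rightTs ? Resolution.RIGHT_WINS : Resolution.LEFT_WINS;
        if (leftLive != rightLive)
            return leftLive ? Resolution.RIGHT_WINS : Resolution.LEFT_WINS; // tombstone shadows live cell
        return Resolution.LEFT_WINS; // full tie: value comparison elided in this sketch
    }

    public static void main(String[] args)
    {
        // Recovered live cell (left) vs. previously generated tombstone (right), same timestamp:
        System.out.println(resolve(1000L, true, 1000L, false)); // RIGHT_WINS: tombstone shadows the fix
        // The same pair after scrub bumps the recovered cell's timestamp by one:
        System.out.println(resolve(1001L, true, 1000L, false)); // LEFT_WINS: recovered cell supersedes
    }
}
{code}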
I updated {{recover_negative_expiration_date_sstables_with_scrub}} to [add an
SSTable with the expired cell converted to a
tombstone|https://github.com/apache/cassandra-dtest/commit/63b0e7d51fbacf4fabbb860d81873605c3b66c92],
and verified that an existing tombstone shadows recovered data in the current
version.
To fix this, the idea is to have scrub increment the timestamp of cells with
negative {{localExpirationTime}} by one, in addition to setting the
{{localExpirationTime}} to {{MAX_DELETION_TIME}}, so the recovered cell will
shadow/supersede any tombstone that was already written.
Since this does not merely recreate the row but technically reinserts it with a
higher timestamp, we cannot do it automatically on scrub: it could overwrite an
existing row inserted just one timestamp increment after the original one
(very unlikely in practice, but it may depend on application logic).
For this reason, the user needs to specify the {{--reinsert-overflowed-ttl}}
option during scrub to perform the recovery.
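In rough code, the fix-up looks like this (a minimal sketch with assumed names, not the actual {{Scrubber}} changes):
{code:java}
// Minimal sketch of the --reinsert-overflowed-ttl fix-up: a cell whose
// localExpirationTime overflowed to a negative value gets MAX_DELETION_TIME
// and a timestamp bumped by one, so the rewritten cell supersedes any
// tombstone already generated from the broken cell.
public class OverflowFixupSketch
{
    static final int MAX_DELETION_TIME = Integer.MAX_VALUE - 1; // mirrors Cell.MAX_DELETION_TIME

    static class Cell
    {
        final long timestamp;          // write timestamp (microseconds by convention)
        final int localExpirationTime; // seconds since epoch; negative means it overflowed
        Cell(long ts, int let) { timestamp = ts; localExpirationTime = let; }
    }

    static Cell maybeReinsert(Cell cell, boolean reinsertOverflowedTtl)
    {
        if (cell.localExpirationTime >= 0 || !reinsertOverflowedTtl)
            return cell; // healthy cell, or recovery not requested
        return new Cell(cell.timestamp + 1, MAX_DELETION_TIME);
    }

    public static void main(String[] args)
    {
        Cell fixed = maybeReinsert(new Cell(1000L, Integer.MIN_VALUE), true);
        System.out.println(fixed.timestamp + " " + fixed.localExpirationTime); // 1001 2147483646
    }
}
{code}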
This approach is implemented
[here|https://github.com/apache/cassandra/commit/5f9f704449ed1ef579c61953b02a9ef32f13082b]
on 3.0 and ported to all other branches in a separate commit.
{quote}The last thing is about providing a route to fix up overflowed dates via
scrub, I think we should definitely leave the remedial Scrubber code in trunk
until we have a proper fix committed.
{quote}
Since 4.0 has not been released yet and will ship with this fix, it's impossible
for anyone to hit this on 4.0, but I don't see any problem in keeping the scrub
option for the lazy ones upgrading from 3.x without fixing their SSTables
beforehand. :) For this reason I kept the 3.11 SSTables in the 4.X (d)tests.
To prevent scrub and the tests from throwing an {{AssertionError}} on 2.1, I
removed [this
assertion|https://github.com/apache/cassandra/commit/4501eee5c962547c059a4f624155c5e18d5369d3],
since it's no longer necessary now that we apply the overflow policy.
Finally, I
[refactored|https://github.com/apache/cassandra/commit/c3e1fcc25b3db9db212dfc805a47430ff537b101]
the NEWS.txt section a bit to take [~KurtG]'s comments into account and to give
more emphasis to the expiration overflow policies, since those are what most
users will need to care about, and added a recovery subsection with detailed
instructions.
I now wonder if this notice has gotten too big and is cluttering the NEWS.txt
file too much, which may confuse users looking for upgrade instructions. Should
we instead create a new WARNING/IMPORTANT file and add a pointer to it in
NEWS.txt?
To facilitate review, I squashed the previous commits and created a new branch
whose commits represent only the changes since the last review (except for the
2.2 patch, which is basically the same as 2.1 plus an additional commit
simplifying TTLTest). The previous CI looked good; I will submit a new CI run
and post the results here later.
||2.1||2.2||3.0||3.11||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-14092-v6]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-14092-v6]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-14092-v6]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.11...pauloricardomg:3.11-14092-v6]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-14092-v6]|[branch|https://github.com/apache/cassandra-dtest/compare/master...pauloricardomg:14092]|
> Max ttl of 20 years will overflow localDeletionTime
> ---------------------------------------------------
>
> Key: CASSANDRA-14092
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14092
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Blocker
> Fix For: 2.1.20, 2.2.12, 3.0.16, 3.11.2
>
>
> CASSANDRA-4771 added a maximum TTL of 20 years to protect {{localDeletionTime}}
> against the [year 2038 overflow bug|https://en.wikipedia.org/wiki/Year_2038_problem].
> It turns out that next year {{localDeletionTime}} will start overflowing
> with the maximum TTL of 20 years ({{System.currentTimeMillis() / 1000 + ttl(20
> years) > Integer.MAX_VALUE}}), so we should remove this limitation.
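> As a worked example of that arithmetic (an illustrative sketch, not Cassandra code):
> {code:java}
> // localDeletionTime is an int holding seconds since the epoch, so "now in
> // seconds" plus a 20-year TTL crosses Integer.MAX_VALUE (2^31 - 1).
> public class TtlOverflowDemo
> {
>     public static void main(String[] args)
>     {
>         long nowInSeconds = System.currentTimeMillis() / 1000; // ~1.52e9 in early 2018
>         long maxTtl = 20L * 365 * 24 * 60 * 60;                // 20 years ~ 6.3e8 seconds
>         long expiration = nowInSeconds + maxTtl;               // ~2.15e9 > Integer.MAX_VALUE (~2.147e9)
>         System.out.println(expiration > Integer.MAX_VALUE);    // true from early 2018 onwards
>         System.out.println((int) expiration);                  // wraps to a negative value as an int
>     }
> }
> {code}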