[ https://issues.apache.org/jira/browse/CASSANDRA-14092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349824#comment-16349824 ]
Paulo Motta commented on CASSANDRA-14092:
-----------------------------------------

Thanks for the quick turnaround [~beobal]! See follow-up below:

{quote}The wording of the NEWS.txt entry is good, I do wonder if we should maybe place it right at the top of the file rather than just in the 3.0.16 section for extra emphasis. Any thoughts on that?
{quote}

Good idea. I did this and also updated the text to cover the possibility of data loss before this patch and how to fix it with scrub:

{noformat}
MAXIMUM TTL EXPIRATION DATE NOTICE
-----------------------------------

The maximum expiration timestamp that can be represented by the storage engine is 2038-01-19T03:14:06+00:00, which means that inserts with TTL that expire after this date are not currently supported.

Prior to 3.0.16 in the 3.0.X series and 3.11.2 in the 3.11 series, there was no protection against INSERTS with TTL expiring after the maximum supported date, causing the expiration time field to overflow and the records to expire immediately. Expired records due to overflow may have been removed permanently after a compaction.

The 2.1.X and 2.2.X series are not subject to data loss due to this issue if assertions are enabled, since an AssertionError is thrown during INSERT when the expiration time field overflows on these versions.

In practice this issue will affect only users that use very large TTLs, close to the maximum allowed value of 630720000 seconds (20 years), starting from 2018-01-19T03:14:06+00:00. As time progresses, the maximum supported TTL will be gradually reduced as the maximum expiration date approaches. For instance, a user on an affected version on 2028-01-19T03:14:06 with a TTL of 10 years will be affected by this bug, so we urge users of very large TTLs to upgrade to a version where this issue is addressed as soon as possible.

Potentially affected users should inspect their SSTables and search for negative min local deletion times to detect this issue.
SSTables in this state must be backed up immediately, as they are subject to data loss during auto-compactions, and may be recovered by running the sstablescrub tool from versions 3.0.16+ and/or 3.11.2+.

The Cassandra project plans to fix this limitation in newer versions, but while the fix is not available, operators can decide which policy to apply when dealing with inserts with TTL exceeding the maximum supported expiration date:
- REJECT: this is the default policy and will reject any requests with expiration date timestamp after 2038-01-19T03:14:06+00:00.
- CAP: any insert with TTL expiring after 2038-01-19T03:14:06+00:00 will expire on 2038-01-19T03:14:06+00:00 and the client will receive a warning.
- CAP_NOWARN: same as previous, except that the client warning will not be emitted.

These policies may be specified via the -Dcassandra.expiration_date_overflow_policy=POLICY startup option which can be set in the jvm.options file.

See CASSANDRA-14092 for more details about this issue.
{noformat}

Please let me know what you think of the updated text. We should probably also publish this text (or a subset of it) in the release announcement e-mail.

While writing the text above, I realized that there is also a remote possibility of data loss in 2.1/2.2 if assertions are disabled. I didn't backport the scrub recovery, since it was not a straightforward backport and I didn't think it was worth the effort right now. We can always do that later if necessary; the most important thing right now is to ship the policies. To reflect this, I updated the 4th paragraph on 2.1 and 2.2 to:

{noformat}
2.1.X / 2.2.X users in the conditions above should not be subject to data loss unless assertions are disabled, in which case the suspect SSTables must be backed up immediately and manually recovered, as they are subject to data loss during auto-compaction.
{noformat}

{quote}I also have one piece of feedback on the policies; I don't see any benefit in being able to turn off logging of capped expirations (especially since we're using NoSpamLogger) but I do think the client warning is useful.
{quote}

I agree and updated the patch with this suggestion. At the same time, I think advanced operators may want to control how often the warning is logged, so I created a property {{cassandra.expiration_overflow_warning_interval_minutes=5}} to control this.

{quote}I also noticed that the logging of a parse error/invalid value for the policy sysprop is at DEBUG in the current patches, but it might be sensible to draw a bit more attention to that if it happens.
{quote}

Agreed, changed the logging to WARN.

I finished the cleanup of the patch and provided a version for all branches. The 2.1 and 2.2 versions are pretty much the same, as are the 3.0/3.11/trunk versions, except for some minor conflicts. Please find below a short summary of the changes per branch:
* 2.1:
** Add REJECT and CAP expiration date overflow policies and tests
** Cap max default TTL at 20 years and tests
** Add NEWS.txt entry
* 2.2:
** Same as 2.1, few minor import conflicts
* 3.0:
** Add REJECT, CAP and CAP_NOWARN expiration date overflow policies and tests
** Add ability to scrub to fix negative localDeletionTime and tests with broken SSTables
** Add ability to sstablemetadata to show minLocalDeletionTime
** Add expiration date overflow policies to jvm.options file
** Add NEWS.txt entry
* 3.11:
** Same as 3.0, few minor conflicts during merge
* master:
** Same as 3.11, few minor conflicts during merge
** Removed ability of scrub to fix sstables with negative localDeletionTime and tests
* dtest:
** Test all policies on CQL for default and user supplied TTL
** Test cap policy on thrift for default and user supplied TTL
** Check that offline scrub recovers sstable with negative localDeletionTime

I submitted a preliminary round of CI with the non-cleaned up
patch and the results looked good. I will submit again for all the branches below and post the results here when they are ready.

||2.1||2.2||3.0||3.11||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-14092-v5]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-14092-v5]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-14092-v5]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.11...pauloricardomg:3.11-14092-v5]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-14092-v5]|[branch|https://github.com/apache/cassandra-dtest/compare/master...pauloricardomg:14092-v5]|

> Max ttl of 20 years will overflow localDeletionTime
> ---------------------------------------------------
>
> Key: CASSANDRA-14092
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14092
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Blocker
> Fix For: 2.1.20, 2.2.12, 3.0.16, 3.11.2
>
>
> CASSANDRA-4771 added a max value of 20 years for ttl to protect against [year 2038 overflow bug|https://en.wikipedia.org/wiki/Year_2038_problem] for {{localDeletionTime}}.
> It turns out that next year the {{localDeletionTime}} will start overflowing with the maximum ttl of 20 years ({{System.currentTimeMillis() + ttl(20 years) > Integer.MAX_VALUE}}), so we should remove this limitation.
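
For readers who want to see the arithmetic behind the overflow and the REJECT / CAP / CAP_NOWARN policies described in the notice, here is a minimal Java sketch. It is not the code from the patches: the class name {{ExpirationOverflowSketch}}, the method names, the warning messages, and the fallback to REJECT on an unparsable policy value are all assumptions made for illustration. Only the policy names, the property name, the 630720000-second maximum TTL, and the maximum date come from the text above.

```java
import java.time.Instant;
import java.util.Locale;

/** Illustrative sketch only; this is not the code from the CASSANDRA-14092 patches. */
final class ExpirationOverflowSketch {
    /** 2038-01-19T03:14:06Z in epoch seconds, the maximum date quoted in the notice. */
    static final int MAX_DELETION_TIME = Integer.MAX_VALUE - 1;
    /** Maximum TTL of 20 years in seconds (630720000). */
    static final int MAX_TTL = 20 * 365 * 24 * 60 * 60;

    enum Policy { REJECT, CAP, CAP_NOWARN }

    /** Parses a policy name. Falling back to REJECT on bad input is an assumption. */
    static Policy parsePolicy(String raw) {
        if (raw == null)
            return Policy.REJECT;
        try {
            return Policy.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            System.err.println("WARN: invalid overflow policy '" + raw + "', using REJECT");
            return Policy.REJECT;
        }
    }

    /** Unprotected 32-bit addition: wraps to a negative value on overflow. */
    static int naiveLocalDeletionTime(int nowInSec, int ttlInSec) {
        return nowInSec + ttlInSec;
    }

    /** Adds in 64 bits, then applies the chosen policy if the result is out of range. */
    static int localDeletionTime(int nowInSec, int ttlInSec, Policy policy) {
        long expiration = (long) nowInSec + ttlInSec;
        if (expiration <= MAX_DELETION_TIME)
            return (int) expiration;
        if (policy == Policy.REJECT)
            throw new IllegalArgumentException("TTL expires after the maximum supported date");
        if (policy == Policy.CAP)
            System.err.println("WARN: expiration date capped at the maximum supported date");
        return MAX_DELETION_TIME; // CAP and CAP_NOWARN both cap; only CAP warns
    }

    public static void main(String[] args) {
        // A date shortly after the point where a 20-year TTL starts to overflow.
        int now = (int) Instant.parse("2018-02-01T00:00:00Z").getEpochSecond();
        Policy policy = parsePolicy(System.getProperty("cassandra.expiration_date_overflow_policy"));
        System.out.println("naive:  " + naiveLocalDeletionTime(now, MAX_TTL));
        System.out.println("capped: " + localDeletionTime(now, MAX_TTL, Policy.CAP));
        System.out.println("policy from system property: " + policy);
    }
}
```

The naive sum comes out negative, which corresponds to the "negative min local deletion times" that the notice asks operators to look for. Passing {{-Dcassandra.expiration_date_overflow_policy=CAP}} on the JVM command line exercises the parsing path of the sketch.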