Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-26 Thread horschi
If it's a 32-bit timestamp, can't we just save/read localDeletionTime as an
unsigned int? That would give it another 68 years. I think everyone
involved here could live with that limitation :-)
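
A minimal sketch of that idea (illustrative names, not an actual Cassandra
patch): keep the on-disk field at 32 bits, but read it back without sign
extension, which moves the ceiling from 2038 to 2106.

public final class UnsignedDeletionTimeSketch {
    // Write: any epoch-seconds value below 2^32 (year 2106) still fits in
    // 32 bits; the cast simply keeps the low 32 bits.
    static int encode(long epochSeconds) {
        if (epochSeconds < 0 || epochSeconds > 0xFFFFFFFFL)
            throw new IllegalArgumentException("out of unsigned 32-bit range");
        return (int) epochSeconds;
    }

    // Read: widen without sign extension, so a stored value past 2^31 - 1
    // (2038-01-19) comes back as a positive long, not a negative int.
    static long decode(int stored) {
        return Integer.toUnsignedLong(stored);
    }

    public static void main(String[] args) {
        long in2040 = 2208988800L;   // epoch seconds, roughly the year 2040
        int onDisk = encode(in2040); // negative when viewed as a signed int...
        System.out.println(onDisk < 0 && decode(onDisk) == in2040); // true
    }
}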

On Fri, Jan 26, 2018 at 8:16 AM, Anuj Wadehra <
anujw_2...@yahoo.co.in.invalid> wrote:

>  Hi Jeff,
>
> Thanks for the prompt action! I agree that patching an application MAY
> have a shorter life cycle than patching Cassandra in production. But, in
> the interest of the larger Cassandra user community, we should make our
> best effort to avoid breaking all the affected applications in production.
> We should also consider that updating business logic as per the new 15 year
> TTL constraint may have business implications for many users. I have a
> limited understanding of the complexity of the code patch, but it may be
> more feasible to extend the 20 year limit in Cassandra 2.1/2.2 rather than
> asking all impacted users to make an immediate business logic adaptation.
> Moreover, now that we officially support Cassandra 2.1 & 2.2 until the 4.0
> release and provide critical fixes for 2.1, it becomes even more reasonable
> to provide this extremely critical patch for 2.1 & 2.2 (unless it's
> absolutely impossible). After all, many users still run Cassandra 2.1 and
> 2.2 in their most critical production systems.
>
> Thanks
> Anuj
>
> On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa <
> jji...@gmail.com> wrote:
>
>  We’ll get patches out. They almost certainly aren’t going to change the
> sstable format for old versions (unless whoever writes the patch makes a
> great argument for it), so there’s probably not going to be post-2038 TTL
> support for 2.1/2.2. For those old versions, we can definitely make them
> not lose data, but we almost certainly aren’t going to make the TTL go
> past 2038 in old versions.
>
> More importantly, any company trying to do 20 year TTLs that’s waiting
> for a patched version should start by patching their app to not write
> invalid TTLs - your app release cycle is almost certainly faster than db
> patch / review / test / release / validation, and you can avoid the data
> loss application side by calculating the TTL explicitly. It’s not the best
> solution, but it beats doing nothing, and we’re not rushing out a release
> in less than a day (we haven’t even started a vote, and the voting window
> is 72 hours for members to review and approve or reject the candidate).
>
>
>
> --
> Jeff Jirsa
>
>
> > On Jan 25, 2018, at 9:07 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> >
> > Patches welcome.
> >
> > --
> > Jeff Jirsa
> >
> >
> >> On Jan 25, 2018, at 8:15 PM, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID>
> wrote:
> >>
> >> Hi Paulo,
> >>
> >> Thanks for looking into the issue on priority. I have serious concerns
> >> regarding reducing the TTL to 15 yrs. The patch would immediately break
> >> all existing applications in production which are using 15+ yrs TTL. And
> >> then they would be stuck until all the logic in the production software
> >> is modified and the software is upgraded. This may take days. Such heavy
> >> downtime is generally not acceptable for any business. Yes, they will not
> >> have silent data loss, but they would not be able to do any business
> >> either. I think the permanent fix must be prioritized and put on an
> >> extremely fast track. This is a certain Blocker and the impact could be
> >> enormous -- with and without the 15 year short-term patch.
> >>
> >> And believe me -- there are plenty of such business use cases where
> >> very long TTLs, such as 20 yrs, are used for compliance and other reasons.
> >>
> >> Thanks
> >> Anuj
> >>
> >>  On Friday 26 January 2018, 4:57:13 AM IST, Michael Kjellman <
> kjell...@apple.com> wrote:
> >>
> >> why are people inserting data with a 15+ year TTL? sorta curious about
> the actual use case for that.
> >>
> >>> On Jan 25, 2018, at 12:36 PM, horschi <hors...@gmail.com> wrote:
> >>>
> >>> The assertion was working fine until yesterday 03:14 UTC.
> >>>
> >>> The long term solution would be to work with a long instead of an int.
> >>> The serialized format seems to be a variable-length int already, so
> >>> that should be fine.
> >>>
> >>> If you change the assertion to 15 years, then applications might
> >>> fail, as they might be setting a 15+ year TTL.
> >>>
> >>> regards,
> >>> Christian
> >>>
> >>> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta <pauloricard...@gmail.com>
> >>> wrote:

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread horschi
Paulo:
Is readUnsignedVInt() limited to 32 bits? I would expect it to be of
variable size. That would mean that the format would be fine. Correct me
if I'm wrong!
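
For illustration, a minimal LEB128-style variable-length encoding in Java --
the same general idea, though not necessarily Cassandra's exact vint wire
format (class and method names are illustrative): 7 payload bits per byte
plus a continuation bit, so nothing is capped at 32 bits.

import java.io.ByteArrayOutputStream;

public final class VarIntSketch {
    // Encode 7 payload bits per byte; the high bit marks "more bytes follow".
    static byte[] write(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decode back into a long; no 32-bit ceiling anywhere.
    static long read(byte[] in) {
        long value = 0;
        for (int i = 0, shift = 0; i < in.length; i++, shift += 7) {
            value |= (long) (in[i] & 0x7F) << shift;
            if ((in[i] & 0x80) == 0)
                break;
        }
        return value;
    }

    public static void main(String[] args) {
        long post2038 = 2208988800L; // epoch seconds, roughly the year 2040
        System.out.println(read(write(post2038)) == post2038); // true
        System.out.println(write(post2038).length);            // 5 bytes, not 4
    }
}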


Brandon:
Some applications might set the TTL dynamically. Of course the TTL could be
capped and/or removed in the application. But it might not be as easy as
you make it sound.
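
A hedged sketch of such an application-side cap (helper and class names are
illustrative; the constant mirrors the signed 32-bit localDeletionTime limit):

public final class TtlCap {
    // Clamp a dynamically computed TTL so that now + ttl never crosses the
    // 2038-01-19 limit of a signed 32-bit epoch-seconds field.
    static int cappedTtlSeconds(long requestedTtlSeconds) {
        long nowSeconds = System.currentTimeMillis() / 1000L;
        long secondsUntil2038 = Integer.MAX_VALUE - nowSeconds;
        return (int) Math.max(0, Math.min(requestedTtlSeconds, secondsUntil2038 - 1));
    }

    public static void main(String[] args) {
        // A 20-year request (630720000 s) gets clamped to just under 2038.
        System.out.println(cappedTtlSeconds(630720000L));
    }
}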


On Thu, Jan 25, 2018 at 9:49 PM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> > The long term solution would be to work with a long instead of an int.
> > The serialized format seems to be a variable-length int already, so that
> > should be fine.
>
> Agreed, but apparently it needs a new sstable format as well, as
> mentioned on CASSANDRA-14092.
>
> > If you change the assertion to 15 years, then applications might fail,
> > as they might be setting a 15+ year TTL.
>
> This is an emergency measure while we provide a longer term fix. Any
> application using TTL ~= 20 years will need to lower the TTL anyway
> to prevent data loss.
>
> 2018-01-25 18:40 GMT-02:00 Brandon Williams <dri...@gmail.com>:
> > My guess is they don't know how to NOT set a TTL (perhaps with a default
> > in the schema), so they chose the max value. Someone else's problem by then.
> >
> > On Thu, Jan 25, 2018 at 2:38 PM, Michael Kjellman <kjell...@apple.com>
> > wrote:
> >
> >> why are people inserting data with a 15+ year TTL? sorta curious about
> >> the actual use case for that.
> >>
> >> > On Jan 25, 2018, at 12:36 PM, horschi <hors...@gmail.com> wrote:
> >> >
> >> > The assertion was working fine until yesterday 03:14 UTC.
> >> >
> >> > The long term solution would be to work with a long instead of an int.
> >> > The serialized format seems to be a variable-length int already, so
> >> > that should be fine.
> >> >
> >> > If you change the assertion to 15 years, then applications might
> >> > fail, as they might be setting a 15+ year TTL.
> >> >
> >> > regards,
> >> > Christian
> >> >
> >> > On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta <pauloricard...@gmail.com>
> >> > wrote:
> >> >
> >> >> Thanks for raising this. Agreed this is bad; when I filed
> >> >> CASSANDRA-14092 I thought a write would fail when localDeletionTime
> >> >> overflows (as it does with 2.1), but that doesn't seem to be the case
> >> >> on 3.0+.
> >> >>
> >> >> I propose adding the assertion back so writes will fail, and reduce
> >> >> the max TTL to something like 15 years for the time being while we
> >> >> figure out a long term solution.
> >> >>
> >> >> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan <jeremiah.jor...@gmail.com>:
> >> >>> If you aren’t getting an error, then I agree, that is very bad.
> >> >>> Looking at the 3.0 code it looks like the assertion checking for
> >> >>> overflow was dropped somewhere along the way; I had only been looking
> >> >>> into 2.1 where you get an assertion error that fails the query.
> >> >>>
> >> >>> -Jeremiah
> >> >>>
> >> >>>> On Jan 25, 2018, at 2:21 PM, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID>
> >> >>>> wrote:
> >> >>>>
> >> >>>>
> >> >>>> Hi Jeremiah,
> >> >>>> Validation is on the TTL value, not on (system_time + TTL). You can
> >> >>>> test it with the example below. The insert is successful, overflow
> >> >>>> happens silently and data is lost:
> >> >>>> create table test(name text primary key,age int);
> >> >>>> insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
> >> >>>> select * from test where name='test_20yrs';
> >> >>>>
> >> >>>>  name | age
> >> >>>> ------+-----
> >> >>>>
> >> >>>> (0 rows)
> >> >>>>
> >> >>>> insert into test(name,age) values('test_20yr_plus_1',30) USING TTL
> >> >>>> 630720001;
> >> >>>> InvalidRequest: Error from server: code=2200 [Invalid query]
> >> >>>> message="ttl is too large. requested (630720001) maximum (630720000)"
> >> >>>>
> >> >>>> Thanks
> >> >>>> Anuj
> >> >>>>   On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
> >> >>>> jeremiah.jor...@gmail.com> wrote:

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread horschi
The assertion was working fine until yesterday 03:14 UTC.
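
(For the arithmetic behind that timestamp, assuming the 20-year /
630720000-second maximum TTL: the signed 32-bit limit is 2^31 - 1 =
2147483647 epoch seconds, i.e. 2038-01-19 03:14:07 UTC, and 2147483647 -
630720000 = 1516763647, i.e. 2018-01-24 03:14:07 UTC -- the moment from
which any write carrying the maximum TTL overflows localDeletionTime.)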

The long term solution would be to work with a long instead of an int. The
serialized format seems to be a variable-length int already, so that should
be fine.

If you change the assertion to 15 years, then applications might fail, as
they might be setting a 15+ year TTL.

regards,
Christian

On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> Thanks for raising this. Agreed this is bad; when I filed
> CASSANDRA-14092 I thought a write would fail when localDeletionTime
> overflows (as it does with 2.1), but that doesn't seem to be the case on
> 3.0+.
>
> I propose adding the assertion back so writes will fail, and reduce
> the max TTL to something like 15 years for the time being while we
> figure out a long term solution.
>
> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan <jeremiah.jor...@gmail.com>:
> > If you aren’t getting an error, then I agree, that is very bad. Looking
> > at the 3.0 code it looks like the assertion checking for overflow was
> > dropped somewhere along the way; I had only been looking into 2.1 where
> > you get an assertion error that fails the query.
> >
> > -Jeremiah
> >
> >> On Jan 25, 2018, at 2:21 PM, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID>
> >> wrote:
> >>
> >>
> >> Hi Jeremiah,
> >> Validation is on the TTL value, not on (system_time + TTL). You can test
> >> it with the example below. The insert is successful, overflow happens
> >> silently and data is lost:
> >> create table test(name text primary key,age int);
> >> insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
> >> select * from test where name='test_20yrs';
> >>
> >>  name | age
> >> ------+-----
> >>
> >> (0 rows)
> >>
> >> insert into test(name,age) values('test_20yr_plus_1',30) USING TTL
> >> 630720001;
> >> InvalidRequest: Error from server: code=2200 [Invalid query]
> >> message="ttl is too large. requested (630720001) maximum (630720000)"
> >>
> >> Thanks
> >> Anuj
> >>   On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
> >> jeremiah.jor...@gmail.com> wrote:
> >>
> >> Where is the data loss? Does the INSERT operation return successfully
> >> to the client in this case? From reading the linked issues it sounds
> >> like you get an error client side.
> >>
> >> -Jeremiah
> >>
> >>> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID>
> >>> wrote:
> >>>
> >>> Hi,
> >>>
> >>> For all those people who use MAX TTL=20 years for inserting/updating
> >>> data in production, https://issues.apache.org/jira/browse/CASSANDRA-14092
> >>> can silently cause irrecoverable Data Loss. This seems like a certain
> >>> TOP MOST BLOCKER to me. I think the priority of the JIRA must be raised
> >>> to BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and
> >>> no one seems to be actively working on it. Just like any other critical
> >>> vulnerability, this vulnerability demands immediate attention from some
> >>> very experienced folks to bring out an urgent fast-track patch for all
> >>> currently supported Cassandra versions 2.1, 2.2 and 3.x. As per my
> >>> understanding of the JIRA comments, the changes may not be that trivial
> >>> for older releases. So, community support on the patch is very much
> >>> appreciated.
> >>>
> >>> Thanks
> >>> Anuj
> >>


Re: Low compactionthroughput blocks reads?

2016-03-03 Thread horschi
Thanks. That's good to know.

On Thu, Mar 3, 2016 at 8:48 PM, Benedict Elliott Smith <bened...@apache.org>
wrote:

> Yep, definitely a bug. Introduced by CASSANDRA-9240 (me; mea culpa).
>
> I've filed a JIRA for you: CASSANDRA-11301
>
> On 3 March 2016 at 14:10, horschi <hors...@gmail.com> wrote:
>
> > Hi,
> >
> > I just found another one. It's basically the same, but I'll post it
> > anyway:
> >
> > Thread 84311: (state = BLOCKED)
> >  - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
> >  - java.lang.Thread.sleep(long, int) @bci=57, line=340 (Compiled frame)
> >  - java.util.concurrent.TimeUnit.sleep(long) @bci=23, line=386 (Compiled frame)
> >  - com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(long, java.util.concurrent.TimeUnit) @bci=22, line=273 (Compiled frame)
> >  - com.google.common.util.concurrent.RateLimiter$SleepingTicker$1.sleepMicrosUninterruptibly(long) @bci=10, line=701 (Compiled frame)
> >  - com.google.common.util.concurrent.RateLimiter.acquire(int) @bci=42, line=405 (Compiled frame)
> >  - org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer() @bci=11, line=43 (Compiled frame)
> >  - org.apache.cassandra.io.util.RandomAccessReader.seek(long) @bci=147, line=287 (Compiled frame)
> >  - org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(long) @bci=22, line=65 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(long) @bci=5, line=1751 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.big.SimpleSliceReader.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.composites.Composite) @bci=36, line=57 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.createReader(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.RowIndexEntry, org.apache.cassandra.io.util.FileDataInput, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=38, line=66 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.big.SSTableSliceIterator.<init>(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=36, line=43 (Compiled frame)
> >  - org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.filter.ColumnSlice[], boolean) @bci=8, line=75 (Compiled frame)
> >  - org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader, org.apache.cassandra.db.DecoratedKey) @bci=10, line=246 (Compiled frame)
> >  - org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(org.apache.cassandra.io.sstable.format.SSTableReader) @bci=9, line=62 (Compiled frame)
> >  - org.apache.cassandra.db.CollationController.collectAllData(boolean) @bci=350, line=270 (Compiled frame)
> >  - org.apache.cassandra.db.CollationController.getTopLevelColumns(boolean) @bci=39, line=64 (Compiled frame)
> >  - org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(org.apache.cassandra.db.filter.QueryFilter, int) @bci=40, line=2011 (Compiled frame)
> >  - org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(org.apache.cassandra.db.filter.QueryFilter) @bci=141, line=1815 (Compiled frame)
> >  - org.apache.cassandra.db.Keyspace.getRow(org.apache.cassandra.db.filter.QueryFilter) @bci=11, line=360 (Compiled frame)
> >  - org.apache.cassandra.db.SliceFromReadCommand.getRow(org.apache.cassandra.db.Keyspace) @bci=222, line=85 (Compiled frame)
> >  - org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow() @bci=16, line=1587 (Compiled frame)
> >  - org.apache.cassandra.service.StorageProxy$DroppableRunnable.run() @bci=37, line=2232 (Compiled frame)
> >  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
> >  - org.apache.cassandra.concurr
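
The trace shows CompressedThrottledReader acquiring the compaction
RateLimiter on the read path. A minimal self-contained Guava sketch of why a
low rate stalls callers (class name and numbers are illustrative, not
Cassandra code):

import com.google.common.util.concurrent.RateLimiter;

public class ThrottleDemo {
    public static void main(String[] args) {
        // Stand-in for a low compaction throughput: 1 MB/s worth of permits.
        RateLimiter limiter = RateLimiter.create(1024 * 1024);

        long start = System.nanoTime();
        // A reader pulling 4 MB in 64 KB reBuffer()-sized chunks: each
        // acquire() sleeps uninterruptibly until enough permits accumulate,
        // so this loop takes roughly 4 seconds.
        for (int i = 0; i < 64; i++)
            limiter.acquire(64 * 1024);
        System.out.printf("blocked ~%.1f s behind the limiter%n",
                (System.nanoTime() - start) / 1e9);
    }
}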

Re: [VOTE CLOSED] Release Apache Cassandra 2.0.10

2014-08-11 Thread horschi
Saw it. Thanks!


On Mon, Aug 11, 2014 at 11:53 PM, Michael Shuler <mich...@pbandjelly.org>
wrote:

 On 08/11/2014 04:21 PM, Michael Shuler wrote:

 On 08/11/2014 09:50 AM, horschi wrote:

 Would it be possible to have CASSANDRA-7511 reviewed also?


 That was committed to the c*-2.0 branch prior to the vote and was in the
 tentative-2.0.10 CHANGES.txt:

 * Fix truncate to always flush (CASSANDRA-7511)


 I didn't pair up your comment on 7511 and this email - see:

 https://issues.apache.org/jira/browse/CASSANDRA-7750

 --
 Michael