Re: Roadmap for 4.0

2018-04-10 Thread Jeff Jirsa


-- 
Jeff Jirsa


On Apr 10, 2018, at 5:24 PM, Josh McKenzie  wrote:

>> 
>> 50'ish days is too short to draw a line in the sand,
>> especially as people balance work obligations with Cassandra feature
>> development.
> 
> What's a reasonable alternative / compromise for this? And what
> non-disruptive-but-still-large patches are in flight that we would want to
> delay the line in the sand for?

I don’t care about non-disruptive patches, to be honest. Nobody’s running 
trunk now, so it doesn’t matter to me whether a patch landed 6 months ago or June 
29, unless you can show me one person who’s run a nontrivial multi-DC test 
cluster under real load that included correctness validation. Short of that, 
it’s untested, and the duration a patch has been in an untested repo is 
entirely irrelevant.

If there’s really someone already testing trunk in a meaningful way (real 
workloads, and verifying correctness), and that person is really able to find 
and fix bugs, then tell me who it is and I’ll change my opinion (and I’m not 
even talking about thousand-node clusters, just someone who’s actually using 
real data, like something upgraded from 2.1/3.0, and is checking to prove it 
matches expectations).

Otherwise, when the time comes for real users to plan real upgrades to a 
hypothetical 4.1, they’ll have to do two sets of real, expensive, annoying 
testing - one for the stuff in 4.0 (chunk cache, file format changes, internode 
changes, etc), and a second for 4.0-4.1 changes for the invasive stuff I care 
about and you don’t want to wait for.

I’d rather see us get all this stuff in and then spend real time testing and 
fixing in a 4-6 month alpha/beta phase (where real users can help, because it’s 
one real dedicated validation phase) than push this into two (probably 
inadequately tested) releases.

But that’s just my opinion, and I’ll support it with my one vote. I may get 
outvoted, but that’s what I’d rather see happen.


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Roadmap for 4.0

2018-04-10 Thread Sankalp Kohli
Also, in this time we should try to see who can do the 3 things I mentioned in my 
earlier email.

> On Apr 10, 2018, at 17:50, Sankalp Kohli  wrote:
> 
> I think moving it to August/Sept will be better 
> 
> On Apr 10, 2018, at 17:24, Josh McKenzie  wrote:
> 
>>> 
>>> 50'ish days is too short to draw a line in the sand,
>>> especially as people balance work obligations with Cassandra feature
>>> development.
>> 
>> What's a reasonable alternative / compromise for this? And what
>> non-disruptive-but-still-large patches are in flight that we would want to
>> delay the line in the sand for?
>> 
>>> On Tue, Apr 10, 2018 at 6:34 PM, Jeff Jirsa  wrote:
>>> 
>>> Seriously, what's the rush to branch? Do we all love merging so much we
>>> want to do it a few more times just for the sake of merging? If nothing
>>> diverges, there's nothing gained from the branch, and if it does diverge, we
>>> add work for no real gain.
>>> 
>>> Beyond that, I still don't like June 1. Validating releases is hard. It
>>> sounds easy to drop a 4.1 and ask people to validate again, but it's a hell
>>> of a lot harder than it sounds. I'm not saying I'm a hard -1, but I really
>>> think it's too soon. 50'ish days is too short to draw a line in the sand,
>>> especially as people balance work obligations with Cassandra feature
>>> development.
>>> 
>>> 
>>> 
>>> 
 On Tue, Apr 10, 2018 at 3:18 PM, Nate McCall  wrote:
 
 A lot of good points and everyone's input is really appreciated.
 
 So it sounds like we are building consensus towards June 1 for 4.0
 branch point/feature freeze and the goal is stability. (No one has
 come with a hard NO anyway).
 
 I want to reiterate Sylvain's point that we can do whatever we want in
 terms of dropping a new feature 4.1/5.0 (or whatev.) whenever we want.
 
 In thinking about this, what is stopping us from branching 4.0 a lot
 sooner? Like now-ish? This will let folks start hacking on trunk with
 new stuff, and things we've gotten close on can still go in 4.0
 (Virtual tables). I guess I'm asking here if we want to disambiguate
 "feature freeze" from "branch point?" I feel like this makes sense.
 
 
 
>>> 




Re: Roadmap for 4.0

2018-04-10 Thread Sankalp Kohli
I think moving it to August/Sept would be better.

On Apr 10, 2018, at 17:24, Josh McKenzie  wrote:

>> 
>> 50'ish days is too short to draw a line in the sand,
>> especially as people balance work obligations with Cassandra feature
>> development.
> 
> What's a reasonable alternative / compromise for this? And what
> non-disruptive-but-still-large patches are in flight that we would want to
> delay the line in the sand for?
> 
>> On Tue, Apr 10, 2018 at 6:34 PM, Jeff Jirsa  wrote:
>> 
>> Seriously, what's the rush to branch? Do we all love merging so much we
>> want to do it a few more times just for the sake of merging? If nothing
>> diverges, there's nothing gained from the branch, and if it does diverge, we
>> add work for no real gain.
>> 
>> Beyond that, I still don't like June 1. Validating releases is hard. It
>> sounds easy to drop a 4.1 and ask people to validate again, but it's a hell
>> of a lot harder than it sounds. I'm not saying I'm a hard -1, but I really
>> think it's too soon. 50'ish days is too short to draw a line in the sand,
>> especially as people balance work obligations with Cassandra feature
>> development.
>> 
>> 
>> 
>> 
>>> On Tue, Apr 10, 2018 at 3:18 PM, Nate McCall  wrote:
>>> 
>>> A lot of good points and everyone's input is really appreciated.
>>> 
>>> So it sounds like we are building consensus towards June 1 for 4.0
>>> branch point/feature freeze and the goal is stability. (No one has
>>> come with a hard NO anyway).
>>> 
>>> I want to reiterate Sylvain's point that we can do whatever we want in
>>> terms of dropping a new feature 4.1/5.0 (or whatev.) whenever we want.
>>> 
>>> In thinking about this, what is stopping us from branching 4.0 a lot
>>> sooner? Like now-ish? This will let folks start hacking on trunk with
>>> new stuff, and things we've gotten close on can still go in 4.0
>>> (Virtual tables). I guess I'm asking here if we want to disambiguate
>>> "feature freeze" from "branch point?" I feel like this makes sense.
>>> 
>>> 
>>> 
>> 




Re: Repair scheduling tools

2018-04-10 Thread Elliott Sims
My two cents as a (relatively small) user.  I'm coming at this from the
ops/user side, so my apologies if some of these don't make sense based on a
more detailed understanding of the codebase:

Repair is definitely a major missing piece of Cassandra.  Integrated would
be easier, but a sidecar might be more flexible.  As an intermediate step
that works towards both options, does it make sense to start with
finer-grained tracking and reporting for subrange repairs?  That is, expose
a set of interfaces (both internally and via JMX) that give a scheduler
enough information to run subrange repairs across multiple keyspaces or
even non-overlapping ranges at the same time.  That lets people experiment
with and quickly/safely/easily iterate on different scheduling strategies
in the short term, and long-term those strategies can be integrated into a
built-in scheduler.
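To make the idea concrete, here is a minimal sketch of how such a scheduler might carve the full Murmur3 token ring into non-overlapping subranges and hand each one off independently. The split count and the idea of driving `nodetool repair -st <start> -et <end>` from these ranges are my assumptions for illustration, not an existing Cassandra interface:

```python
# Sketch only: derive non-overlapping subranges a scheduler could repair
# independently. Token bounds are those of the Murmur3Partitioner ring;
# everything else here is an illustrative assumption.

MIN_TOKEN = -2**63       # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1    # Murmur3Partitioner maximum token

def split_ring(parts):
    """Split the full token ring into `parts` contiguous, non-overlapping
    (start, end] ranges that together cover every token exactly once."""
    width = (MAX_TOKEN - MIN_TOKEN) // parts
    bounds = [MIN_TOKEN + i * width for i in range(parts)] + [MAX_TOKEN]
    return [(bounds[i], bounds[i + 1]) for i in range(parts)]

ranges = split_ring(4)
# Each (start, end) pair could then drive one
# `nodetool repair -st <start> -et <end> <keyspace>` invocation.
```

With finer-grained repair-state reporting exposed over JMX, a scheduler could run several of these subranges concurrently across keyspaces, which is exactly the experimentation this would enable.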

On the subject of scheduling, I think adjusting parallelism/aggression with
a possible whitelist or blacklist would be a lot more useful than a "time
between repairs".  That is, if repairs run for a few hours then don't run
for a few (somewhat hard-to-predict) hours, I still have to size the
cluster for the load when the repairs are running.   The only reason I can
think of for an interval between repairs is to allow re-compaction from
repair anticompactions, and subrange repairs seem to eliminate this.  Even
if they didn't, a more direct method along the lines of "don't repair when
the compaction queue is too long" might make more sense.  Blacklisted
timeslots might be useful for avoiding peak time or batch jobs, but only if
they can be specified for consistent time-of-day intervals instead of
unpredictable lulls between repairs.

I really like the idea of automatically adjusting gc_grace_seconds based on
repair state.  The only_purge_repaired_tombstones option fixes this
elegantly for sequential/incremental repairs on STCS, but not for subrange
repairs or LCS (unless a scheduler somehow gains the ability to determine
that every subrange in an sstable has been repaired and mark it
accordingly?).
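The gc_grace adjustment boils down to a simple invariant: gc_grace_seconds must stay longer than the gap since the oldest successfully repaired range, or tombstones can be purged before every replica has seen them. A minimal sketch, assuming the scheduler tracks per-range repair completion timestamps (the tracking structure, helper name, and safety margin are hypothetical, not an existing API):

```python
# Sketch: compute the smallest safe gc_grace_seconds from repair state.
# Assumption: the scheduler records, per token range, the epoch time of the
# last successful repair. The one-day margin is an illustrative choice.

def min_safe_gc_grace(last_repair_per_range, now, margin=86400):
    """Return a gc_grace_seconds large enough to cover the oldest
    unrepaired window across all ranges, plus a safety margin."""
    oldest = min(last_repair_per_range.values())
    return int(now - oldest) + margin

now = 1_000_000
state = {"range-a": 900_000, "range-b": 950_000}
print(min_safe_gc_grace(state, now))  # → 186400
```

A scheduler could periodically compare this floor against the table's configured gc_grace_seconds and raise an alert (or the setting itself) when repairs fall behind.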


On 2018/04/03 17:48:14, Blake Eggleston  wrote:
> Hi dev@,
>
> The question of the best way to schedule repairs came up on
> CASSANDRA-14346, and I thought it would be good to bring up the idea of an
> external tool on the dev list.
>
> Cassandra lacks any sort of tools for automating routine tasks that are
> required for running clusters, specifically repair. Regular repair is a
> must for most clusters, like compaction. This means that, especially as far
> as eventual consistency is concerned, Cassandra isn’t totally functional
> out of the box. Operators either need to find a 3rd party solution or
> implement one themselves. Adding this to Cassandra would make it easier to
> use.
>
> Is this something we should be doing? If so, what should it look like?
>
> Personally, I feel like this is a pretty big gap in the project and would
> like to see an out-of-process tool offered. Ideally, Cassandra would just
> take care of itself, but writing a distributed repair scheduler that you
> trust to run in production is a lot harder than writing a single-process
> management application that can fail over.
>
> Any thoughts on this?
>
> Thanks,
>
> Blake


Re: Roadmap for 4.0

2018-04-10 Thread Jeff Jirsa
Seriously, what's the rush to branch? Do we all love merging so much we
want to do it a few more times just for the sake of merging? If nothing
diverges, there's nothing gained from the branch, and if it does diverge, we
add work for no real gain.

Beyond that, I still don't like June 1. Validating releases is hard. It
sounds easy to drop a 4.1 and ask people to validate again, but it's a hell
of a lot harder than it sounds. I'm not saying I'm a hard -1, but I really
think it's too soon. 50'ish days is too short to draw a line in the sand,
especially as people balance work obligations with Cassandra feature
development.




On Tue, Apr 10, 2018 at 3:18 PM, Nate McCall  wrote:

> A lot of good points and everyone's input is really appreciated.
>
> So it sounds like we are building consensus towards June 1 for 4.0
> branch point/feature freeze and the goal is stability. (No one has
> come with a hard NO anyway).
>
> I want to reiterate Sylvain's point that we can do whatever we want in
> terms of dropping a new feature 4.1/5.0 (or whatev.) whenever we want.
>
> In thinking about this, what is stopping us from branching 4.0 a lot
> sooner? Like now-ish? This will let folks start hacking on trunk with
> new stuff, and things we've gotten close on can still go in 4.0
> (Virtual tables). I guess I'm asking here if we want to disambiguate
> "feature freeze" from "branch point?" I feel like this makes sense.
>
>
>


Re: Roadmap for 4.0

2018-04-10 Thread Nate McCall
A lot of good points and everyone's input is really appreciated.

So it sounds like we are building consensus towards June 1 for 4.0
branch point/feature freeze, and the goal is stability. (No one has
come out with a hard NO anyway.)

I want to reiterate Sylvain's point that we can do whatever we want in
terms of dropping a new feature 4.1/5.0 (or whatev.) whenever we want.

In thinking about this, what is stopping us from branching 4.0 a lot
sooner? Like now-ish? This will let folks start hacking on trunk with
new stuff, and things we've gotten close on can still go in 4.0
(Virtual tables). I guess I'm asking here if we want to disambiguate
"feature freeze" from "branch point"? I feel like this makes sense.




Re: Failed request after changing compression strategy from LZ4 to Deflate and using upgradesstable

2018-04-10 Thread Nate McCall
Hi Hitesh,
This list is for conversations regarding Cassandra development. Can
you subscribe and post this to us...@cassandra.apache.org instead? You
will get a much wider audience when you do.



On Tue, Apr 10, 2018 at 10:45 PM, hitesh dua  wrote:
>  Hi,
> My compression strategy in production was *LZ4 Compression*, but I modified
> it to *Deflate* using an alter command.
>
> For a compression change, we have to use *nodetool upgradesstables* to
> forcefully rewrite all sstables with the new compression strategy.
>
> But once the upgradesstables command completed on all 5 nodes in the
> cluster, my requests started to fail, both reads and writes.
>
> Replication Factor - 3
> Read Consistency - 1
> Write Consistency - 1
> FYI - I am also using lightweight transactions, which use PAXOS
> Cassandra Version 3.10
>
> I am now facing the following errors in my debug.log file, and some of my
> requests have started to fail:
>
> Debug.log
>
> ERROR [ReadRepairStage:82952] 2018-04-09 19:05:20,669 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:82952,5,main]
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
>     at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
>     at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
>     at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
>     at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
>
> DEBUG [ReadRepairStage:82953] 2018-04-09 19:05:22,932 ReadCallback.java:242 - Digest mismatch:
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-2666936192316364820, 5756f5b8e7b341afa22cef22c5d33260) (d29a0e2a05f81315f0945dee5a210060 vs d41d8cd98f00b204e9800998ecf8427e)
>     at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.10.jar:3.10]
>     at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.10.jar:3.10]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_144]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_144]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
>
> INFO  [HintsDispatcher:767] 2018-04-09 19:05:24,874 HintsDispatchExecutor.java:283 - Finished hinted handoff of file 68c7c130-6cf8-4864-bde8-1819f238045c-1523315072851-1.hints to endpoint 68c7c130-6cf8-4864-bde8-1819f238045c, partially
>
> DEBUG [ReadRepairStage:82950] 2018-04-09 19:05:24,932 DataResolver.java:169 - Timeout while read-repairing after receiving all 1 data and digest responses
>
> ERROR [ReadRepairStage:82950] 2018-04-09 19:05:24,933 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:82950,5,main]
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
>     at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
>     at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
>     at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
>     at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
>     at 

Re: Roadmap for 4.0

2018-04-10 Thread sankalp kohli
Hi,
I am +1 on freezing features at some point.

Here are my thoughts:
1. The reason it took 1.5 years b/w 3.0 and 4.0 is that 3.0 was
released (not cut) too early. There were so many critical bugs in it for
months after the release. Most people have just finished, or are about to
finish, upgrading to 3.0. (Please correct me if my understanding is wrong.)
2. We should cut (not release) the branch when some of these are true. I am not
sure which ones are must-haves in this list, and we should discuss:
 a. Huge change log (this is true). The change log is also not growing very
quickly, which is bad for the project but beneficial for this.
 b. Which people are willing to start testing the day after it is cut?
 c. Do we have resources to fix the critical bugs? What if we find bugs and
no one is available to fix/review them? Can someone sign up for this?
 d. Do we have resources to fix all dtests, including upgrade tests?


Thanks,
Sankalp

On Tue, Apr 10, 2018 at 9:55 AM, Eric Evans 
wrote:

> On Mon, Apr 9, 2018 at 3:56 PM, Jonathan Haddad  wrote:
>
> [ ... ]
>
> > If they're not close to finished now why even consider them for
> > the 4.0 release?  They're so core they should be merged into trunk at the
> > beginning of the cycle for the follow up release in order to get as much
> > exposure as possible.
>
> This sounds right to me.  Bigger, destabilizing changes should land at
> the beginning of the cycle; Setting up a mad rush at the end of a
> release cycle does not yield favorable results (we've done this, we
> know).
>
> > On Mon, Apr 9, 2018 at 1:46 PM Nate McCall  wrote:
> >
> >> > I'd like to see pluggable storage and transient replica tickets land, for
> >> > starters.
> >>
> >> I think both those features are, frankly, necessary for our future. On
> >> the other hand, they both have the following risks:
> >> 1. core behavioral changes
> >> 2. require changing a (relatively) large surface area of code
> >>
> >> We can aim to de-risk 4.0 by focusing on what we have now which is
> >> solid repair and NIO internode (maybe we move the 4.0 branch timeline
> >> up?), aiming for a 4.1 following soon-ish.
> >>
> >> Or we can go in eyes open and agree on a larger footprint 4.0.
> >>
> >> I'm on the fence, tbh (can't emphasize enough how big both those
> >> features will be). I just want everyone to know what we are getting
> >> into and that we are potentially impacting our goals of "stable" ==
> >> "exciting."
>
> Unfortunately, when stability suffers things get "exciting" for all
> sorts of unintended reasons.  I'm personally not umm, excited, by that
> prospect.
>
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>
>
>


Re: Roadmap for 4.0

2018-04-10 Thread Eric Evans
On Mon, Apr 9, 2018 at 3:56 PM, Jonathan Haddad  wrote:

[ ... ]

> If they're not close to finished now why even consider them for
> the 4.0 release?  They're so core they should be merged into trunk at the
> beginning of the cycle for the follow up release in order to get as much
> exposure as possible.

This sounds right to me.  Bigger, destabilizing changes should land at
the beginning of the cycle; setting up a mad rush at the end of a
release cycle does not yield favorable results (we've done this, we
know).

> On Mon, Apr 9, 2018 at 1:46 PM Nate McCall  wrote:
>
>> > I'd like to see pluggable storage and transient replica tickets land, for
>> > starters.
>>
>> I think both those features are, frankly, necessary for our future. On
>> the other hand, they both have the following risks:
>> 1. core behavioral changes
>> 2. require changing a (relatively) large surface area of code
>>
>> We can aim to de-risk 4.0 by focusing on what we have now which is
>> solid repair and NIO internode (maybe we move the 4.0 branch timeline
>> up?), aiming for a 4.1 following soon-ish.
>>
>> Or we can go in eyes open and agree on a larger footprint 4.0.
>>
>> I'm on the fence, tbh (can't emphasize enough how big both those
>> features will be). I just want everyone to know what we are getting
>> into and that we are potentially impacting our goals of "stable" ==
>> "exciting."

Unfortunately, when stability suffers things get "exciting" for all
sorts of unintended reasons.  I'm personally not umm, excited, by that
prospect.


-- 
Eric Evans
john.eric.ev...@gmail.com




Failed request after changing compression strategy from LZ4 to Deflate and using upgradesstable

2018-04-10 Thread hitesh dua
 Hi,
My compression strategy in production was *LZ4 Compression*, but I modified
it to *Deflate* using an alter command.

For a compression change, we have to use *nodetool upgradesstables* to
forcefully rewrite all sstables with the new compression strategy.

But once the upgradesstables command completed on all 5 nodes in the
cluster, my requests started to fail, both reads and writes.

Replication Factor - 3
Read Consistency - 1
Write Consistency - 1
FYI - I am also using lightweight transactions, which use PAXOS
Cassandra Version 3.10

I am now facing the following errors in my debug.log file, and some of my
requests have started to fail:

Debug.log

ERROR [ReadRepairStage:82952] 2018-04-09 19:05:20,669 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:82952,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]

DEBUG [ReadRepairStage:82953] 2018-04-09 19:05:22,932 ReadCallback.java:242 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-2666936192316364820, 5756f5b8e7b341afa22cef22c5d33260) (d29a0e2a05f81315f0945dee5a210060 vs d41d8cd98f00b204e9800998ecf8427e)
    at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.10.jar:3.10]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_144]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]

INFO  [HintsDispatcher:767] 2018-04-09 19:05:24,874 HintsDispatchExecutor.java:283 - Finished hinted handoff of file 68c7c130-6cf8-4864-bde8-1819f238045c-1523315072851-1.hints to endpoint 68c7c130-6cf8-4864-bde8-1819f238045c, partially

DEBUG [ReadRepairStage:82950] 2018-04-09 19:05:24,932 DataResolver.java:169 - Timeout while read-repairing after receiving all 1 data and digest responses

ERROR [ReadRepairStage:82950] 2018-04-09 19:05:24,933 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:82950,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)

Re: Roadmap for 4.0

2018-04-10 Thread DuyHai Doan
> I'd like to see pluggable storage and transient replica tickets land, for
> starters.

So after all the fuss and scandal about incremental repair and MVs not being
stable and getting downgraded to experimental, I would like to suggest that
those new features also be flagged as experimental for some time, so the
community can use them extensively before they are promoted to first-class
features.

Thoughts?

On Mon, Apr 9, 2018 at 11:36 PM, Jeff Beck  wrote:

> If you are going to make 4 bigger, then as long as we call out that 3.11.x (or
> whatever) will keep getting stability-only patches, that's all that's
> needed. We haven't gone to 3.x releases in many places yet, as we wait for a
> release that will be stable longer. Knowing 4 is going to be bigger, I
> wouldn't want to see more feature releases in 3.x.
>
> I wouldn't want to greatly slow features down if they require a major
> release and 5 is too far off.
> Jeff
>
> On Mon, Apr 9, 2018, 4:05 PM Josh McKenzie  wrote:
>
> > >
> > > If they're not close to finished now why even consider them for the 4.0
> > > release?
> >
> > Merging in major features at the end of a release cycle is not the path to
> > stability, imo.
> >
> > On Mon, Apr 9, 2018 at 4:56 PM, Jonathan Haddad 
> wrote:
> >
> > > There's always more stuff to try to shoehorn in.  We've done big releases
> > > with all the things, it never was stable.  We tried the opposite end of the
> > > spectrum, release every month, that really wasn't great either.  Personally
> > > I'd be OK with stopping new features by the end of this month and aiming to
> > > release a stable 4.0 when we agree we would be comfortable dogfooding it in
> > > production at our own companies (in a few months), and aim for 4.1 (or 5.0,
> > > I don't want to bikeshed the version) for pluggable storage and transient
> > > replicas.  If they're not close to finished now why even consider them for
> > > the 4.0 release?  They're so core they should be merged into trunk at the
> > > beginning of the cycle for the follow up release in order to get as much
> > > exposure as possible.
> > >
> > > Jon
> > >
> > > On Mon, Apr 9, 2018 at 1:46 PM Nate McCall  wrote:
> > >
> > > > > I'd like to see pluggable storage and transient replica tickets land, for
> > > > > starters.
> > > >
> > > > I think both those features are, frankly, necessary for our future. On
> > > > the other hand, they both have the following risks:
> > > > 1. core behavioral changes
> > > > 2. require changing a (relatively) large surface area of code
> > > >
> > > > We can aim to de-risk 4.0 by focusing on what we have now which is
> > > > solid repair and NIO internode (maybe we move the 4.0 branch timeline
> > > > up?), aiming for a 4.1 following soon-ish.
> > > >
> > > > Or we can go in eyes open and agree on a larger footprint 4.0.
> > > >
> > > > I'm on the fence, tbh (can't emphasize enough how big both those
> > > > features will be). I just want everyone to know what we are getting
> > > > into and that we are potentially impacting our goals of "stable" ==
> > > > "exciting."
> > > >
> > > > 