[ANNOUNCE] Apache Gora 0.8 Release

2017-09-20 Thread lewis john mcgibbney
Hi Folks,

The Apache Gora team are pleased to announce the immediate availability of
Apache Gora 0.8.

The Apache Gora open source framework provides an in-memory data model and
persistence for big data. Gora supports persisting to

   - column stores,
   - key value stores,
   - document stores,
   - distributed in-memory key/value stores,
   - in-memory data grids,
   - in-memory caches,
   - distributed multi-model stores, and
   - hybrid in-memory architectures

Gora also enables analysis of data with extensive Apache Hadoop™ MapReduce
and Apache Spark™ support. Gora uses the Apache Software License v2.0.

Gora is released both as source code, which can be downloaded from our
downloads page [0], and as Maven artifacts available from Maven Central [1].
The DOAP file for Gora can be found here [2].

This release addresses a modest 35 issues, including the addition of new
datastores for OrientDB and Aerospike. The full Jira release report can be
found here [3].

Suggested Gora database support is as follows


   - Apache Avro 1.8.1
   - Apache Hadoop 2.5.2
   - Apache HBase 1.2.3
   - Apache Cassandra 3.11.0 (Datastax Java Driver 3.3.0)
   - Apache Solr 6.5.1
   - MongoDB (driver) 3.5.0
   - Apache Accumulo 1.7.1
   - Apache Spark 1.4.1
   - Apache CouchDB 1.4.2 (test containers 1.1.0)
   - Amazon DynamoDB (driver) 1.10.55
   - Infinispan 7.2.5.Final
   - JCache 1.0.0 with Hazelcast 3.6.4 support
   - OrientDB 2.2.22
   - Aerospike 4.0.6


Thank you

Lewis

(on behalf of Gora PMC)

[0] http://gora.apache.org/downloads.html
[1] http://search.maven.org/#search|ga|1|g%3A%22org.apache.gora%22
[2] https://svn.apache.org/repos/asf/gora/committers/doap_Gora.rdf
[3] https://s.apache.org/3YdY

--
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: Drastic increase in disk usage after starting repair on 3.7

2017-09-20 Thread kurt greaves
Repair does overstream by design, so if that node is inconsistent you'd
expect a bit of an increase. If you've got a backlog of compactions, that's
probably due to the repair and is likely the cause of the increase. If you're
really worried you can do a rolling restart to stop the repair; otherwise,
try increasing compaction throughput.
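
For reference, the compaction throttle can be checked and raised at runtime
with nodetool; the 64 MB/s below is only an example value (the stock default
is 16 MB/s, and 0 disables throttling):

nodetool getcompactionthroughput        # show the current throttle in MB/s
nodetool setcompactionthroughput 64     # raise it until the backlog clears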


Re: Drastic increase in disk usage after starting repair on 3.7

2017-09-20 Thread Paul Pollack
Just a quick additional note -- we have checked, and this is the only node
in the cluster exhibiting this behavior; disk usage is steady on all the
others. CPU load on the repairing node is slightly higher, but nothing
significant.

On Wed, Sep 20, 2017 at 9:08 PM, Paul Pollack 
wrote:

> Hi,
>
> I'm running a repair on a node in my 3.7 cluster and today got alerted on
> disk space usage. We keep the data and commit log directories on separate
> EBS volumes. The data volume is 2TB. The node went down due to EBS failure
> on the commit log drive. I stopped the instance and was later told by AWS
> support that the drive had recovered. I started the node back up and saw
> that it couldn't replay commit logs due to corrupted data, so I cleared the
> commit logs and then it started up again just fine. I'm not worried about
> anything there that wasn't flushed, I can replay that. I was unfortunately
> just outside the hinted handoff window so decided to run a repair.
>
> Roughly 24 hours after I started the repair is when I got the alert on
> disk space. I checked and saw that right before I started the repair the
> node was using almost 1TB of space, which is right where all the nodes sit,
> and over the course of 24 hours free space had dropped to about 200GB.
>
> My gut reaction was that the repair must have caused this increase, but
> I'm not convinced since the disk usage doubled and continues to grow. I
> figured we would see at most an increase of 2x the size of an SSTable
> undergoing compaction, unless there's more to the disk usage profile of a
> node during repair. We use SizeTieredCompactionStrategy on all the tables
> in this keyspace.
>
> Running nodetool compactionstats shows that there are a higher than usual
> number of pending compactions (currently 20), and there's been a large one
> of 292.82GB moving slowly.
>
> Is it plausible that the repair is the cause of this sudden increase in
> disk space usage? Are there any other things I can check that might provide
> insight into what happened?
>
> Thanks,
> Paul
>


Drastic increase in disk usage after starting repair on 3.7

2017-09-20 Thread Paul Pollack
Hi,

I'm running a repair on a node in my 3.7 cluster and today got alerted on
disk space usage. We keep the data and commit log directories on separate
EBS volumes. The data volume is 2TB. The node went down due to EBS failure
on the commit log drive. I stopped the instance and was later told by AWS
support that the drive had recovered. I started the node back up and saw
that it couldn't replay commit logs due to corrupted data, so I cleared the
commit logs and then it started up again just fine. I'm not worried about
anything there that wasn't flushed, I can replay that. I was unfortunately
just outside the hinted handoff window so decided to run a repair.

Roughly 24 hours after I started the repair is when I got the alert on disk
space. I checked and saw that right before I started the repair the node
was using almost 1TB of space, which is right where all the nodes sit, and
over the course of 24 hours free space had dropped to about 200GB.

My gut reaction was that the repair must have caused this increase, but I'm
not convinced since the disk usage doubled and continues to grow. I figured
we would see at most an increase of 2x the size of an SSTable undergoing
compaction, unless there's more to the disk usage profile of a node during
repair. We use SizeTieredCompactionStrategy on all the tables in this
keyspace.

Running nodetool compactionstats shows that there are a higher than usual
number of pending compactions (currently 20), and there's been a large one
of 292.82GB moving slowly.

Is it plausible that the repair is the cause of this sudden increase in
disk space usage? Are there any other things I can check that might provide
insight into what happened?

Thanks,
Paul
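
For anyone debugging something similar, a few commands can help attribute the
extra space to repair streaming, pending compactions, or leftover snapshots
(the data path below is the package default and is an assumption; adjust it to
your data_file_directories):

nodetool netstats          # active repair streams and bytes received
nodetool compactionstats   # pending compactions and bytes remaining
nodetool listsnapshots     # sequential repairs snapshot SSTables, and snapshots consume disk
du -sh /var/lib/cassandra/data/<keyspace>/*   # on-disk size per table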


Re: Debugging write timeouts on Cassandra 2.2.5

2017-09-20 Thread Jai Bheemsen Rao Dhanwada
Apologies for the typo, Mike

On Wed, Sep 20, 2017 at 9:49 AM, Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Hello Nike,
>
> were you able to fix the issue? If so what change helped you?
>
> On Wed, Feb 24, 2016 at 5:36 PM, Jack Krupansky 
> wrote:
>
>> Great that you found a specific release that triggers the problem - 2.1.x
>> has a huge number of changes.
>>
>> How many partitions and rows do you have? What's the largest row count
>> for a single partition? And all of these CQL tables are COMPACT STORAGE,
>> correct? Are you writing a large number of skinny partitions or a smaller
>> number of very wide partitions? It wouldn't surprise me if behavior for
>> large partitions varies between releases since they can be so
>> memory-intensive.
>>
>> I see this change in 2.1.5 that could possibly introduce some memory
>> usage:
>> Write partition size estimates into a system table (CASSANDRA-7688)
>>
>> At this stage it would probably help for you to try to produce a
>> reasonably small repro test case that you could file as a Jira. And if you
>> could run that repro test case on 3.x to verify that the problem still
>> exists, that would be helpful as well.
>>
>> How long does it take to repro the timeout?
>>
>> Can you repro the timeout using a single node?
>>
>> What is the pattern of the timeouts - just random and occasional, or
>> heavy and continuous once they start?
>>
>> Are they occurring uniformly on all three nodes?
>>
>> If you bounce the cluster and continue testing, do the timeouts commence
>> immediately, fairly soon, or only after about as long as they take from a
>> clean fresh start?
>>
>>
>>
>> -- Jack Krupansky
>>
>> On Wed, Feb 24, 2016 at 7:04 PM, Mike Heffner  wrote:
>>
>>> Nate,
>>>
>>> So we have run several install tests, bisecting the 2.1.x release line,
>>> and we believe that the regression was introduced in version 2.1.5. This is
>>> the first release that clearly hits the timeout for us.
>>>
>>> It looks like quite a large release, so our next step will likely be
>>> bisecting the major commits to see if we can narrow it down:
>>> https://github.com/apache/cassandra/blob/3c0a337ebc90b0d99349d0aa152c92b5b3494d8c/CHANGES.txt.
>>> Obviously, any suggestions on potential suspects appreciated.
>>>
>>> These are the memtable settings we've configured diff from the defaults
>>> during our testing:
>>>
>>> memtable_allocation_type: offheap_objects
>>> memtable_flush_writers: 8
>>>
>>>
>>> Cheers,
>>>
>>> Mike
>>>
>>> On Fri, Feb 19, 2016 at 1:46 PM, Nate McCall 
>>> wrote:
>>>
 The biggest change which *might* explain your behavior has to do with
 the changes in memtable flushing between 2.0 and 2.1:
 https://issues.apache.org/jira/browse/CASSANDRA-5549

 However, the tpstats you posted shows no dropped mutations which would
 make me more certain of this as the cause.

 What values do you have right now for each of these (my recommendations
 for each on a c4.2xl with stock cassandra-env.sh are in parenthesis):

 - memtable_flush_writers (2)
 - memtable_heap_space_in_mb  (2048)
 - memtable_offheap_space_in_mb (2048)
 - memtable_cleanup_threshold (0.11)
 - memtable_allocation_type (offheap_objects)

 The biggest win IMO will be moving to offheap_objects. By default,
 everything is on heap. Regardless, spending some time tuning these for your
 workload will pay off.

 You may also want to be explicit about

 - native_transport_max_concurrent_connections
 - native_transport_max_concurrent_connections_per_ip

 Depending on the driver, these may now be allowing 32k streams per
 connection(!) as detailed in v3 of the native protocol:
 https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec#L130-L152
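
A cassandra.yaml sketch pulling the above recommendations together (the
memtable values are the c4.2xl suggestions from this message, not general
defaults, and the native transport caps are illustrative numbers, not
prescribed ones):

memtable_flush_writers: 2
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
memtable_cleanup_threshold: 0.11
memtable_allocation_type: offheap_objects
# cap native-protocol concurrency explicitly instead of relying on defaults
native_transport_max_concurrent_connections: 1024
native_transport_max_concurrent_connections_per_ip: 128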



 On Fri, Feb 19, 2016 at 8:48 AM, Mike Heffner  wrote:

> Anuj,
>
> So we originally started testing with Java8 + G1, however we were able
> to reproduce the same results with the default CMS settings that ship in
> the cassandra-env.sh from the Deb pkg. We didn't detect any large GC 
> pauses
> during the runs.
>
> Query pattern during our testing was 100% writes, batching (via Thrift
> mostly) to 5 tables, between 6-1500 rows per batch.
>
> Mike
>
> On Thu, Feb 18, 2016 at 12:22 PM, Anuj Wadehra  > wrote:
>
>> What's the GC overhead? Can you share your GC collector and settings?
>>
>>
>> What's your query pattern? Do you use secondary indexes, batches, IN
>> clauses, etc.?
>>
>>
>> Anuj
>>
>>
>> Sent from Yahoo Mail on Android
>> 
>>
>> On Thu, 18 Feb, 2016 at 8:45 pm, Mike Heffner
>>  wrote:
>> 

Re: Debugging write timeouts on Cassandra 2.2.5

2017-09-20 Thread Jai Bheemsen Rao Dhanwada
Hello Nike,

were you able to fix the issue? If so what change helped you?

On Wed, Feb 24, 2016 at 5:36 PM, Jack Krupansky 
wrote:

> Great that you found a specific release that triggers the problem - 2.1.x
> has a huge number of changes.
>
> How many partitions and rows do you have? What's the largest row count for
> a single partition? And all of these CQL tables are COMPACT STORAGE,
> correct? Are you writing a large number of skinny partitions or a smaller
> number of very wide partitions? It wouldn't surprise me if behavior for
> large partitions varies between releases since they can be so
> memory-intensive.
>
> I see this change in 2.1.5 that could possibly introduce some memory usage:
> Write partition size estimates into a system table (CASSANDRA-7688)
>
> At this stage it would probably help for you to try to produce a
> reasonably small repro test case that you could file as a Jira. And if you
> could run that repro test case on 3.x to verify that the problem still
> exists, that would be helpful as well.
>
> How long does it take to repro the timeout?
>
> Can you repro the timeout using a single node?
>
> What is the pattern of the timeouts - just random and occasional, or heavy
> and continuous once they start?
>
> Are they occurring uniformly on all three nodes?
>
> If you bounce the cluster and continue testing, do the timeouts commence
> immediately, fairly soon, or only after about as long as they take from a
> clean fresh start?
>
>
>
> -- Jack Krupansky
>
> On Wed, Feb 24, 2016 at 7:04 PM, Mike Heffner  wrote:
>
>> Nate,
>>
>> So we have run several install tests, bisecting the 2.1.x release line,
>> and we believe that the regression was introduced in version 2.1.5. This is
>> the first release that clearly hits the timeout for us.
>>
>> It looks like quite a large release, so our next step will likely be
>> bisecting the major commits to see if we can narrow it down:
>> https://github.com/apache/cassandra/blob/3c0a337ebc90b0d99349d0aa152c92b5b3494d8c/CHANGES.txt.
>> Obviously, any suggestions on potential suspects appreciated.
>>
>> These are the memtable settings we've configured diff from the defaults
>> during our testing:
>>
>> memtable_allocation_type: offheap_objects
>> memtable_flush_writers: 8
>>
>>
>> Cheers,
>>
>> Mike
>>
>> On Fri, Feb 19, 2016 at 1:46 PM, Nate McCall 
>> wrote:
>>
>>> The biggest change which *might* explain your behavior has to do with
>>> the changes in memtable flushing between 2.0 and 2.1:
>>> https://issues.apache.org/jira/browse/CASSANDRA-5549
>>>
>>> However, the tpstats you posted shows no dropped mutations which would
>>> make me more certain of this as the cause.
>>>
>>> What values do you have right now for each of these (my recommendations
>>> for each on a c4.2xl with stock cassandra-env.sh are in parenthesis):
>>>
>>> - memtable_flush_writers (2)
>>> - memtable_heap_space_in_mb  (2048)
>>> - memtable_offheap_space_in_mb (2048)
>>> - memtable_cleanup_threshold (0.11)
>>> - memtable_allocation_type (offheap_objects)
>>>
>>> The biggest win IMO will be moving to offheap_objects. By default,
>>> everything is on heap. Regardless, spending some time tuning these for your
>>> workload will pay off.
>>>
>>> You may also want to be explicit about
>>>
>>> - native_transport_max_concurrent_connections
>>> - native_transport_max_concurrent_connections_per_ip
>>>
>>> Depending on the driver, these may now be allowing 32k streams per
>>> connection(!) as detailed in v3 of the native protocol:
>>> https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec#L130-L152
>>>
>>>
>>>
>>> On Fri, Feb 19, 2016 at 8:48 AM, Mike Heffner  wrote:
>>>
 Anuj,

 So we originally started testing with Java8 + G1, however we were able
 to reproduce the same results with the default CMS settings that ship in
 the cassandra-env.sh from the Deb pkg. We didn't detect any large GC pauses
 during the runs.

 Query pattern during our testing was 100% writes, batching (via Thrift
 mostly) to 5 tables, between 6-1500 rows per batch.

 Mike

 On Thu, Feb 18, 2016 at 12:22 PM, Anuj Wadehra 
 wrote:

> What's the GC overhead? Can you share your GC collector and settings?
>
>
> What's your query pattern? Do you use secondary indexes, batches, IN
> clauses, etc.?
>
>
> Anuj
>
>
> Sent from Yahoo Mail on Android
> 
>
> On Thu, 18 Feb, 2016 at 8:45 pm, Mike Heffner
>  wrote:
> Alain,
>
> Thanks for the suggestions.
>
> Sure, tpstats are here: https://gist.github.com/mheffner/a979ae1a0304480b052a.
> Looking at the metrics across the ring, there were no blocked tasks nor
> dropped messages.
>
> Iowait metrics look 

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-20 Thread Jeff Jirsa
It certainly violates the principle of least astonishment. 

Generally, people with large clusters do it the same way they did in 2.1: with
ring-aware scheduling (which people running large clusters can probably do
because they're less likely to be using vnodes).
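
A minimal sketch of that kind of scheduling, with hostnames and keyspace as
placeholders (a real ring-aware scheduler would also avoid repairing replicas
that share token ranges at the same time):

# run a full primary-range repair one node at a time
for host in cass-node-1 cass-node-2 cass-node-3; do
    ssh "$host" nodetool repair -full -pr my_keyspace
done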

The conversation beyond this belongs on the ticket; the three people most
likely to fix it are aware (Paulo is on the thread, and I CC'd Blake and
Marcus on the ticket).

-- 
Jeff Jirsa


> On Sep 19, 2017, at 10:51 PM, Steinmaurer, Thomas 
>  wrote:
> 
> Hi,
>  
> No offense to anybody, but I would even say repair is broken in C* 3.0 (or
> beginning with 2.2?). The need to add some kind of third-party tool (Reaper,
> etc.) to the deployment is a sign that something does not work out of the
> box. I have no idea how users with clusters of more than 100 nodes handle
> repairs with 3.0 (2.2?) if it is not reliable to kick off repair on several
> nodes in parallel.
>  
> In my humble opinion, if 3.0 is being classified as "production-ready", this
> needs immediate attention, even if this would mean some sort of backward
> compatibility break in a bug-fix release.
>  
> Just my 2 cents from someone having > 300 Cassandra 2.1 JVMs out there spread 
> around the world.
>  
> Thanks,
> Thomas
>  
> From: kurt greaves [mailto:k...@instaclustr.com] 
> Sent: Tuesday, 19 September 2017 23:54
> To: User 
> Subject: RE: Multi-node repair fails after upgrading to 3.0.14
>  
> You're right, of course. Part of the reason it's changing so frequently is to
> try to improve repairs so that they at least actually work reliably. C* 3
> hasn't been the smoothest ride for repairs. Incremental repair wasn't really
> ready for 3.0, so it was a mistake to make it the default.
> Unfortunately it's hard to change that back now, as it would just lead to more
> confusion and problems for users unaware of the change.
>  
> On 20 Sep. 2017 00:25, "Durity, Sean R"  wrote:
> Required maintenance for a cluster should not be this complicated and should 
> not be changing so often. To me, this is a major flaw in Cassandra.
>  
>  
> Sean Durity
>  
> From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com] 
> Sent: Tuesday, September 19, 2017 2:33 AM
> To: user@cassandra.apache.org
> Subject: RE: Multi-node repair fails after upgrading to 3.0.14
>  
> Hi Kurt,
>  
> thanks for the link!
>  
> Honestly, it's a pity that in 3.0 we can't get back the simple, reliable and
> predictable way of running a full repair for very-low-data-volume CFs, kicked
> off on all nodes in parallel, without all the magic behind the scenes
> introduced by incremental repair. Even when incremental repair is not used,
> anticompaction (even with -full) has been there since 2.2+ :)
>  
>  
> Regards,
> Thomas
>  
> From: kurt greaves [mailto:k...@instaclustr.com] 
> Sent: Tuesday, 19 September 2017 06:24
> To: User 
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>  
> https://issues.apache.org/jira/browse/CASSANDRA-13153 implies full repairs
> still trigger anti-compaction on non-repaired SSTables (if I'm reading that
> right), so you might need to make sure you don't run multiple repairs at the
> same time across your nodes (if you're using vnodes); otherwise you could
> still end up trying to run anti-compaction on the same SSTable from two repairs.
>  
> Anyone else feel free to jump in and correct me if my interpretation is wrong.
>  
> On 18 September 2017 at 17:11, Steinmaurer, Thomas 
>  wrote:
> Jeff,
>  
> what should be the expected outcome when running with 3.0.14:
>  
> nodetool repair -full -pr keyspace cfs
>  
> - Should -full trigger anti-compaction?
> - Should this be the same operation as nodetool repair -pr keyspace cfs in 2.1?
> - Should I be able to run this on several nodes in parallel, as in 2.1
>   without trouble, where incremental repair was not the default?
> 
>  
> Still confused if I'm missing something obvious. Sorry about that. :)
>  
> Thanks,
> Thomas
>  
> From: Jeff Jirsa [mailto:jji...@gmail.com] 
> Sent: Monday, 18 September 2017 16:10
> 
> To: user@cassandra.apache.org
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>  
> Sorry, I may be wrong about the cause - didn't see -full.
>  
> Mea culpa, it's early here and I'm not awake.
> 
> 
> -- 
> Jeff Jirsa
>  
> 
> On Sep 18, 2017, at 7:01 AM, Steinmaurer, Thomas 
>  wrote:
> 
> Hi Jeff,
>  
> Understood. That's quite a change, then, coming from 2.1 from an operational
> POV.
>  
> Thanks again.
>  
> Thomas
>  
> From: Jeff Jirsa [mailto:jji...@gmail.com] 
> Sent: Monday, 18 September 2017 15:56
> To: user@cassandra.apache.org
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>  
> The command you're running