Hi,
while curious about the new incremental repairs I updated our cluster to C*
version 2.1.2 via the Debian apt repository. Everything went quite well,
but trying to start the tools sstablemetadata and sstablerepairedset
led to the following error:
root@a01:/home/ifjke#
Many thanks for the information, Dennis and Karl.
I don’t think I can test until Monday, but I will let you know what (hopefully)
works.
Regards
Nigel
From: d...@aegisco.com [mailto:d...@aegisco.com]
Sent: 17 December 2014 22:31
To: user@cassandra.apache.org
Subject: Re: Cassandra metrics Graphite
I'm not sure that'll work with that many version moves in the middle;
upgrades are, to my knowledge, only tested between specific steps, namely
from 1.2.9 to the latest 2.0.x
http://www.datastax.com/documentation/upgrade/doc/upgrade/cassandra/upgradeC_c.html
Specifically:
Cassandra 2.0.x
why auto_bootstrap=false? The documentation even suggests the opposite. If
you don't auto_bootstrap, the node will take queries before it has copies of
all the data, and you'll get the wrong answer (it'd not be unlike using CL
ONE when you've got a bunch of dropped mutations on a single node in the
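For reference, a minimal sketch of the setting under discussion in cassandra.yaml (the option is valid even if it isn't present in the shipped file; the default is true, which makes a new node stream its data before serving reads):

```yaml
# cassandra.yaml -- sketch; default shown. Setting this to false makes the
# node join the ring without first streaming data from its peers, so it can
# answer reads before it actually holds copies of all its data.
auto_bootstrap: true
```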
I'd dispute the claim of higher read latency than HBase; I'm not sure what
experience you have with both, and that may have been true at one point,
but with Leveled Compaction Strategy and proper JVM tunings I'm not sure
how this is still true; it would at least be comparable. I've worked with buffer
that depends on what you mean by real-time analytics.
For things like continuous data streams, neither are appropriate platforms
for doing analytics. They're good for storing the results (aka output) of
the streaming analytics. I would suggest before you decide cassandra vs
hbase, first figure
Since Ajay is already using spark the Spark Cassandra Connector really gets
them where they want to be pretty easily
https://github.com/datastax/spark-cassandra-connector (joins, etc).
As far as spark streaming having basic support I'd challenge that
assertion (namely Storm has a number of
one of the most common types of use cases in stream processing is sliding
windows based on time or count. Based on my understanding of Spark's
architecture and Spark Streaming, it does not provide the same
functionality. One can fake it by setting Spark Streaming to really small
micro-batches, but
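For illustration, a minimal sketch of the two window types mentioned above (count-based and time-based sliding windows); the class names are ours, not from any framework:

```python
from collections import deque
import time

class CountWindow:
    """Sliding window over the last `size` events."""
    def __init__(self, size):
        self.events = deque(maxlen=size)

    def push(self, event):
        # a deque with maxlen evicts the oldest event automatically
        self.events.append(event)
        return list(self.events)

class TimeWindow:
    """Sliding window over events from the last `span` seconds."""
    def __init__(self, span):
        self.span = span
        self.events = deque()

    def push(self, event, ts=None):
        ts = time.time() if ts is None else ts
        self.events.append((ts, event))
        # evict events that have fallen out of the time span
        while self.events and self.events[0][0] < ts - self.span:
            self.events.popleft()
        return [e for _, e in self.events]
```

A true streaming engine evicts and emits on each event arrival like this; emulating it with small micro-batches only approximates the window boundaries.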
I'll decline to continue the commentary on Spark, as again this probably
belongs on another list, other than to say: micro-batches are an intentional
design tradeoff that has notable benefits for the same use cases you're
referring to, and while you may disagree with those tradeoffs, it's a
For the record, I think Spark is good and I'm glad we have options.
my point wasn't to badmouth Spark. I'm not comparing Spark to Storm at
all, so I think there's some confusion here. I'm thinking of Esper,
StreamBase, and other stream processing products. My point is to think
about the problems
Hi,
I am occasionally seeing:
WARN [ReadStage:9576] 2014-12-18 11:16:19,042 SliceQueryFilter.java (line
225) Read 756 live and 17027 tombstoned cells in mykeyspace.mytable (see
tombstone_warn_threshold). 5001 columns was requested,
slices=[73c31274-f45c-4ba5-884a-6d08d20597e7:myfield-],
My mistake on Storm, and I'm certain there are a number of use cases where
you're right that Spark isn't the right answer, but I'd argue you're treating
it like Spark 0.5, feature-set-wise, instead of Spark 1.1.
As for filtering before persistence: this is the common use case for Spark
Streaming, and I've
Thanks Ryan and Peter for the suggestions.
Our requirement (we're an e-commerce company), at a high level, is to build a
data warehouse as a platform or service (for different product teams to
consume), as below:
Datawarehouse as a platform/service
|
Spark SQL
in the interest of knowledge sharing on the general topic of stream
processing: the domain is quite old and there's a lot of existing
literature.
Within this space there are several important factors that many products
don't address:
temporal windows (sliding windows, discrete windows, dynamic
by data warehouse, what kind do you mean?
Is it the traditional warehouse where people create multi-dimensional cubes?
Or is it the newer class of UI tools that makes it easier for users to
explore data, where the warehouse is mostly a denormalized (i.e. flattened)
format of the OLTP data?
Or is it a
Hi Peter,
You are right. The idea is to query the data directly from NoSQL, in our
case via Spark SQL on Spark (as Spark largely supports
Mongo/Cassandra/HBase/Hadoop). As you said, the business users would still
need to query using Spark SQL. We are already using NoSQL BI tools like
Pentaho (which
Almost every stream processing system I know of offers joins out of the box
and has done so for years.
Even open source offerings like Esper have offered joins for years.
What hasn't are systems like Storm, Spark, etc., which I don't really classify
as stream processors anyway.
--
Colin
@Colin -
I bounce back and forth on classifying Storm and Spark as stream processing
frameworks. Clearly they are marketed as stream processing frameworks, and
they can process data streams. Even with the commercial stream processing
products, expressing joins with some of the products is a bit
Hi all,
We have a situation where some of our nodes have smaller disks, and we would
like to align all nodes by replacing the smaller disks with bigger ones
without replacing the nodes.
We don't have enough space to put the data on the / disk and copy it back to
the bigger disks, so we would like to rebuild the
Hi Or,
You don't have another machine on the network that could temporarily
host your /var/lib/cassandra content? That way you would simply
scp the files temporarily to another machine and copy them back when
done. You obviously want to do a repair afterwards just in case, but
I'd consider solving your root problem of people accidentally starting and
stopping servers in prod, instead of making Cassandra more difficult to
manage operationally.
On Thu Dec 18 2014 at 4:04:34 AM Ryan Svihla rsvi...@datastax.com wrote:
why auto_bootstrap=false? The documentation even
On Mon, Dec 15, 2014 at 12:41 AM, Mathijs Vogelzang math...@apptornado.com
wrote:
Would it be possible to trigger a manual partial compaction, to first
compact 4x 256 tables? Could this be added to nodetool if it doesn't exist
already?
JMX call forceUserDefinedCompaction.
=Rob
On Wed, Dec 17, 2014 at 7:04 PM, Kevin Burton bur...@spinn3r.com wrote:
I’m trying to figure out the best way to bootstrap our nodes.
I *think* I want our nodes to be manually bootstrapped. This way an admin
has to explicitly bring up the node in the cluster and I don’t have to
worry about
On Tue, Dec 16, 2014 at 12:38 AM, Jonas Borgström jo...@borgstrom.se
wrote:
That said, I've done some testing and it appears to be possible to
perform an in place conversion as long as all nodes contain all data (3
nodes and replication factor 3 for example) like this:
I would expect this to
V
On Dec 4, 2014 11:14 PM, Philo Yang ud1...@gmail.com wrote:
Hi,all
I have a cluster on C* 2.1.1 and JDK 1.7u51. I'm having trouble with full
GC: sometimes one or two nodes will run a full GC more than once per minute,
taking over 10 seconds each time, and then the node becomes unreachable
This topic comes up quite a bit. Enough, in fact, that I've done a 1 hour
webinar on the topic. I cover how the JVM GC works and things you need to
consider when tuning it for Cassandra.
https://www.youtube.com/watch?v=7B_w6YDYSwA
With your specific problem - full GC not reducing the old gen -
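As a starting point for diagnosing this kind of problem, here is a sketch of GC logging options for cassandra-env.sh (these are standard HotSpot flags; the log path is an example), which let you see whether old gen actually shrinks after a full GC:

```shell
# cassandra-env.sh -- sketch: enable verbose GC logging
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
```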
Hi Folks,
Have any of you come across blogs that describe how companies in the
industry are using Cassandra counters in practice?
Thanks in advance.
Regards,
Rajath
Rajath Subramanyam
Here's one from Twitter...
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
On Thu, Dec 18, 2014 at 6:08 PM, Rajath Subramanyam rajat...@gmail.com
wrote:
Hi Folks,
Have any of you come across blogs that describe how companies in the
industry are using
Thanks Ken. Any other use cases where counters are used, apart from
Rainbird?
Rajath Subramanyam
On Thu, Dec 18, 2014 at 5:12 PM, Ken Hancock ken.hanc...@schange.com
wrote:
Here's one from Twitter...
Do you have to replace those disks? Can you simply add new disks to those
nodes and configure C* to use JBOD?
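For reference, JBOD here just means listing one directory per disk in cassandra.yaml; Cassandra then spreads sstables across them (paths are illustrative):

```yaml
# cassandra.yaml -- sketch: one data directory per physical disk
data_file_directories:
    - /var/lib/cassandra/data
    - /mnt/newdisk/cassandra/data
```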
On Dec 18, 2014 10:18 AM, Or Sher or.sh...@gmail.com wrote:
Hi all,
We have a situation where some of our nodes have smaller disks and we
would like to align all nodes by replacing
All,
We have a Cassandra cluster which seems to be struggling a bit. I have one node
which crashes continually, and others which crash sporadically. When they crash
it's with a JVM "couldn't allocate memory" error, even though there's plenty of
memory available. I suspect it's because one table, which is very
I just read this benchmark PDF; does anyone have an opinion on it?
I think it's not fair to Cassandra.
url:http://www.bankmark.de/wp-content/uploads/2014/12/bankmark-20141201-WP-NoSQLBenchmark.pdf
http://msrg.utoronto.ca/papers/NoSQLBenchmark
Hi Or,
I did something like this a while ago. If your machines have a free
disk slot, just put another disk there and use it as another
data_file_directory.
If not, as in my case:
- grab a USB dock for disks
- put the new one in there, plug it in, format it, mount it to /mnt etc.
- I did an
Hi,
I'm always interested in such benchmark experiments, because the
databases evolve so fast that the race is always open and there is a
lot of motion in this space.
And of course I asked myself the same question. And I think that this
publication is unreliable, for 4 reasons (from reading very fast,