Hi all,
Thanks for the responses, this was very helpful.
I don't know yet what the distribution of clicks and users will be, but I
expect to see a few users with an enormous amount of interactions and most
users having very few. The idea of doing some additional manual
partitioning, and then
Hi all,
I am designing an application that will capture time series data where we
expect the number of records per user to potentially be extremely high. I
am not sure if we will eclipse the max row size of 2B elements, but I
assume that we would not want our application to approach that size
Hi Gaurav,
I recommend you just run a MapReduce job for this computation.
Alternatively, you can look at the code for the C* MapReduce input format:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlInputFormat.java
That should give you what you need
Hi mck,
I'm not familiar with this ticket, but my understanding was that
performance of Hadoop jobs on C* clusters with vnodes was poor because a
given Hadoop input split has to run many individual scans (one for each
vnode) rather than just a single scan. I've run C* and Hadoop in
production
, Feb 20, 2015 at 10:17 PM, Clint Kelly clint.ke...@gmail.com
wrote:
Hi all,
I read the DSE 4.6 documentation and I'm still not 100% sure what a
mixed workload Cassandra + Spark installation would look like, especially
on AWS. What I gather is that you use OpsCenter to set up the following
Hi all,
I am building an application that keeps a time-series record of clickstream
data (clicks, impressions, etc.). The data model looks something like:
CREATE TABLE clickstream (
userid text,
event_time timestamp,
interaction frozen<interaction_type>,
PRIMARY KEY (userid, event_time)
)
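For a table like this, one way to do the extra manual partitioning is to add a time bucket to the partition key, so that a very active user is spread over many bounded partitions. The following is only a sketch of the bucket calculation; the function name, the 7-day bucket width, and the idea of storing the bucket as a second partition key column are assumptions for illustration, not something from the original schema.

```python
from datetime import datetime, timedelta, timezone

# Reference point for counting buckets (timezone-aware on purpose).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucketed_partition_key(userid, event_time, bucket_days=7):
    """Return (userid, bucket) for a timezone-aware event_time.

    The bucket is the number of whole bucket_days-wide windows since the
    Unix epoch, so all events in one window share a partition.
    The 7-day default is an arbitrary illustrative choice.
    """
    if bucket_days < 1:
        raise ValueError("bucket_days must be at least 1")
    days_since_epoch = (event_time - EPOCH).days
    return (userid, days_since_epoch // bucket_days)


if __name__ == "__main__":
    t = datetime(2015, 2, 20, 8, 29, tzinfo=timezone.utc)
    print(bucketed_partition_key("some-user", t))
    print(bucketed_partition_key("some-user", t + timedelta(days=7)))
```

A reader then has to query one bucket at a time for a given user and time window, which is the price of keeping any single partition from growing without bound.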
Hi all,
I read the DSE 4.6 documentation and I'm still not 100% sure what a mixed
workload Cassandra + Spark installation would look like, especially on
AWS. What I gather is that you use OpsCenter to set up the following:
- One virtual data center for real-time processing (e.g., ingestion
mean that paying a
small efficiency cost when reading data out of Cassandra initially might
not be the end of the world (especially given the benefits of using vnodes).
On Fri, Feb 20, 2015 at 8:29 AM, Clint Kelly clint.ke...@gmail.com wrote:
Hi Mark,
Thanks for your reply. That makes sense. I
Hi all,
I am trying to follow the instructions here for installing DSE 4.6 on AWS:
http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installAMIOpsc.html
I was successful creating a single-node instance running OpsCenter, which I
intended to bootstrap
:36 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi all,
I am trying to follow the instructions here for installing DSE 4.6 on AWS:
http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installAMIOpsc.html
I was successful creating a single-node instance
and that type of workload. This is by no
means a warning for users to disable vnodes on their Real-Time/Transactional
Cassandra only clusters on EC2.
I've used vnodes on EC2 without issue.
Regards,
Mark
On 20 February 2015 at 05:08, Clint Kelly clint.ke...@gmail.com wrote:
Hi all
Hi all,
The guide for installing Cassandra on EC2 says that
Note: The DataStax AMI does not install DataStax Enterprise nodes
with virtual nodes enabled.
http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installAMI.html
Just curious why this is the case.
FWIW increasing the threshold for withMaxSchemaAgreementWaitSeconds to
30sec was enough to fix my problem---I would like to understand
whether the cluster has some kind of configuration problem that made
doing so necessary, however.
Thanks!
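The idea behind a schema agreement wait can be sketched as a simple poll-until-deadline loop. This is a generic illustration, not the driver's implementation: fetch_versions is a hypothetical callable that returns the schema version reported by each live node (in a real cluster that information lives in the schema_version columns of system.local and system.peers), and the 30 second default just mirrors the value mentioned above.

```python
import time


def wait_for_schema_agreement(fetch_versions,
                              max_wait_seconds=30.0,
                              poll_interval_seconds=0.2,
                              clock=time.monotonic,
                              sleep=time.sleep):
    """Poll until every node reports the same schema version.

    fetch_versions is a hypothetical callable returning an iterable of
    schema version strings, one per live node.
    Returns True on agreement, False if the deadline passes first.
    """
    deadline = clock() + max_wait_seconds
    while True:
        versions = set(fetch_versions())
        if len(versions) == 1:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_seconds)
```

If a loop like this regularly needs far more than a few seconds, that points at slow schema propagation in the cluster rather than at the client.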
On Tue, Feb 3, 2015 at 7:44 AM, Clint Kelly clint.ke
Hi all,
I have an application that uses the Java driver to create a table and then
immediately write to it. I see the following warning in my logs:
[10.241.17.134] out: 15/02/03 09:32:24 WARN
com.datastax.driver.core.Cluster: No schema agreement from live replicas
after 10 s. The schema may not
Hi all,
I'd like to write some tests for my code that uses the Cassandra Java
driver to see how it behaves if there is a read timeout while accessing
Cassandra. Is there a best practice for getting this done? I was thinking
about adjusting the settings in the cluster builder to adjust the
If I run this tool on a given host, it shows me stats for only the cases
where that host was the coordinator node, correct?
Is there any way (other than me cooking up a little script) to
automatically get the proxyhistogram stats for my entire cluster?
-Clint
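The "little script" could look roughly like the sketch below: it runs nodetool proxyhistograms against each host and collects the raw output per host. The host list and the function names are assumptions for illustration, and the output is left unparsed because its layout differs between Cassandra versions. It also assumes nodetool is on the PATH and can reach JMX on every host.

```python
import subprocess


def run_nodetool_proxyhistograms(host):
    """Run `nodetool -h <host> proxyhistograms` and return its stdout."""
    result = subprocess.run(
        ["nodetool", "-h", host, "proxyhistograms"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def collect_proxyhistograms(hosts, runner=run_nodetool_proxyhistograms):
    """Return {host: raw proxyhistograms output} for every host.

    runner is injectable so the collection logic can be tried out
    without a live cluster.
    """
    return {host: runner(host) for host in hosts}
```

Since each node only reports requests it coordinated, the per-host outputs have to be read side by side to get a cluster-wide picture.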
, 2014, at 8:48 PM, Robert Coli rc...@eventbrite.com wrote:
On Wed, Nov 19, 2014 at 3:22 PM, Clint Kelly clint.ke...@gmail.com
wrote:
Is there any way (other than me cooking up a little script) to
automatically get the proxyhistogram stats for my entire cluster?
OpsCenter might expose
Hi all,
Over what time range does nodetool cfhistograms operate?
I am using Cassandra 2.0.8.39.
I am trying to debug some very high 95th and 99th percentile read
latencies in an application that I'm working on.
I tried running nodetool cfhistograms to get a flavor for the
distribution of read
Hi all,
I am trying to debug some high-latency outliers (99th percentile) in an
application I'm working on. I thought that I could turn on route tracing,
print the route traces to logs, and then examine my logs after a load test
to find the highest-latency paths and figure out what is going on.
shown the latencies within a single host, or are
they the end-to-end latencies from the coordinator node? -- cfhistograms
shows metrics at table/node level, proxyhistograms shows metrics at
cluster/coordinator level
On Sun, Nov 16, 2014 at 10:31 PM, Clint Kelly clint.ke...@gmail.com
wrote
Hi all,
I often have problems with code that I write that uses the DataStax Java
driver to create / modify a keyspace or table and then soon after reads the
metadata for the keyspace to verify that whatever changes I made to the
keyspace or table are complete.
As an example, I may create a table
Hi all,
TL;DR - I think my unit tests are sometimes failing because of read
timeouts to an EmbeddedCassandraService when dropping a table triggers a
compaction on a highly-loaded build slave. Does this sound reasonable?
What options should I change in my Cluster.Builder (or elsewhere) to
prevent
://github.com/Mishail/CqlJmeter
-M
On 8/17/14 12:26, Clint Kelly wrote:
Hi all,
Is there a way to use the cassandra-stress tool with clustering columns?
I am trying to figure out whether an application that I'm running on
is slow because of my application logic, C* data model, or underlying
plugin may be useful in the latter case.
https://github.com/Mishail/CqlJmeter
-M
On 8/17/14 12:26, Clint Kelly wrote:
Hi all,
Is there a way to use the cassandra-stress tool with clustering
columns?
I am trying to figure out whether an application that I'm running
Hi all,
Is there a way to use the cassandra-stress tool with clustering columns?
I am trying to figure out whether an application that I'm running on
is slow because of my application logic, C* data model, or underlying
C* setup (e.g., I need more nodes or to tune some parameters).
My
columns make a big difference in
write performance?
On Sun, Aug 17, 2014 at 12:26 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi all,
Is there a way to use the cassandra-stress tool with clustering columns?
I am trying to figure out whether an application that I'm running on
is slow because
at the
configuration options available to the datastax-agent see this page:
datastax.com/documentation/opscenter/5.0/opsc/configure/agentAddressConfiguration.html
Mark
On Fri, Aug 15, 2014 at 3:32 AM, Clint Kelly clint.ke...@gmail.com wrote:
Hi all,
I just installed DataStax Enterprise 4.5. I
Hi all,
I just installed DataStax Enterprise 4.5. I installed OpsCenter
Server on one of my four machines. The port that OpsCenter usually
uses () was used by something else, so I modified
/usr/share/opscenter/conf/opscenterd.conf to set the port to 8889.
When I log into OpsCenter, it says
the shutdown lines in at least an hour before..
We're using C* 2.0.9.
On Thu, Aug 7, 2014 at 12:49 AM, Clint Kelly clint.ke...@gmail.com wrote:
Hi Rob,
Thanks for the clarification; this is really useful. I'll run some
experiments to see if the problem is a JVM OOM on our build machine
Hi Duncan,
Thanks for your help.
I am at a loss as to what is causing this process to stop then. I
would not expect the Cassandra process to finish until my code calls
Process#destroy, but it seems to non-deterministically stop much
earlier sometimes.
FWIW I have seen failures on another
Hi Rob,
Thanks for the clarification; this is really useful. I'll run some
experiments to see if the problem is a JVM OOM on our build machine.
Best regards,
Clint
On Wed, Aug 6, 2014 at 1:14 PM, Robert Coli rc...@eventbrite.com wrote:
On Wed, Aug 6, 2014 at 1:12 PM, Robert Coli
Hi all,
Allow me to rephrase a question I asked last week. I am performing some
queries with ALLOW FILTERING and getting consistent read timeouts like the
following:
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra
timeout during read query at consistency ONE (1 responses
for your help!
Best regards,
Clint
On Tue, Aug 5, 2014 at 10:54 AM, Robert Coli rc...@eventbrite.com wrote:
On Tue, Aug 5, 2014 at 10:01 AM, Clint Kelly clint.ke...@gmail.com wrote:
Allow me to rephrase a question I asked last week. I am performing some
queries with ALLOW FILTERING and getting
, Clint Kelly clint.ke...@gmail.com wrote:
Hi Rob,
Thanks for your feedback. I understand that use of ALLOW FILTERING is
not a best practice. In this case, however, I am building a tool on
top of Cassandra that allows users to sometimes do things that are
less than optimal. When they try to do
Hi everyone,
For some integration tests, we start up a CassandraDaemon in a
separate process (using the Java 7 ProcessBuilder API). All of my
integration tests run beautifully on my laptop, but one of them fails
on our Jenkins cluster.
The failing integration test does around 10k writes to
regards,
Clint
On Tue, Aug 5, 2014 at 9:29 PM, Kevin Burton bur...@spinn3r.com wrote:
If there is an oom it will be in the logs.
On Aug 5, 2014 8:17 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi everyone,
For some integration tests, we start up a CassandraDaemon in a
separate process
: Saturday, August 2, 2014 7:04 AM
To: user@cassandra.apache.org
Subject: Re: Occasional read timeouts seen during row scans
Hi Clint, is time correctly synchronized between your nodes?
Ciao, Duncan.
On 02/08/14 02:12, Clint Kelly wrote:
BTW a few other details, sorry for omitting
Hi everyone,
I am seeing occasional read timeouts during multi-row queries, but I'm
having difficulty reproducing them or understanding what the problem
is.
First, some background:
Our team wrote a custom MapReduce InputFormat that looks pretty
similar to the DataStax InputFormat except that it
was
observing the timeout)
Best regards,
Clint
On Fri, Aug 1, 2014 at 5:02 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi everyone,
I am seeing occasional read timeouts during multi-row queries, but I'm
having difficulty reproducing them or understanding what the problem
is.
First, some
Hi Tyler,
FWIW I was not able to reproduce this problem with a smaller example. I'll
go ahead and file the JIRA anyway. Thanks for your help!
Best regards,
Clint
On Thu, Jul 17, 2014 at 3:05 PM, Tyler Hobbs ty...@datastax.com wrote:
On Thu, Jul 17, 2014 at 4:59 PM, Clint Kelly clint.ke
JIRA, correct?
Best regards,
Clint
On Wed, Jul 16, 2014 at 4:32 PM, Tyler Hobbs ty...@datastax.com wrote:
On Tue, Jul 15, 2014 at 1:40 PM, Clint Kelly clint.ke...@gmail.com wrote:
Is there some way to get the driver to block until the schema code has
propagated everywhere? My currently
Hi everyone,
I am trying to design a schema that will keep the N-most-recent
versions of a value. Currently my table looks like the following:
CREATE TABLE foo (
rowkey text,
family text,
qualifier text,
version bigint,
value blob,
PRIMARY KEY (rowkey, family, qualifier,
5 seconds
This loop took three iterations to create the index.
Is this expected? This seems really weird!
Best regards,
Clint
On Mon, Jul 14, 2014 at 5:54 PM, Clint Kelly clint.ke...@gmail.com wrote:
BTW I have seen this using versions 2.0.1 and 2.0.3 of the java driver
on a three-node
, 2014 at 11:32 AM, DuyHai Doan doanduy...@gmail.com wrote:
As far as I know, schema propagation always takes some time in the cluster.
On this mailing list some people in the past faced similar behavior.
On Tue, Jul 15, 2014 at 8:20 PM, Clint Kelly clint.ke...@gmail.com wrote:
FWIW I was able
Hi everyone,
I have some code that I've been fiddling with today that uses the
DataStax Java driver to create a table and then create a secondary
index on a column in that table. I've tested this code fairly
thoroughly on a single-node Cassandra instance on my laptop and in
unit test (using the
BTW I have seen this using versions 2.0.1 and 2.0.3 of the java driver
on a three-node cluster with DSE 4.5.
On Mon, Jul 14, 2014 at 5:51 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi everyone,
I have some code that I've been fiddling with today that uses the
DataStax Java driver to create
Hi everyone,
Apologies if this is the incorrect forum for a question like this.
I am going to set up a mixed-workload (real-time and analytics)
installation of DSE 4.5 using bring-your-own Hadoop (BYOH). We are
using CDH 5.0.
I was reviewing the installation instructions, and I came across the
, you shouldn't enable vnodes on any Cassandra/DSE
datacenter that is doing hadoop analytics workloads. Other DCs in the
cluster can use vnodes.
-Tupshin
On Jul 2, 2014 5:50 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi everyone,
Apologies if this is the incorrect forum for a question
this would be an increase of several orders of magnitude in
the number of input splits.)
Best regards,
Clint
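As a rough back-of-the-envelope check on that growth, the number of token ranges a job has to cover can be estimated as nodes times tokens per node. The sketch below assumes one scan per token range and uses 256 tokens per node, the usual num_tokens default when vnodes are enabled; the 6-node cluster size is made up for illustration.

```python
def token_range_count(nodes, tokens_per_node):
    """Estimate how many token ranges exist in the ring.

    Assumes every node owns tokens_per_node ranges and that a job
    needs one scan per range (a simplification).
    """
    return nodes * tokens_per_node


if __name__ == "__main__":
    nodes = 6  # hypothetical cluster size
    single_token = token_range_count(nodes, 1)
    with_vnodes = token_range_count(nodes, 256)
    print(single_token, with_vnodes, with_vnodes // single_token)
```

The ratio depends only on tokens_per_node, so the relative increase is the same for any cluster size under these assumptions.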
On Wed, Jul 2, 2014 at 6:04 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi Tupshin,
Thanks for the quick reply. Is the performance concern from the
Hadoop integration needing to set up
%3A%22apache-cassandra%22
On 05/20/2014 05:30 PM, Clint Kelly wrote:
Hi all,
I am using the maven assembly plugin to build a project that contains
a development environment for a project that we've built at work on
top of Cassandra. I'd like this development environment to include
Thanks, Lewis. I created a ticket here:
https://issues.apache.org/jira/browse/CASSANDRA-7283
For now I just copied the cassandra and cassandra.in.sh scripts
into my project, along with custom configuration files. We already
have all of the necessary JARs in our project's lib directory, since
Hi all,
I am using the maven assembly plugin to build a project that contains
a development environment for a project that we've built at work on
top of Cassandra. I'd like this development environment to include
the latest release of Cassandra.
Is there a maven repo anywhere that contains an
Hi Anton,
One approach you could look at is to write a custom InputFormat that
allows you to limit the token range of rows that you fetch (if the
AbstractColumnFamilyInputFormat does not do what you want). Doing so
is not too much work.
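To give a feel for what limiting the token range means, here is a small sketch that cuts a token interval into contiguous sub-ranges and builds a CQL query for each one. It assumes the Murmur3 partitioner, whose tokens are signed 64-bit integers; the function names and the table and column names in the usage are hypothetical, and a real InputFormat would align its splits with the ranges owned by each node.

```python
# Token bounds for the Murmur3 partitioner (signed 64-bit range).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(start, end, parts):
    """Split the interval (start, end] into up to `parts` contiguous pieces."""
    if parts < 1 or end <= start:
        raise ValueError("need parts >= 1 and end > start")
    width = end - start
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    # Drop empty pieces, which can only appear when width < parts.
    return [(bounds[i], bounds[i + 1])
            for i in range(parts) if bounds[i] < bounds[i + 1]]


def token_range_query(table, partition_key, low, high):
    """Build a CQL SELECT restricted to tokens in (low, high]."""
    return ("SELECT * FROM {t} WHERE token({k}) > {lo} AND token({k}) <= {hi}"
            .format(t=table, k=partition_key, lo=low, hi=high))


if __name__ == "__main__":
    for low, high in split_token_range(MIN_TOKEN, MAX_TOKEN, 4):
        print(token_range_query("myks.mytable", "rowkey", low, high))
```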
If you look at the class RowIterator within
Hi everyone,
A couple of months ago I started working on a new Hadoop InputFormat
that we needed for something at my work. It is in a semi-working
state now so I thought I would post a link in case anyone is
interested:
https://github.com/wibiclint/cassandra2-hadoop2
At the time I started
...@gmail.com wrote:
Hello Clint
Why do you need to remove all SSTables or drop the keyspace between tests?
Isn't truncating the tables enough to get clean and repeatable tests?
Regards
Duy Hai DOAN
On Thu, May 1, 2014 at 5:54 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi,
I am deleting
the
SSTables between tests? I make extensive use of the same infrastructure as
the EmbeddedCassandraService with Achilles and I have had no such issue so far.
Regards
On Wed, Apr 30, 2014 at 8:43 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi all,
I have a unit test framework
Hi all,
I have a unit test framework for a Cassandra project that I'm working on.
For every one of my test classes, I delete all of the data file, commit
log, and saved cache locations, start an EmbeddedCassandraService, and
populate a keyspace and tables from scratch.
Currently, the unit tests
Hi everyone,
Is there a way to change the partitioner on a per-table or per-keyspace
basis?
We have some tables for which we'd like to enable ordered scans of rows, so
we'd like to use the ByteOrdered partitioner for those, but use Murmur3 for
everything else in our cluster.
Is this possible?
the inconsistency you think
you found is because the first and second queries went to different nodes.
The Java driver will connect to all nodes and load-balance requests by
default.
T#
On Mon, Mar 31, 2014 at 4:06 AM, Clint Kelly clint.ke...@gmail.com wrote:
BTW one other thing that I have
[:ipaddress] is
equal to 10.0.2.15, hence your broadcast_address.
You can set up networking in a different way or set the attribute
node[:cassandra][:broadcast_address] manually.
On Mon, Mar 31, 2014 at 3:03 AM, Clint Kelly clint.ke...@gmail.com wrote:
All,
Has anyone used the Cassandra Chef
Hi all,
I am working on a Hadoop InputFormat implementation that uses only the
native protocol Java driver and not the Thrift API. I am currently trying
to replicate some of the behavior of
*Cassandra.client.describe_ring(myKeyspace)* from the Thrift API. I would
like to do the following:
All,
Has anyone used the Cassandra Chef cookbook
https://github.com/michaelklishin/cassandra-chef-cookbook and seen
broadcast_address: 10.0.2.15 in /etc/cassandra/cassandra.yaml? I looked
through the source code for the cookbook and I have no idea how this is
happening.
I was able to fix this
, Mar 30, 2014 at 4:51 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi all,
I am working on a Hadoop InputFormat implementation that uses only the
native protocol Java driver and not the Thrift API. I am currently trying
to replicate some of the behavior of
*Cassandra.client.describe_ring
All,
I have a question about how to use the EmbeddedCassandraService in unit
tests. I wrote a short collection of unit tests here:
https://github.com/wibiclint/cassandra-java-driver-keyspaces
I'm trying to start up a new EmbeddedCassandraService for each unit test.
I looked at the Cassandra
Folks,
Can anyone instruct me about how to set up a maven project that depends on
either 2.0.6 or 2.1? I am interested in using some of the new features
(e.g., static columns) in my current project. Being able to just install
one of these versions in my local maven repository would be good
' ALLOW FILTERING
On Fri, Feb 28, 2014 at 6:57 AM, Clint Kelly clint.ke...@gmail.com wrote:
All,
Is there any way to have inequality comparisons on multiple clustering
columns in a WHERE clause in CQL? For example, I'd like to do:
select * from foo where fam = 'Info' and qual > 'A' and qual < 'D
, this is done in parallel from the
get-go. Fewer hops. Less load on the coordinator. No bottlenecks. And with
a stored procedure, very very little additional overhead to the client,
server, or network.
-Tupshin
On Tue, Feb 25, 2014 at 7:48 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi everyone
27, 2014 at 1:00 AM, Clint Kelly clint.ke...@gmail.com wrote:
Hi all,
Is there any way to use the DataStax Java driver to combine multiple
SELECT statements into a single RPC? I assume not (I could not find
anything about this in the documentation), but I just wanted to check.
The short
to
indicate (to our software that sits on top of C*) that they are going to
use paging, and then we are going to be doing multiple client / server
operations anyway. I'd just like to minimize them. :)
Best regards,
Clint
On Fri, Feb 28, 2014 at 9:47 AM, Clint Kelly clint.ke...@gmail.com wrote:
Hi
Hi everyone,
I've been working on a rewrite of the Cassandra InputFormat for Hadoop 2
using the DataStax Java driver instead of the Thrift API.
I have a prototype working now, but there is one bit of code that I have
not been able to replace with code for the Java driver. In the
Great, thanks!
On Fri, Feb 28, 2014 at 4:38 PM, Tyler Hobbs ty...@datastax.com wrote:
On Fri, Feb 28, 2014 at 6:32 PM, Clint Kelly clint.ke...@gmail.com wrote:
What is the best known method for resetting a counter in CQL? Is it best
to read the counter and then increment it by a negative
Ah never mind, I see, currently you can refer to the ?'s by name by using
the name of the column to which the ? refers. And this works as long as
each column is present only once in the statement.
Sorry for the extra list traffic!
On Thu, Feb 27, 2014 at 7:33 PM, Clint Kelly clint.ke
All,
Is there any way to have inequality comparisons on multiple clustering
columns in a WHERE clause in CQL? For example, I'd like to do:
select * from foo where fam = 'Info' and qual > 'A' and qual < 'D' and
version > 2013 ALLOW FILTERING;
I get an error:
Bad Request: PRIMARY KEY part
:20 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi Tupshin,
Thanks for your help! Unfortunately in my case, I will need to do a
compare and set in which the compare is against a value in a dynamic column.
In general, I need to be able to do the following:
- Check whether a given value
Hi all,
Is there any way to use the DataStax Java driver to combine multiple SELECT
statements into a single RPC? I assume not (I could not find anything
about this in the documentation), but I just wanted to check.
Thanks!
Best regards,
Clint
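Short of a single RPC, a common alternative is to issue the SELECT statements concurrently so that their round trips overlap. The sketch below only shows that fan-out pattern with a thread pool; it does not combine the statements into one request. run_query is a hypothetical callable standing in for whatever executes one statement against the cluster, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def run_selects_concurrently(run_query, queries, max_workers=8):
    """Execute every query via run_query in parallel threads.

    run_query is a hypothetical callable that takes one query string
    and returns its result. Results come back in the same order as
    the input queries.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_query, queries))
```

The total latency is then roughly that of the slowest query rather than the sum of all of them, at the cost of more simultaneous load on the coordinators.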
column (coming with 2.0.6) as your conditional flag, as that column is
shared by all rows in the partition.
-Tupshin
On Mon, Feb 24, 2014 at 3:57 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi Tupshin,
Thanks for your help; I appreciate it.
Could I do something like the following?
Given
On Feb 25, 2014, at 7:49 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi everyone,
Let's say that I have a table that looks like the following:
CREATE TABLE time_series_stuff (
key text,
family text,
version int,
val text,
PRIMARY KEY (key, family, version
The resolution status of the JIRA is set to Later, so the implementation
is probably not done yet. The JIRA was opened to discuss implementation
strategy, but I guess nothing has been coded so far.
On Sat, Feb 22, 2014 at 12:02 AM, Clint Kelly clint.ke...@gmail.com
wrote:
Folks,
Does anyone know how I can
Folks,
Does anyone know how I can modify multiple rows at once in a
lightweight transaction in CQL3?
I saw the following ticket:
https://issues.apache.org/jira/browse/CASSANDRA-5633
but it was not obvious to me from the comments how (or whether) this
got resolved. I also couldn't find
Folks,
Is there a recommended way to perform lots of INSERT operations in a
row when using the DataStax Java driver?
I notice that the RecordWriter for the CQL3 Hadoop implementation in
Cassandra does some per-data-node buffering of CQL3 queries. The
DataStax Java driver, on the other hand,
Java driver? -- Yes, use UNLOGGED batches. More
info here:
http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_reference/batch_r.html
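As an illustration of the UNLOGGED batch suggestion, the sketch below groups a stream of INSERT statements into fixed-size batches wrapped in BEGIN UNLOGGED BATCH / APPLY BATCH. The function names and the batch size of 20 are assumptions for illustration; in real client code you would normally use prepared statements with bound values rather than building CQL strings.

```python
def _wrap_batch(statements):
    """Wrap already-terminated statements in an UNLOGGED batch."""
    body = "\n".join("  " + s for s in statements)
    return "BEGIN UNLOGGED BATCH\n" + body + "\nAPPLY BATCH;"


def unlogged_batches(statements, batch_size=20):
    """Group CQL statements into UNLOGGED batches of at most batch_size.

    Each input statement may or may not end with a semicolon; one is
    added if missing. Returns a list of batch strings.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches, current = [], []
    for stmt in statements:
        current.append(stmt.rstrip().rstrip(";") + ";")
        if len(current) == batch_size:
            batches.append(_wrap_batch(current))
            current = []
    if current:  # flush the final partial batch
        batches.append(_wrap_batch(current))
    return batches
```

Batches like this tend to help most when all statements in a batch target the same partition, since the coordinator can then apply them as a single mutation.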
On Sat, Feb 8, 2014 at 10:19 PM, Clint Kelly clint.ke...@gmail.com wrote:
Folks,
Is there a recommended way to perform lots of INSERT
Folks,
Is there any way to perform a delete in CQL of all rows where a
particular column (that is part of the primary key) is less than a
certain value? I believe that the corresponding SELECT statement
works, as in this example:
cqlsh:fiddle> describe table foo;
CREATE TABLE foo (
key text,
, at 19:10, Clint Kelly clint.ke...@gmail.com wrote:
Folks,
Has anyone out there used Cassandra 2.0 with Hadoop 2.x? I saw this
discussion on the Cassandra JIRA:
https://issues.apache.org/jira/browse/CASSANDRA-5201
but the fix referenced
(https://github.com/michaelsembwever
Folks,
Has anyone out there used Cassandra 2.0 with Hadoop 2.x? I saw this
discussion on the Cassandra JIRA:
https://issues.apache.org/jira/browse/CASSANDRA-5201
but the fix referenced
(https://github.com/michaelsembwever/cassandra-hadoop) is for
Cassandra 1.2.
I put together a similar
84 matches