[RELEASE] Apache Cassandra 3.0.24 released

2021-02-01 Thread Oleksandr Petrov
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.24.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: CHANGES.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-3.0.24
[2]: NEWS.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-3.0.24
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.11.10 released

2021-02-01 Thread Oleksandr Petrov
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.11.10.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.11 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: CHANGES.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-3.11.10
[2]: NEWS.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-3.11.10
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: SASI queries- cqlsh vs java driver

2019-02-05 Thread Oleksandr Petrov
Could you post the full table schema (names obfuscated, if required) with the
index creation statements and queries?
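
For illustration (since the schema never made it into the archive), here is a
minimal, hypothetical sketch of the kind of table, SASI index and queries being
discussed. All names, and the DataStax Java driver 3.x usage, are assumptions,
not the poster's setup:

import com.datastax.driver.core.*;

public class SasiInExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.foo "
                + "(id uuid PRIMARY KEY, label text, blah text)");
            session.execute("CREATE CUSTOM INDEX IF NOT EXISTS foo_label_idx ON demo.foo (label) "
                + "USING 'org.apache.cassandra.index.sasi.SASIIndex'");

            // Literal IN on the SASI-indexed, non-primary-key column: the form
            // that reportedly works from cqlsh.
            session.execute("SELECT blah FROM demo.foo WHERE label IN ('mytext') ALLOW FILTERING");

            // A bound IN on a non-primary-key column is what triggered the
            // InvalidQueryException in the report; a single equality bind is a common fallback.
            PreparedStatement ps = session.prepare(
                "SELECT blah FROM demo.foo WHERE label = :val ALLOW FILTERING");
            session.execute(ps.bind("mytext"));
        }
    }
}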

On Mon, Feb 4, 2019 at 10:04 AM Jacques-Henri Berthemet <
jacques-henri.berthe...@genesys.com> wrote:

> I'm not sure why it's not allowed by the DataStax driver, but maybe you
> could try to use OR instead of IN?
>
> SELECT blah FROM foo WHERE <column> = :val1 OR <column> =
> :val2 ALLOW FILTERING
>
>
>
> It should be the same as an IN query, but I don't know if it makes a difference
> for performance.
>
>
>
> *From: *Peter Heitman 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday 4 February 2019 at 07:17
> *To: *"user@cassandra.apache.org" 
> *Subject: *SASI queries- cqlsh vs java driver
>
>
>
> When I create a SASI index on a secondary column, from cqlsh I can execute
> a query
>
>
>
> SELECT blah FROM foo WHERE <column> IN ('mytext') ALLOW FILTERING;
>
>
>
> but not from the java driver:
>
>
>
> SELECT blah FROM foo WHERE <column> IN :val ALLOW FILTERING
>
>
>
> Here I get an exception
>
>
>
> com.datastax.driver.core.exceptions.InvalidQueryException: IN predicates
> on non-primary-key columns () is not yet supported
>
> at
> com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:49)
> ~[cassandra-driver-core-3.6.0.jar:na]
>
>
>
> Why are they different? Is there anything I can do with the java driver to
> get past this exception?
>
>
>
> Peter
>
>
>
>
>


-- 
alex p


CASSANDRA-13004 FAQ

2017-07-25 Thread Oleksandr Petrov
Hi everyone,

Many people have been asking similar questions about CASSANDRA-13004.
The issue itself and the release notes may be somewhat hard to grasp or
sound ambiguous, so here is a more elaborate explanation of what 13004
means in terms of the upgrade process, how it manifests, and what exactly
to do:
https://gist.github.com/ifesdjeen/9cacb1ccd934374f707125d78f2fbcb6

If you're running Cassandra 3.0+, you might want to read it just in case.

If you have any proposals for how to modify or improve it, please ping
me or comment on the gist; I'll make sure to adjust the text accordingly.

Best regards
-- 
Alex Petrov


Re: Overhead of data types in cassandra

2016-09-08 Thread Oleksandr Petrov
You can find that information in the Cassandra source code. For example,
search for the serializers, like BytesSerializer:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/serializers/BytesSerializer.java
to get an idea of how the data is serialized.

I'd also check out classes like Cell and the SSTable structure to get an
overview of the data layout.
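
As a rough, back-of-the-envelope illustration of the value sizes only (per-cell
overhead such as timestamps, flags and clustering prefixes comes on top and
depends on the storage engine version), here is a small sketch; the numbers are
assumptions for illustration:

import java.nio.ByteBuffer;

public class ValueSizes {
    public static void main(String[] args) {
        // 80 bits stored as a blob: the serialized value is exactly the 10 bytes you pass in.
        ByteBuffer blobValue = ByteBuffer.wrap(new byte[10]);
        System.out.println("blob(80 bits) value size = " + blobValue.remaining() + " bytes");

        // 64 bits stored as a bigint: the serialized value is exactly 8 bytes.
        ByteBuffer bigintValue = ByteBuffer.allocate(8).putLong(0, 0x0123456789ABCDEFL);
        System.out.println("bigint value size = " + bigintValue.remaining() + " bytes");
    }
}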

On Thu, Sep 8, 2016 at 4:23 AM Alexandr Porunov 
wrote:

> Hello,
>
> Where can I find information about overhead of data types in cassandra?
> I am interested about blob, text, uuid, timeuuid data types. Does a blob
> type store a value with the length of the blob data? If yes then which type
> of the length it is using (int, bigint)?
> If I want to store 80 bits how much of disk space will be used for it? If
> I want to store 64 bits is it better to use bigint?
>
> Sincerely,
> Alexandr
>
-- 
Alex Petrov


Re: Cassandra thrift frameSize issue

2016-07-20 Thread Oleksandr Petrov
The problem you're seeing is caused by the Thrift max_message_length,
which is set to 16 MB and is not configurable from the outside / in the yaml
file.

If the JDBC wrapper supports paging, you might want to look into
configuring it.
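
If the wrapper goes through the standard java.sql API, paging usually boils
down to a fetch-size hint. A minimal sketch, assuming the JDBC wrapper honours
setFetchSize and using a hypothetical driver URL:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PagedJdbcQuery {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:cassandra://localhost:9160/my_keyspace");
             Statement stmt = conn.createStatement()) {
            // Ask the wrapper to fetch rows in small pages instead of one huge
            // response that would blow past the 16 MB Thrift message limit.
            stmt.setFetchSize(1000);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM my_table")) {
                while (rs.next()) {
                    // process one row at a time
                }
            }
        }
    }
}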

On Tue, Jul 19, 2016 at 8:27 PM Saurabh Kumar  wrote:

> Hi ,
>
> I am trying to run a query over a Cassandra cluster using a JDBC connection
> with the RJdbc library of the R language. I am getting an
> org.apache.thrift.transport.TTransportException exception as
> mentioned in the attached pic.
>
> I increased the frame size in cassandra.yaml like:
>
> thrift_framed_transport_size_in_mb: 700
> thrift_max_message_length_in_mb: 730
>
>
> Still getting same error.
>
> Please help me out to resolve this issue.
>
>
>
> Thanks in advance.
>
>
> Regards,
> Saurabh
>
-- 
Alex Petrov


Re: Setting bloom_filter_fp_chance < 0.01

2016-05-19 Thread Oleksandr Petrov
Bloom filters are used to avoid disk seeks when accessing sstables. As we
don't know where exactly the partition resides, we have to narrow down the
search to the particular sstables where the data most probably is.

Given that you most likely won't store 50B rows on a single node, you
will probably have a larger cluster.

I would start with writing data (possibly test payloads) rather than tweaking
params. More often than not such optimizations may have adverse effects.

Better data modeling is extremely important though: picking the right partition
key and having a good clustering key layout will help you run queries in the
most performant way.

With regard to compaction strategies, you may want to read something like
https://www.instaclustr.com/blog/2016/01/27/apache-cassandra-compaction/
where several compaction strategies are compared and explained in detail.

Start with the defaults, tweak the params that make sense depending on read/write
workloads under realistic stress test payloads, measure the results and make
decisions. If there were a particular setting that suddenly made
Cassandra much better and faster, it would already be on by default.
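
As a back-of-the-envelope sketch of the off-heap cost mentioned below, the
classic bloom filter formula bits/key = -ln(p) / (ln 2)^2 applied to the
50B-key figure gives an order of magnitude only; real memory use depends on how
keys spread across sstables and on Cassandra's concrete filter implementation:

public class BloomFilterEstimate {
    static double bitsPerKey(double fpChance) {
        return -Math.log(fpChance) / (Math.log(2) * Math.log(2));
    }

    public static void main(String[] args) {
        long keys = 50_000_000_000L;
        for (double p : new double[] {0.1, 0.01, 0.001}) {
            double gigabytes = keys * bitsPerKey(p) / 8 / (1024.0 * 1024 * 1024);
            System.out.printf("fp_chance=%.3f -> ~%.1f bits/key, ~%.0f GB total before replication%n",
                    p, bitsPerKey(p), gigabytes);
        }
    }
}
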
On Thu, 19 May 2016 at 16:55, Kai Wang  wrote:

> with 50 bln rows and bloom_filter_fp_chance = 0.01, bloom filter will
> consume a lot of off heap memory. You may want to take that into
> consideration too.
>
> On Wed, May 18, 2016 at 11:53 PM, Adarsh Kumar 
> wrote:
>
>> Hi Sai,
>>
>> We have a use case where we are designing a table that is going to have
>> around 50 billion rows and we require a very fast reads. Partitions are not
>> that complex/big, it has
>> some validation data for duplicate checks (consisting 4-5 int and
>> varchar). So we were trying various options to optimize read performance.
>> Apart from tuning Bloom Filter we are trying following thing:
>>
>> 1). Better data modelling (making appropriate partition and clustering
>> keys)
>> 2). Trying Leveled compaction (changing data model for this one)
>>
>> Jonathan,
>>
>> I understand that tuning bloom_filter_fp_chance will not have a drastic
>> performance gain.
>> But this is one of the many tings we are trying.
>> Please let me know if you have any other suggestions to improve read
>> performance for this volume of data.
>>
>> Also please let me know any performance benchmark technique (currently we
>> are planing to trigger massive reads from spark and check cfstats).
>>
>> NOTE: we will be deploying DSE on EC2, so please suggest if you have
>> anything specific to DSE and EC2.
>>
>> Adarsh
>>
>> On Wed, May 18, 2016 at 9:45 PM, Jonathan Haddad 
>> wrote:
>>
>>> The impact is it'll get massively bigger with very little performance
>>> benefit, if any.
>>>
>>> You can't get 0 because it's a probabilistic data structure.  It tells
>>> you either:
>>>
>>> your data is definitely not here
>>> your data has a pretty decent chance of being here
>>>
>>> but never "it's here for sure"
>>>
>>> https://en.wikipedia.org/wiki/Bloom_filter
>>>
>>> On Wed, May 18, 2016 at 11:04 AM sai krishnam raju potturi <
>>> pskraj...@gmail.com> wrote:
>>>
 hi Adarsh;
 were there any drawbacks to setting the bloom_filter_fp_chance  to
 the default value?

 thanks
 Sai

 On Wed, May 18, 2016 at 2:21 AM, Adarsh Kumar 
 wrote:

> Hi,
>
> What is the impact of setting bloom_filter_fp_chance < 0.01.
>
> During performance tuning I was trying to tune bloom_filter_fp_chance
> and have following questions:
>
> 1). Why bloom_filter_fp_chance = 0 is not allowed. (
> https://issues.apache.org/jira/browse/CASSANDRA-5013)
> 2). What is the maximum/recommended value of bloom_filter_fp_chance
> (if we do not have any limitation for bloom filter size).
>
> NOTE: We are using default SizeTieredCompactionStrategy on
> cassandra  2.1.8.621
>
> Thanks in advance..:)
>
> Adarsh Kumar
>


>>
> --
Alex Petrov


Re: tombstone_failure_threshold being ignored?

2016-05-03 Thread Oleksandr Petrov
If I understand the problem correctly, tombstone_failure_threshold is never
reached because the ~2M objects might have been collected by different
queries running in parallel, not by one query. No single query ever
reached the threshold, although together they all contributed to the OOM.

You can read a bit more about the anti-patterns (particularly, ones related
to workloads generating lots of tombstones):
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

You can also try running repairs/compactions more frequently, although I'd look
more closely at the read queries first, possibly with tracing on, and check
the parallelism for those. Maybe decrease the warn level for the tombstone
thresholds to understand where the bounds are.
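
One way to look at a suspect read with tracing on is from the DataStax Java
driver; this is only a sketch (driver 3.x API assumed, keyspace/table names
hypothetical), and the per-replica trace events are where the live-row and
tombstone counts show up:

import com.datastax.driver.core.*;

public class TraceTombstones {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {
            Statement stmt = new SimpleStatement("SELECT * FROM my_table WHERE pk = 42")
                    .enableTracing();
            ResultSet rs = session.execute(stmt);
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
            if (trace != null) {
                for (QueryTrace.Event event : trace.getEvents()) {
                    System.out.println(event.getDescription());
                }
            }
        }
    }
}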

On Thu, Apr 28, 2016 at 7:23 PM Rick Gunderson 
wrote:

> We are running Cassandra 2.2.3, 2 data centers, 3 nodes in each. The
> replication factor per datacenter is 3. The Xmx setting on the Cassandra
> JVMs is 4GB.
>
> We have a workload that generates lots of tombstones and Cassandra goes
> OOM in about 24 hours. We've adjusted the tombstone_failure_threshold down
> to 25000 but we never see the TombstoneOverwhelmingException before the
> nodes start going OOM.
>
> The table operation that looks to be the culprit is a scan of partition
> keys (i.e. we are scanning across narrow rows, not scanning within a wide
> row). The heapdump shows we have a RangeSliceReply containing an ArrayList
> with 1,823,230 org.apache.cassandra.db.Row objects with a retained heap
> size of 441MiB.  A look inside one of the Row objects shows an
> org.apache.cassandra.db.DeletionInfo object so I assume that means the row
> has been tombstoned.
>
> If all of the 1,823,239 Row objects are tombstoned (and it is likely that
> most of them are), is there a reason that the
> TombstoneOverwhelmingException never gets thrown?
>
>
>
> Regards,
>
> Rick (R.) Gunderson
> Software Engineer
> IBM Commerce, B2B Development - GDHA
> --
> Phone: 1-250-220-1053
> E-mail: rgunder...@ca.ibm.com
> LinkedIn: http://ca.linkedin.com/pub/rick-gunderson/0/443/241
>
> 1803 Douglas St
> Victoria, BC V8T 5C3
> Canada
>
>
> --
Alex


Re: Seed Node OOM

2015-06-13 Thread Oleksandr Petrov
Sorry, I completely forgot to mention it in the original message: we have
a rather large commitlog directory (it is usually rather small), 8 GB of
commitlogs. Draining and flushing didn't help.

On Sat, Jun 13, 2015 at 1:39 PM, Oleksandr Petrov 
oleksandr.pet...@gmail.com wrote:

 Hi,

 We're using Cassandra, recently migrated to 2.1.6, and we're experiencing
 constant OOMs in one of our clusters.

 It's a rather small cluster: 3 nodes, EC2 xlarge: 2CPUs, 8GB RAM, set up
 with datastax AMI.

 Configs (yaml and env.sh) are rather default: we've changed only
 concurrent compactions to 2 (although tried 1, too), tried setting HEAP and
 NEW to different values, ranging from 4G/200 to 6G/200M.

 Write load is rather small: 200-300 small payloads (4 varchar fields as a
 primary key, 2 varchar fields and a couple of long/double fields), plus
 some larger (1-2kb) payloads with a rate of 10-20 messages per second.

 We do a lot of range scans, but they are rather quick.

 It kind of started overnight. Compaction is taking a long time. Other two
 nodes in a cluster behave absolutely normally: no hinted handoffs, normal
 heap sizes. There were no write bursts, no tables added no indexes changed.

 Anyone experienced something similar? Maybe any pointers?

 --
 alex p




-- 
alex p


Seed Node OOM

2015-06-13 Thread Oleksandr Petrov
Hi,

We're using Cassandra, recently migrated to 2.1.6, and we're experiencing
constant OOMs in one of our clusters.

It's a rather small cluster: 3 nodes, EC2 xlarge: 2 CPUs, 8 GB RAM, set up
with the DataStax AMI.

Configs (yaml and env.sh) are rather default: we've only changed concurrent
compactions to 2 (although we tried 1, too), and tried setting HEAP and NEW to
different values, ranging from 4G/200 to 6G/200M.

Write load is rather small: 200-300 small payloads (4 varchar fields as a
primary key, 2 varchar fields and a couple of long/double fields), plus
some larger (1-2kb) payloads with a rate of 10-20 messages per second.

We do a lot of range scans, but they are rather quick.

It kind of started overnight. Compaction is taking a long time. The other two
nodes in the cluster behave absolutely normally: no hinted handoffs, normal
heap sizes. There were no write bursts, no tables added, no indexes changed.

Anyone experienced something similar? Maybe any pointers?

-- 
alex p


[ANN] Cassaforte 1.2.0 is released

2013-09-07 Thread Oleksandr Petrov
Cassaforte [1] is a Clojure client for Apache Cassandra 1.2+. It is built
around CQL 3
and focuses on ease of use. You will likely find that using Cassandra from
Clojure has
never been so easy.

1.2.0 is a minor release that introduces one minor feature, fixes a couple
of bugs, and
makes Cassaforte compatible with Cassandra 2.0.

Release notes:
http://blog.clojurewerkz.org/blog/2013/09/07/cassaforte-1-dot-2-0-is-released/

1. http://clojurecassandra.info/

--
Alex P

https://github.com/ifesdjeen
https://twitter.com/ifesdjeen


Re: CQL and IN

2013-07-08 Thread Oleksandr Petrov
Hi Tony, you can check out a guide here:
http://clojurecassandra.info/articles/kv.html which explains most of the
things you need to know about queries to get started.

It includes CQL code examples; just disregard the Clojure ones, as there's
nothing strictly Clojure-driver-specific in that guide.

On Fri, Jul 5, 2013 at 12:18 AM, Rui Vieira ruidevie...@googlemail.com wrote:

 You can use the actual item_ids however,

 Select * from items Where item_id IN (1, 2, 3, ..., n)


 On 4 July 2013 23:16, Rui Vieira ruidevie...@googlemail.com wrote:

 CQL does not support sub-queries.


 On 4 July 2013 22:53, Tony Anecito adanec...@yahoo.com wrote:

 Hi All,

 I am using the DataStax driver and got prepared to work. When I tried to
 use the IN keyword with a SQL it did not work. According to DataStax IN
 should work.

 So if I tried:

 Select * from items Where item_id IN (Select item_id FROM users where
 user_id = ?)


 Thanks for the feedback.
 -Tony






-- 
alex p


Re: Date range queries

2013-06-29 Thread Oleksandr Petrov
Maybe I'm a bit late to the party, but this can still be useful for
future reference.

We've tried to keep the documentation for the Clojure Cassandra driver as
elaborate and generic as possible, and it contains raw CQL examples,
so you can refer to the docs even if you're using another driver.

Here's a range query guide:
http://clojurecassandra.info/articles/kv.html#toc_8 (there's also
information about ordering a result set). One more thing that may be useful
is the data modelling guide here:
http://clojurecassandra.info/articles/data_modelling.html#toc_2 which
describes the usage of compound keys (directly related to range
queries, too).
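
For reference, here is the kind of range query a (user_id, created) compound
key enables, using the all_answers table proposed in the quoted message below.
This is only a sketch (DataStax Java driver 3.x API assumed):

import com.datastax.driver.core.*;
import com.datastax.driver.core.utils.UUIDs;
import java.util.UUID;

public class AnswersRangeQuery {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {
            UUID userId = UUID.fromString("11b1e59c-ddfa-11e2-a28f-0800200c9a66");
            // Upper bound for the timeuuid clustering column, derived from a timestamp.
            UUID cutoff = UUIDs.endOf(System.currentTimeMillis());

            // All answers for one user up to the cut-off time, newest first.
            PreparedStatement ps = session.prepare(
                "SELECT question_id, result, created FROM all_answers "
                + "WHERE user_id = ? AND created < ? ORDER BY created DESC");
            for (Row row : session.execute(ps.bind(userId, cutoff))) {
                System.out.println(row.getVarint("question_id") + " -> " + row.getString("result"));
            }
        }
    }
}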



On Wed, Jun 26, 2013 at 3:05 AM, Colin Blower cblo...@barracuda.com wrote:

  You could just separate the history data from the current data. Then
 when the user's result is updated, just write into two tables.

 CREATE TABLE all_answers (
   user_id uuid,
   created timeuuid,
   result text,
   question_id varint,
   PRIMARY KEY (user_id, created)
 )

 CREATE TABLE current_answers (
   user_id uuid,
   question_id varint,
   created timeuuid,
   result text,
   PRIMARY KEY (user_id, question_id)
 )


  select * FROM current_answers ;
  user_id  | question_id | result | created

 --+-++--
  11b1e59c-ddfa-11e2-a28f-0800200c9a66 |   1 | no |
 f9893ee0-ddfa-11e2-b74c-35d7be46b354
  11b1e59c-ddfa-11e2-a28f-0800200c9a66 |   2 |   blah |
 f7af75d0-ddfa-11e2-b74c-35d7be46b354

  select * FROM all_answers ;
  user_id  |
 created  | question_id | result

 --+--+-+
  11b1e59c-ddfa-11e2-a28f-0800200c9a66 |
 f0141234-ddfa-11e2-b74c-35d7be46b354 |   1 |yes
  11b1e59c-ddfa-11e2-a28f-0800200c9a66 |
 f7af75d0-ddfa-11e2-b74c-35d7be46b354 |   2 |   blah
  11b1e59c-ddfa-11e2-a28f-0800200c9a66 |
 f9893ee0-ddfa-11e2-b74c-35d7be46b354 |   1 | no

 This way you can get the history of answers if you want and there is a
 simple way to get the most current answers.

 Just a thought.
 -Colin B.



 On 06/24/2013 03:28 PM, Christopher J. Bottaro wrote:

 Yes, that makes sense and that article helped a lot, but I still have a
 few questions...

  The created_at in our answers table is basically used as a version id.
  When a user updates his answer, we don't overwrite the old answer, but
 rather insert a new answer with a more recent timestamp (the version).

  answers
 ---
 user_id | created_at | question_id | result
 ---
   1 | 2013-01-01 | 1   | yes
   1 | 2013-01-01 | 2   | blah
1 | 2013-01-02 | 1   | no

  So the queries we really want to run are find me all the answers for a
 given user at a given time.  So given the date of 2013-01-02 and user_id
 1, we would want rows 2 and 3 returned (since row 3 obsoletes row 1).  Is
 it possible to do this with CQL given the current schema?

  As an aside, we can do this in Postgresql using window functions, not
 standard SQL, but pretty neat.

  We can alter our schema like so...

  answers
 ---
 user_id | start_at | end_at | question_id | result

  Where the start_at and end_at denote when an answer is active.  So the
 example above would become:

  answers
 ---
 user_id | start_at   | end_at | question_id | result
 
   1 | 2013-01-01 | 2013-01-02 | 1   | yes
   1 | 2013-01-01 | null   | 2   | blah
1 | 2013-01-02 | null   | 1   | no

  Now we can query SELECT * FROM answers WHERE user_id = 1 AND start_at
 <= '2013-01-02' AND (end_at > '2013-01-02' OR end_at IS NULL).

  How would one define the partitioning key and cluster columns in CQL to
 accomplish this?  Is it as simple as PRIMARY KEY (user_id, start_at,
 end_at, question_id) (remembering that we sometimes want to limit by
 question_id)?

  Also, we are a bit worried about race conditions.  Consider two separate
 processes updating an answer for a given user_id / question_id.  There will
 be a race condition between the two to update the correct row's end_at
 field.  Does that make sense?  I can draw it out with ASCII tables, but I
 feel like this email is already too long... :P

  Thanks for the help.



 On Wed, Jun 19, 2013 at 2:28 PM, David McNelis dmcne...@gmail.com wrote:

 So, if you want to grab by the created_at and occasionally limit by
 question id, that is why you'd use created_at.

  The way the primary keys work is the first part of the primary key is
 the partitioner key; that field is essentially the single cassandra
 row.  The second key is the order preserving key, so you can sort by that
 key.  If you have a third piece, then that is the secondary order
 

Re: token() function in CQL3 (1.2.5)

2013-06-29 Thread Oleksandr Petrov
Tokens are very useful for pagination and full-table iteration. For example,
when you want to scan an entire table, you want to use the token() function.

You can refer to two guides we've written for the Clojure driver (although they
do not contain much Clojure-specific information).
The first one is the Data Modelling / Static Tables guide:
http://clojurecassandra.info/articles/data_modelling.html#toc_1
and the second one is the K/V guide / Pagination:
http://clojurecassandra.info/articles/kv.html#toc_7
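
Here is roughly what the token()-based full-table iteration from the pagination
guide looks like as a sketch (DataStax Java driver 3.x API assumed; the users
table and its columns are hypothetical, and newer drivers can also page
transparently via the fetch size):

import com.datastax.driver.core.*;

public class TokenScan {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {
            PreparedStatement page = session.prepare(
                "SELECT userid, name FROM users WHERE token(userid) > token(?) LIMIT 100");

            String lastKey = null;
            ResultSet rs = session.execute("SELECT userid, name FROM users LIMIT 100");
            while (true) {
                for (Row row : rs) {
                    lastKey = row.getString("userid");
                    // process the row here
                }
                if (lastKey == null) break;    // the last page was empty: we're done
                rs = session.execute(page.bind(lastKey));
                lastKey = null;
            }
        }
    }
}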


On Wed, Jun 19, 2013 at 5:06 PM, Tyler Hobbs ty...@datastax.com wrote:


 On Wed, Jun 19, 2013 at 7:47 AM, Ben Boule ben_bo...@rapid7.com wrote:

  Can anyone explain this to me?  I have been looking through the source
 code but can't seem to find the answer.

 The documentation mentions using the token() function to change a value
 into it's token for use in queries.   It always mentions it as taking a
 single parameter:

 SELECT * FROM posts WHERE token(userid) > token('tom') AND token(userid) <
 token('bob')


 However on my 1.2.5 node I am getting the following error:

 e.x.

 create table foo (
 organization text,
 type text,
 time timestamp,
 id uuid,
 primary key ((organization, type, time), id))

 select * from foo where organization = 'companyA' and type = 'typeB' and
 token(time) > token('somevalue') and token(time) < token('othervalue')

 Bad Request: Invalid number of arguments in call to function token: 3
 required but 1 provided

 What are the other two parameters?  We don't currently use the token
 function but I was experimenting seeing if I could move the time into the
 partition key for a table like this to better distribute the rows.  But I
 can't seem to figure out how to get token() working.


 token() acts on the entire partition key, which for you is (organization,
 type, time), hence the 3 required values.

 In order to better distribute the rows, I suggest using a time bucket as
 part of the partition key.  For example, you might use only the date
 portion of the timestamp as the time bucket.

 These posts talk about doing something similar with the Thrift API, but
 they will probably still be helpful:
 - http://rubyscale.com/2011/basic-time-series-with-cassandra/
 - http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra

 --
 Tyler Hobbs
 DataStax http://datastax.com/




-- 
alex p


Re: Data model for financial time series

2013-06-29 Thread Oleksandr Petrov
You can refer to the Data Modelling guide here:
http://clojurecassandra.info/articles/data_modelling.html
It includes several things you've mentioned (namely, range queries and
dynamic tables).

Also, it seems that it'd be useful for you to use indexes, and performing
filtering (for things related to give me everything about symbol X), for
that you can refer to K/V operations guide:
http://clojurecassandra.info/articles/kv.html#toc_8 (range query section)
and http://clojurecassandra.info/articles/kv.html#toc_10 (filtering
section).

It looks that data model fits really well to what Cassandra allows.
Especially in financial data, it sounds like you can use your symbol as a
partition key, which opens up a wild range of possibilities for querying.
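
A sketch of that "symbol as partition key" layout and the time-bounded query it
enables (DataStax Java driver 3.x API assumed; all names and types below are
illustrative, and a real deployment would likely also bucket very busy symbols
by day to bound partition size):

import com.datastax.driver.core.*;
import java.util.Date;

public class TradesBySymbol {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("market_data")) {
            session.execute("CREATE TABLE IF NOT EXISTS trades ("
                + "symbol text, ts timestamp, type text, price double, size bigint, "
                + "PRIMARY KEY (symbol, ts, type))");

            // "Give me all the trades for AAPL between 7:00 and 15:00 of a certain day",
            // returned in ascending timestamp order thanks to the clustering key.
            PreparedStatement ps = session.prepare(
                "SELECT ts, type, price, size FROM trades "
                + "WHERE symbol = ? AND ts >= ? AND ts < ?");
            Date from = Date.from(java.time.Instant.parse("2013-06-07T07:00:00Z"));
            Date to = Date.from(java.time.Instant.parse("2013-06-07T15:00:00Z"));
            for (Row row : session.execute(ps.bind("AAPL", from, to))) {
                System.out.println(row.getTimestamp("ts") + " " + row.getDouble("price"));
            }
        }
    }
}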


On Fri, Jun 7, 2013 at 9:16 PM, Jake Luciani jak...@gmail.com wrote:

 We have built a similar system, you can ready about our data model in CQL3
 here:

 http://www.slideshare.net/carlyeks/nyc-big-tech-day-2013

 We are going to be presenting a similar talk next week at the cassandra
 summit.


 On Fri, Jun 7, 2013 at 12:34 PM, Davide Anastasia 
 davide.anasta...@qualitycapital.com wrote:

  Hi,

 I am trying to build the storage of stock prices in Cassandra. My queries
 are ideally of three types:

 - give me everything between time A and time B;

 - give me everything about symbol X;

 - give me everything of type Y;

 …or an intersection of the three. Something I will be happy doing is:

 - give me all the trades about APPL between 7:00am and 3:00pm of a
 certain day.


 However, being a time series, I will be happy to retrieve the data in
 ascending order of timestamp (from 7:00 to 3:00).


 I have tried to build my table with the timestamp (as timeuuid) as the
 primary key; however, I cannot manage to get my data in order, and "order
 by" in CQL3 raises an error and doesn't perform the query.


 Does anybody have any suggestion for a good design that fits my queries?
 

 Thanks,

 David




 --
 http://twitter.com/tjake




-- 
alex p


Re: Dynamic column family using CQL2, possible?

2013-06-29 Thread Oleksandr Petrov
WITH COMPACT STORAGE should actually allow accessing your dataset from CQL2.
There's a newer driver that supports the binary CQL protocol, namely
https://github.com/iconara/cql-rb, which is written by the guys from Bart, who
know their stuff about Cassandra :)

We're using COMPACT STORAGE for tables we access through Thrift/Hadoop, and
it works perfectly well.
You can refer to the Data Modelling guide if you want to learn more about how
to model your data to make it fit into Cassandra well:
http://clojurecassandra.info/articles/data_modelling.html


On Wed, May 29, 2013 at 12:44 AM, Matthew Hillsborough 
matthew.hillsboro...@gmail.com wrote:

 Hi all,

 I started building a schema using CQL3's interface following the
 instructions here: http://www.datastax.com/dev/blog/thrift-to-cql3

 In particular, the dynamic column family instructions did exactly what I
 need to model my data on that blog post.

 I created a schema that looks like the following:

 CREATE TABLE user_games (
   g_sp_key text,
   user_id int,
   nickname text,
   PRIMARY KEY (g_sp_key, user_id)
 ) WITH COMPACT STORAGE;

 Worked great. My problem is I tested everything in CQLsh. As soon as it
 came time to implementing in my application (a Ruby on Rails app using the
 cassandra-cql gem found at https://github.com/kreynolds/cassandra-cql), I
 realized cassandra-cql does not support CQL3 and I have to stick to CQL2.

 My question simply comes down to is it possible to do what I was
 attempting to do above in CQL2? How would my schema above change? Do I have
 to go back to using a Thrift based client?

 Thanks all.




-- 
alex p


Re: Thrift message length exceeded

2013-04-22 Thread Oleksandr Petrov
I've submitted a patch that fixes the issue for 1.2.3:
https://issues.apache.org/jira/browse/CASSANDRA-5504

Maybe you guys know a better way to fix it, but that helped me in the meantime.


On Mon, Apr 22, 2013 at 1:44 AM, Oleksandr Petrov 
oleksandr.pet...@gmail.com wrote:

 If you're using Cassandra 1.2.3, and new Hadoop interface, that would make
 a call to next(), you'll have an eternal loop reading same things all over
 again from your cassandra nodes (you may see it if you enable Debug output).

 next() is clearing key() which is required for Wide Row iteration.

 Setting key back fixed issue for me.


 On Sat, Apr 20, 2013 at 3:05 PM, Oleksandr Petrov 
 oleksandr.pet...@gmail.com wrote:

 Tried to isolate the issue in testing environment,

 What I currently have:

 That's a setup for test:
 CREATE KEYSPACE cascading_cassandra WITH replication = {'class' :
 'SimpleStrategy', 'replication_factor' : 1};
 USE cascading_cassandra;
 CREATE TABLE libraries (emitted_at timestamp, additional_info varchar,
 environment varchar, application varchar, type varchar, PRIMARY KEY
 (application, environment, type, emitted_at)) WITH COMPACT STORAGE;

 Next, insert some test data:

 (just for example)
 [INSERT INTO libraries (application, environment, type, additional_info,
 emitted_at) VALUES (?, ?, ?, ?, ?); [app env type 0 #inst 2013-04-20T13:01:
 04.935-00:00]]

 If keys (e.q. app env type) are all same across the dataset, it
 works correctly.
 As soon as I start varying keys, e.q. app1, app2, app3 or others, I
 get the error with Message Length Exceeded.

 Does anyone have some ideas?
 Thanks for help!


 On Sat, Apr 20, 2013 at 1:56 PM, Oleksandr Petrov 
 oleksandr.pet...@gmail.com wrote:

 I can confirm running same problem.

 Tried ConfigHelper.setThriftMaxMessageLengthInMb();, and tuning server
 side, reducing/increasing batch size.

 Here's stacktrace from Hadoop/Cassandra, maybe it could give a hint:

 Caused by: org.apache.thrift.protocol.TProtocolException: Message length
 exceeded: 8
 at
 org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)

 at
 org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
 at org.apache.cassandra.thrift.Column.read(Column.java:528)
  at
 org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
 at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
  at
 org.apache.cassandra.thrift.Cassandra$get_paged_slice_result.read(Cassandra.java:14157)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
  at
 org.apache.cassandra.thrift.Cassandra$Client.recv_get_paged_slice(Cassandra.java:769)
 at
 org.apache.cassandra.thrift.Cassandra$Client.get_paged_slice(Cassandra.java:753)
  at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:438)


 On Thu, Apr 18, 2013 at 12:34 AM, Lanny Ripple la...@spotright.com wrote:

 It's slow going finding the time to do so but I'm working on that.

 We do have another table that has one or sometimes two columns per row.
  We can run jobs on it without issue.  I looked through
 org.apache.cassandra.hadoop code and don't see anything that's really
 changed since 1.1.5 (which was also using thrift-0.7) so something of a
 puzzler about what's going on.


 On Apr 17, 2013, at 2:47 PM, aaron morton aa...@thelastpickle.com
 wrote:

  Can you reproduce this in a simple way ?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 18/04/2013, at 5:50 AM, Lanny Ripple la...@spotright.com wrote:
 
  That was our first thought.  Using maven's dependency tree info we
 verified that we're using the expected (cass 1.2.3) jars
 
  $ mvn dependency:tree | grep thrift
  [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
  [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
 
  I've also dumped the final command run by the hadoop we use (CDH3u5)
 and verified it's not sneaking thrift in on us.
 
 
  On Tue, Apr 16, 2013 at 4:36 PM, aaron morton 
 aa...@thelastpickle.com wrote:
  Can you confirm the you are using the same thrift version that ships
 1.2.3 ?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 16/04/2013, at 10:17 AM, Lanny Ripple la...@spotright.com
 wrote:
 
  A bump to say I found this
 
 
 http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
 
  so others are seeing similar behavior.
 
  From what I can see of org.apache.cassandra.hadoop nothing has
 changed since 1.1.5 when we didn't see such things but sure looks like
 there's a bug that's slipped in (or been uncovered) somewhere.  I'll try to
 narrow down to a dataset and code that can reproduce.
 
  On Apr 10, 2013, at 6:29 PM, Lanny Ripple la...@spotright.com
 wrote:
 
  We are using Astyanax

Re: Thrift message length exceeded

2013-04-21 Thread Oleksandr Petrov
If you're using Cassandra 1.2.3 and the new Hadoop interface that makes
a call to next(), you'll get an eternal loop reading the same things all over
again from your Cassandra nodes (you can see it if you enable debug output).

next() clears key(), which is required for wide row iteration.

Setting the key back fixed the issue for me.


On Sat, Apr 20, 2013 at 3:05 PM, Oleksandr Petrov 
oleksandr.pet...@gmail.com wrote:

 Tried to isolate the issue in testing environment,

 What I currently have:

 That's a setup for test:
 CREATE KEYSPACE cascading_cassandra WITH replication = {'class' :
 'SimpleStrategy', 'replication_factor' : 1};
 USE cascading_cassandra;
 CREATE TABLE libraries (emitted_at timestamp, additional_info varchar,
 environment varchar, application varchar, type varchar, PRIMARY KEY
 (application, environment, type, emitted_at)) WITH COMPACT STORAGE;

 Next, insert some test data:

 (just for example)
 [INSERT INTO libraries (application, environment, type, additional_info,
 emitted_at) VALUES (?, ?, ?, ?, ?); [app env type 0 #inst 2013-04-20T13:01:
 04.935-00:00]]

 If keys (e.q. app env type) are all same across the dataset, it
 works correctly.
 As soon as I start varying keys, e.q. app1, app2, app3 or others, I
 get the error with Message Length Exceeded.

 Does anyone have some ideas?
 Thanks for help!


 On Sat, Apr 20, 2013 at 1:56 PM, Oleksandr Petrov 
 oleksandr.pet...@gmail.com wrote:

 I can confirm running same problem.

 Tried ConfigHelper.setThriftMaxMessageLengthInMb();, and tuning server
 side, reducing/increasing batch size.

 Here's stacktrace from Hadoop/Cassandra, maybe it could give a hint:

 Caused by: org.apache.thrift.protocol.TProtocolException: Message length
 exceeded: 8
 at
 org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)

 at
 org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
 at org.apache.cassandra.thrift.Column.read(Column.java:528)
  at
 org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
 at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
  at
 org.apache.cassandra.thrift.Cassandra$get_paged_slice_result.read(Cassandra.java:14157)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
  at
 org.apache.cassandra.thrift.Cassandra$Client.recv_get_paged_slice(Cassandra.java:769)
 at
 org.apache.cassandra.thrift.Cassandra$Client.get_paged_slice(Cassandra.java:753)
  at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:438)


 On Thu, Apr 18, 2013 at 12:34 AM, Lanny Ripple la...@spotright.com wrote:

 It's slow going finding the time to do so but I'm working on that.

 We do have another table that has one or sometimes two columns per row.
  We can run jobs on it without issue.  I looked through
 org.apache.cassandra.hadoop code and don't see anything that's really
 changed since 1.1.5 (which was also using thrift-0.7) so something of a
 puzzler about what's going on.


 On Apr 17, 2013, at 2:47 PM, aaron morton aa...@thelastpickle.com
 wrote:

  Can you reproduce this in a simple way ?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 18/04/2013, at 5:50 AM, Lanny Ripple la...@spotright.com wrote:
 
  That was our first thought.  Using maven's dependency tree info we
 verified that we're using the expected (cass 1.2.3) jars
 
  $ mvn dependency:tree | grep thrift
  [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
  [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
 
  I've also dumped the final command run by the hadoop we use (CDH3u5)
 and verified it's not sneaking thrift in on us.
 
 
  On Tue, Apr 16, 2013 at 4:36 PM, aaron morton 
 aa...@thelastpickle.com wrote:
  Can you confirm the you are using the same thrift version that ships
 1.2.3 ?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 16/04/2013, at 10:17 AM, Lanny Ripple la...@spotright.com wrote:
 
  A bump to say I found this
 
 
 http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
 
  so others are seeing similar behavior.
 
  From what I can see of org.apache.cassandra.hadoop nothing has
 changed since 1.1.5 when we didn't see such things but sure looks like
 there's a bug that's slipped in (or been uncovered) somewhere.  I'll try to
 narrow down to a dataset and code that can reproduce.
 
  On Apr 10, 2013, at 6:29 PM, Lanny Ripple la...@spotright.com
 wrote:
 
  We are using Astyanax in production but I cut back to just Hadoop
 and Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem.
 
  We do have some extremely large rows but we went from everything
 working with 1.1.5 to almost everything carping with 1.2.3.  Something has
 changed

Re: Thrift message length exceeded

2013-04-20 Thread Oleksandr Petrov
I can confirm running into the same problem.

I tried ConfigHelper.setThriftMaxMessageLengthInMb(), as well as tuning the
server side and reducing/increasing the batch size.

Here's the stacktrace from Hadoop/Cassandra, maybe it could give a hint:

Caused by: org.apache.thrift.protocol.TProtocolException: Message length
exceeded: 8
at
org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)

at
org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
at org.apache.cassandra.thrift.Column.read(Column.java:528)
at
org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
at
org.apache.cassandra.thrift.Cassandra$get_paged_slice_result.read(Cassandra.java:14157)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at
org.apache.cassandra.thrift.Cassandra$Client.recv_get_paged_slice(Cassandra.java:769)
at
org.apache.cassandra.thrift.Cassandra$Client.get_paged_slice(Cassandra.java:753)
at
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:438)


On Thu, Apr 18, 2013 at 12:34 AM, Lanny Ripple la...@spotright.com wrote:

 It's slow going finding the time to do so but I'm working on that.

 We do have another table that has one or sometimes two columns per row.
  We can run jobs on it without issue.  I looked through
 org.apache.cassandra.hadoop code and don't see anything that's really
 changed since 1.1.5 (which was also using thrift-0.7) so something of a
 puzzler about what's going on.


 On Apr 17, 2013, at 2:47 PM, aaron morton aa...@thelastpickle.com wrote:

  Can you reproduce this in a simple way ?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 18/04/2013, at 5:50 AM, Lanny Ripple la...@spotright.com wrote:
 
  That was our first thought.  Using maven's dependency tree info we
 verified that we're using the expected (cass 1.2.3) jars
 
  $ mvn dependency:tree | grep thrift
  [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
  [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
 
  I've also dumped the final command run by the hadoop we use (CDH3u5)
 and verified it's not sneaking thrift in on us.
 
 
  On Tue, Apr 16, 2013 at 4:36 PM, aaron morton aa...@thelastpickle.com
 wrote:
  Can you confirm the you are using the same thrift version that ships
 1.2.3 ?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 16/04/2013, at 10:17 AM, Lanny Ripple la...@spotright.com wrote:
 
  A bump to say I found this
 
 
 http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
 
  so others are seeing similar behavior.
 
  From what I can see of org.apache.cassandra.hadoop nothing has changed
 since 1.1.5 when we didn't see such things but sure looks like there's a
 bug that's slipped in (or been uncovered) somewhere.  I'll try to narrow
 down to a dataset and code that can reproduce.
 
  On Apr 10, 2013, at 6:29 PM, Lanny Ripple la...@spotright.com wrote:
 
  We are using Astyanax in production but I cut back to just Hadoop and
 Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem.
 
  We do have some extremely large rows but we went from everything
 working with 1.1.5 to almost everything carping with 1.2.3.  Something has
 changed.  Perhaps we were doing something wrong earlier that 1.2.3 exposed
 but surprises are never welcome in production.
 
  On Apr 10, 2013, at 8:10 AM, moshe.kr...@barclays.com wrote:
 
  I also saw this when upgrading from C* 1.0 to 1.2.2, and from hector
 0.6 to 0.8
  Turns out the Thrift message really was too long.
  The mystery to me: Why no complaints in previous versions? Were some
 checks added in Thrift or Hector?
 
  -Original Message-
  From: Lanny Ripple [mailto:la...@spotright.com]
  Sent: Tuesday, April 09, 2013 6:17 PM
  To: user@cassandra.apache.org
  Subject: Thrift message length exceeded
 
  Hello,
 
  We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran
 sstableupgrades and got the ring on its feet and we are now seeing a new
 issue.
 
  When we run MapReduce jobs against practically any table we find the
 following errors:
 
  2013-04-09 09:58:47,746 INFO
 org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
  2013-04-09 09:58:47,899 INFO
 org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with
 processName=MAP, sessionId=
  2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree:
 setsid exited with exit code 0
  2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using
 ResourceCalculatorPlugin :
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
  2013-04-09 09:58:50,475 INFO
 org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' 

Re: Thrift message length exceeded

2013-04-20 Thread Oleksandr Petrov
I've tried to isolate the issue in a testing environment.

Here's what I currently have.

The setup for the test:
CREATE KEYSPACE cascading_cassandra WITH replication = {'class' :
'SimpleStrategy', 'replication_factor' : 1};
USE cascading_cassandra;
CREATE TABLE libraries (emitted_at timestamp, additional_info varchar,
environment varchar, application varchar, type varchar, PRIMARY KEY
(application, environment, type, emitted_at)) WITH COMPACT STORAGE;

Next, insert some test data (just for example):

[INSERT INTO libraries (application, environment, type, additional_info,
emitted_at) VALUES (?, ?, ?, ?, ?); [app env type 0 #inst
2013-04-20T13:01:04.935-00:00]]

If the keys (e.g. app, env, type) are all the same across the dataset, it works
correctly.
As soon as I start varying the keys, e.g. app1, app2, app3 or others, I
get the error with Message Length Exceeded.

Does anyone have some ideas?
Thanks for the help!


On Sat, Apr 20, 2013 at 1:56 PM, Oleksandr Petrov 
oleksandr.pet...@gmail.com wrote:

 I can confirm running same problem.

 Tried ConfigHelper.setThriftMaxMessageLengthInMb();, and tuning server
 side, reducing/increasing batch size.

 Here's stacktrace from Hadoop/Cassandra, maybe it could give a hint:

 Caused by: org.apache.thrift.protocol.TProtocolException: Message length
 exceeded: 8
 at
 org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)

 at
 org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
 at org.apache.cassandra.thrift.Column.read(Column.java:528)
  at
 org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
 at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
  at
 org.apache.cassandra.thrift.Cassandra$get_paged_slice_result.read(Cassandra.java:14157)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
  at
 org.apache.cassandra.thrift.Cassandra$Client.recv_get_paged_slice(Cassandra.java:769)
 at
 org.apache.cassandra.thrift.Cassandra$Client.get_paged_slice(Cassandra.java:753)
  at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:438)


 On Thu, Apr 18, 2013 at 12:34 AM, Lanny Ripple la...@spotright.com wrote:

 It's slow going finding the time to do so but I'm working on that.

 We do have another table that has one or sometimes two columns per row.
  We can run jobs on it without issue.  I looked through
 org.apache.cassandra.hadoop code and don't see anything that's really
 changed since 1.1.5 (which was also using thrift-0.7) so something of a
 puzzler about what's going on.


 On Apr 17, 2013, at 2:47 PM, aaron morton aa...@thelastpickle.com
 wrote:

  Can you reproduce this in a simple way ?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 18/04/2013, at 5:50 AM, Lanny Ripple la...@spotright.com wrote:
 
  That was our first thought.  Using maven's dependency tree info we
 verified that we're using the expected (cass 1.2.3) jars
 
  $ mvn dependency:tree | grep thrift
  [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
  [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
 
  I've also dumped the final command run by the hadoop we use (CDH3u5)
 and verified it's not sneaking thrift in on us.
 
 
  On Tue, Apr 16, 2013 at 4:36 PM, aaron morton aa...@thelastpickle.com
 wrote:
  Can you confirm the you are using the same thrift version that ships
 1.2.3 ?
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 16/04/2013, at 10:17 AM, Lanny Ripple la...@spotright.com wrote:
 
  A bump to say I found this
 
 
 http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
 
  so others are seeing similar behavior.
 
  From what I can see of org.apache.cassandra.hadoop nothing has
 changed since 1.1.5 when we didn't see such things but sure looks like
 there's a bug that's slipped in (or been uncovered) somewhere.  I'll try to
 narrow down to a dataset and code that can reproduce.
 
  On Apr 10, 2013, at 6:29 PM, Lanny Ripple la...@spotright.com
 wrote:
 
  We are using Astyanax in production but I cut back to just Hadoop
 and Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem.
 
  We do have some extremely large rows but we went from everything
 working with 1.1.5 to almost everything carping with 1.2.3.  Something has
 changed.  Perhaps we were doing something wrong earlier that 1.2.3 exposed
 but surprises are never welcome in production.
 
  On Apr 10, 2013, at 8:10 AM, moshe.kr...@barclays.com wrote:
 
  I also saw this when upgrading from C* 1.0 to 1.2.2, and from
 hector 0.6 to 0.8
  Turns out the Thrift message really was too long.
  The mystery to me: Why no complaints in previous versions? Were
 some checks added in Thrift or Hector?
 
  -Original Message

Using map<varchar, varchar> type with composite primary key causes significant performance decrease

2013-04-18 Thread Oleksandr Petrov
Hi,

I'm trying to persist some event data. I've tried to identify the
bottleneck, and it seems to work like this:

If I create a table with a primary key based on (application, environment,
type and emitted_at):

CREATE TABLE events (application varchar, environment varchar, type
varchar, additional_info map<varchar, varchar>, hostname varchar,
emitted_at timestamp,
PRIMARY KEY (application, environment, type, emitted_at));

And insert events via CQL, prepared statements:

INSERT INTO events (environment, application, hostname, emitted_at, type,
additional_info) VALUES (?, ?, ?, ?, ?, ?);

Values are: local analytics noname #inst
2013-04-18T16:37:02.723-00:00 event_type {some value}

After about 1-2K inserts I get a significant performance decrease.

I've tried using only emitted_at (timestamp) as the primary key, or writing
the additional_info data as serialized JSON (varchar) instead of a map. Both
scenarios seem to avoid the performance degradation.

I'm using Cassandra 1.2.3 from the DataStax repository, running it on a 2-core
machine with 2 GB of RAM.

What could I be doing wrong here? What may cause the performance issues?
Thank you
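
For completeness, this is roughly what the insert path looks like when a
java.util.Map is bound to the map<varchar, varchar> column through a prepared
statement. It is only an illustration (DataStax Java driver 3.x API assumed),
not a reproduction of the original setup:

import com.datastax.driver.core.*;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class EventInsert {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {
            PreparedStatement ps = session.prepare(
                "INSERT INTO events (environment, application, hostname, emitted_at, type, additional_info) "
                + "VALUES (?, ?, ?, ?, ?, ?)");

            Map<String, String> info = new HashMap<>();
            info.put("some", "value");

            // One event; in the report above, throughput degraded after ~1-2K of these.
            session.execute(ps.bind("local", "analytics", "noname", new Date(), "event_type", info));
        }
    }
}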


-- 
alex p


Re: Using map<varchar, varchar> type with composite primary key causes significant performance decrease

2013-04-18 Thread Oleksandr Petrov
Write performance decreases.

Reads are basically blocked, too. Sometimes I have to wait 3-4 seconds to
get a count even though there are only a couple of thousand small entries in
the table.


On Thu, Apr 18, 2013 at 8:37 PM, aaron morton aa...@thelastpickle.com wrote:

 After about 1-2K inserts I get significant performance decrease.

 A decrease in performance doing what ?

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 19/04/2013, at 4:43 AM, Oleksandr Petrov oleksandr.pet...@gmail.com
 wrote:

 Hi,

 I'm trying to persist some event data, I've tried to identify the
 bottleneck, and it seems to work like that:

 If I create a table with primary key based on (application, environment,
 type and emitted_at):

 CREATE TABLE events (application varchar, environment varchar, type
 varchar, additional_info map<varchar, varchar>, hostname varchar,
 emitted_at timestamp,
 PRIMARY KEY (application, environment, type, emitted_at));

 And insert events via CQL, prepared statements:

 INSERT INTO events (environment, application, hostname, emitted_at, type,
 additional_info) VALUES (?, ?, ?, ?, ?, ?);

 Values are: local analytics noname #inst 2013-04-18T16:37:02.723-00:00
 event_type {some value}

 After about 1-2K inserts I get significant performance decrease.

 I've tried using only emitted_at (timestamp) as a primary key, OR writing
 additional_info data as a serialized JSON (varchar) instead of Map. Both
 scenarios seem to solve the performance degradation.

 I'm using Cassandra 1.2.3 from DataStax repository, running it on 2-core
 machine with 2GB Ram.

 What could I do wrong here? What may cause performance issues?..
 Thank you


 --
 alex p





-- 
alex p


Re: Inserting via thrift interface to column family created with Compound Key via cql3

2013-01-30 Thread Oleksandr Petrov
Yes, execute_cql3_query, exactly.


On Wed, Jan 30, 2013 at 4:37 PM, Michael Kjellman
mkjell...@barracuda.com wrote:

 Are you using execute_cql3_query() ?

 On Jan 30, 2013, at 7:31 AM, Oleksandr Petrov 
 oleksandr.pet...@gmail.com wrote:

  Hi,
 
  I'm creating a table via cql3 query like:
 
  CREATE TABLE posts (
userid text,
blog_name text,
entry_title text,
posted_at text,
PRIMARY KEY (userid, blog_name)
  )
 
  After that i'm trying to insert into same column family via thrift
 interface, and i'm getting following exception: Not enough bytes to read
 value of component 0
 
   Cassandra.java:20833
 org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read
 TServiceClient.java:78
 org.apache.thrift.TServiceClient.receiveBase
 Cassandra.java:964
 org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate
 Cassandra.java:950
 org.apache.cassandra.thrift.Cassandra$Client.batch_mutate
 
  Thrift client doesn't even display that column family when running
 describe_keyspace.
 
 
 I may be missing something, and I realize that CQL3 is the way to go, but I'm
 still unsure whether it's even possible to combine CQL3 and Thrift
 things.
 
  --
  alex p




-- 
alex p


Re: CQL 2, CQL 3 and Thrift confusion

2012-09-24 Thread Oleksandr Petrov
Yup, that was exactly the cause. Somehow I could not figure out why it was
downcasing my keyspace name all the time.
It may be good to put this somewhere in the reference material with a more
detailed explanation.
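
A small sketch of the case-sensitivity rule Sylvain describes in the quoted
reply: unquoted identifiers are folded to lower case in CQL3, while double
quotes preserve the case (DataStax Java driver 3.x API assumed, keyspace names
hypothetical):

import com.datastax.driver.core.*;

public class QuotedKeyspace {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Created unquoted, so the keyspace is stored as "testkeyspace".
            session.execute("CREATE KEYSPACE IF NOT EXISTS TestKeyspace WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("USE testkeyspace");   // works
            session.execute("USE TestKeyspace");   // also works: folded to testkeyspace

            // With double quotes the mixed-case name is preserved and must always be quoted.
            session.execute("CREATE KEYSPACE IF NOT EXISTS \"MixedCase\" WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("USE \"MixedCase\"");
        }
    }
}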

On Sun, Sep 23, 2012 at 9:30 PM, Sylvain Lebresne sylv...@datastax.com wrote:

 In CQL3, names are case insensitive by default, while they were case
 sensitive in CQL2. You can force whatever case you want in CQL3
 however using double quotes. So in other words, in CQL3,
   USE TestKeyspace;
 should work as expected.

 --
 Sylvain

 On Sun, Sep 23, 2012 at 9:22 PM, Oleksandr Petrov
 oleksandr.pet...@gmail.com wrote:
  Hi,
 
  I'm currently using Cassandra 1.1.5.
 
  When I'm trying to create a Keyspace from CQL 2 with a command (`cqlsh
 -2`):
 
CREATE KEYSPACE TestKeyspace WITH strategy_class = 'SimpleStrategy' AND
  strategy_options:replication_factor = 1
 
  Then try to access it from CQL 3 (`cqlsh -3`):
 
USE TestKeyspace;
 
  I get an error: Bad Request: Keyspace 'testkeyspace' does not exist
 
  Same thing is applicable to Thrift Interface. Somehow, I can only access
  keyspaces created from CQL 2 via Thrift Interface.
 
  Basically, I get same exact error: InvalidRequestException(why:There is
 no
  ring for the keyspace: CascadingCassandraCql3)
 
  Am I missing some switch? Or maybe it is intended to work that way?...
  Thanks!
 
  --
  alex p




-- 
alex p


Re: Cassandra Counters

2012-09-24 Thread Oleksandr Petrov
Maybe I'm missing the point, but counting in a standard column family would
be a little overkill.

I assume that distributed counting here meant more of a map/reduce
approach, where Hadoop (+ Cascading, Pig, Hive, Cascalog) would help you a
lot. We're doing some more complex counting (e.g. based on sets of rules)
like that. Of course, that would perform _way_ slower than counting
beforehand. On the flip side, you will always have a consistent result for
a consistent dataset.

On the other hand, if you use things like AMQP or Storm (sorry to lump them
together like that, as the tools are mostly either orthogonal or
complementary, but I hope you get my point), you could build a topology
that makes fault-tolerant writes independently of your original write. Of
course, it would still have a consistency tradeoff, mostly because of race
conditions, different network latencies, etc.

So I would say that building a data model in a distributed system often
depends more on your problem than on the common patterns, because
everything has a tradeoff.

Want an immediate result? Modify your counter while writing the row.
Can you sacrifice speed, but want more counting opportunities? Go with offline
distributed counting.
Want kind of both? Dispatch a message and react upon it, keeping the
processing logic and writes decoupled from the main application, allowing you
to care less about speed.

However, I may have missed the point somewhere (early morning, you know),
so I may be wrong in any given statement.
Cheers
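
To make the two options from the quoted question concrete, here is a small
sketch of a counter column family updated in place versus a standard table of
deltas summed at read time (DataStax Java driver 3.x API assumed; all names
hypothetical):

import com.datastax.driver.core.*;

public class ListItemCounts {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo")) {
            // Option 1: a counter table; one read returns the current total.
            session.execute("CREATE TABLE IF NOT EXISTS list_counter "
                + "(list_id text PRIMARY KEY, total counter)");
            session.execute("UPDATE list_counter SET total = total + 3 WHERE list_id = 'list1'");
            long total = session.execute("SELECT total FROM list_counter WHERE list_id = 'list1'")
                    .one().getLong("total");

            // Option 2: a standard table of deltas, summed by the application at read time.
            session.execute("CREATE TABLE IF NOT EXISTS list_deltas "
                + "(list_id text, change_id timeuuid, delta bigint, PRIMARY KEY (list_id, change_id))");
            session.execute("INSERT INTO list_deltas (list_id, change_id, delta) "
                + "VALUES ('list1', now(), -2)");
            long sum = 0;
            for (Row row : session.execute("SELECT delta FROM list_deltas WHERE list_id = 'list1'")) {
                sum += row.getLong("delta");
            }
            System.out.println("counter total = " + total + ", delta sum = " + sum);
        }
    }
}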


On Tue, Sep 25, 2012 at 6:53 AM, Roshni Rajagopal 
roshni_rajago...@hotmail.com wrote:

  Thanks Milind,

 Has anyone implemented counting in a standard col family in cassandra,
 when you can have increments and decrements to the count.
 Any comparisons in performance to using counter column families?

 Regards,
 Roshni


 --
 Date: Mon, 24 Sep 2012 11:02:51 -0700
 Subject: RE: Cassandra Counters
 From: milindpar...@gmail.com
 To: user@cassandra.apache.org


 IMO
 You would use Cassandra Counters (or other variation of distributed
 counting) in case of having determined that a centralized version of
 counting is not going to work.
 You'd determine the non_feasibility of centralized counting by figuring
 the speed at which you need to sustain writes and reads and reconcile that
 with your hard disk seek times (essentially).
 Once you have proved that you can't do centralized counting, the second
 layer of arsenal comes into play; which is distributed counting.
 In distributed counting , the CAP theorem comes into life.  in Cassandra,
 Availability and Network Partitioning trumps over Consistency.

 So yes, you sacrifice strong consistency for availability and partion
 tolerance; for eventual consistency.
 On Sep 24, 2012 10:28 AM, Roshni Rajagopal roshni_rajago...@hotmail.com
 wrote:

  Hi folks,

I looked at my mail below, and Im rambling a bit, so Ill try to
 re-state my queries pointwise.

 a) what are the performance tradeoffs on reads  writes between creating a
 standard column family and manually doing the counts by a lookup on a key,
 versus using counters.

 b) whats the current state of counters limitations in the latest version
 of apache cassandra?

 c) with there being a possibilty of counter values getting out of sync,
 would counters not be recommended where strong consistency is desired. The
 normal benefits of cassandra's tunable consistency would not be applicable,
 as re-tries may cause overstating. So the normal use case is high
 performance, and where consistency is not paramount.

 Regards,
 roshni



 --
 From: roshni_rajago...@hotmail.com
 To: user@cassandra.apache.org
 Subject: Cassandra Counters
 Date: Mon, 24 Sep 2012 16:21:55 +0530

  Hi ,

 I'm trying to understand if counters are a good fit for my use case.
 Ive watched http://blip.tv/datastax/counters-in-cassandra-5497678 many
 times over now...
 and still need help!

 Suppose I have a list of items- to which I can add or delete a set of
 items at a time,  and I want a count of the items, without considering
 changing the database  or additional components like zookeeper,
 I have 2 options_ the first is a counter col family, and the second is a
 standard one
   1. List_Counter_CF:
        row key: ListId, column: TotalItems = 50

   2. List_Std_CF:
        row key: ListId, columns: TimeUUID1 = 3, TimeUUID2 = 70, TimeUUID3 = -20,
        TimeUUID4 = 3, TimeUUID5 = -6

 And in the second I can add a new col with every set of items added or
 deleted. Over time this row may grow wide.
 To display the final count, I'd need to read the row, slice through all
 columns and add them.

 In both cases the writes should be fast, in fact standard col family
 should be faster as there's no read, before write. And for CL ONE write the
 latency should be same.
 For reads, the first option is very good, just read one column for a key

 For the second, the read involves reading the row, and adding each column
 value via application code. I