Re: Cassandra 5.0 Beta1 - vector searching results

2024-03-21 Thread Jonathan Ellis
>  Memtable off heap memory used: 0
>  Memtable switch count: 16
>  Speculative retries: 0
>  Local read count: 0
>  Local read latency: NaN ms
>  Local write count: 2893108
>  Local write latency: NaN ms
>  Local read/write ratio: 0.0
>  Pending flushes: 0
>  Percent repaired: 100.0
>  Bytes repaired: 9.066GiB
>  Bytes unrepaired: 0B
>  Bytes pending repair: 0B
>  Bloom filter false positives: 7245
>  Bloom filter false ratio: 0.00286
>  Bloom filter space used: 87264
>  Bloom filter off heap memory used: 87216
>  Index summary off heap memory used: 34624
>  Compression metadata off heap memory used: 4753072
>  Compacted partition minimum bytes: 2760
>  Compacted partition maximum bytes: 4866323
>  Compacted partition mean bytes: 154523
>  Average live cells per slice (last five minutes): NaN
>  Maximum live cells per slice (last five minutes): 0
>  Average tombstones per slice (last five minutes): NaN
>  Maximum tombstones per slice (last five minutes): 0
>  Droppable tombstone ratio: 0.0
>
> nodetool tablehistograms doc.embeddings_googleflant5large
>
> doc/embeddings_googleflant5large histograms
> Percentile      Read Latency    Write Latency    SSTables    Partition Size    Cell Count
>                 (micros)        (micros)                     (bytes)
> 50%             0.00            0.00             0.00        105778            124
> 75%             0.00            0.00             0.00        182785            215
> 95%             0.00            0.00             0.00        379022            446
> 98%             0.00            0.00             0.00        545791            642
> 99%             0.00            0.00             0.00        654949            924
> Min             0.00            0.00             0.00        2760              4
> Max             0.00            0.00             0.00        4866323           5722
>
> Running a query such as:
>
> select uuid,offset,type,textdata from doc.embeddings_googleflant5large
> order by embeddings ANN OF [768 dimension vector] limit 20;
>
> Works fine - typically less than 5 seconds to return.  Subsequent
> queries are even faster.  If I'm actively adding data to the table, the
> searches can sometimes time out (using cqlsh).
> If I add something to the where clause, the performance drops
> significantly:
>
> select uuid,offset,type,textdata from doc.embeddings_googleflant5large
> where offset=1 order by embeddings ANN OF [] limit 20;
>
> That query will time out when run in cqlsh, even with no data being
> added to the table.
> We've been running a Weaviate database side-by-side with Cassandra 4,
> and would love to drop Weaviate if we can do all the vector searches
> inside of Cassandra.
> What else can I try?  Anything to increase performance?
> Thanks all!
>
> -Joe
>
>
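One thing to check in 5.0: a non-key predicate combined with ANN ordering is
served by a storage-attached index (SAI), so the filtered column needs one.
A minimal sketch, assuming SAI is available (index name illustrative):

    CREATE INDEX IF NOT EXISTS offset_sai_idx
        ON doc.embeddings_googleflant5large (offset)
        USING 'sai';

Once that index is built, the "where offset=1 ... ANN OF" query above can be
answered from the index instead of by post-filtering every candidate.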


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: DataStax Accelerate CFP

2020-01-07 Thread Jonathan Ellis
Happy new year, everyone!

Reminder, the Accelerate CFP closes in just over two weeks:
https://www.datastax.com/blog/2019/11/datastax-accelerate-20-call-papers-now-open

On Mon, Nov 11, 2019 at 1:28 PM Jonathan Ellis  wrote:

> This spring DataStax kicked off Accelerate, a new conference carrying on
> the spirit and tradition of the seven Cassandra Summits that we sponsored
> and organized in the past.  We had a great set of talks (check out
> ungated, full session videos here
> <https://www.youtube.com/playlist?list=PLm-EPIkBI3YpJbuKUGDlZVNHzT0umcBSl>)
> and even more great conversations with an attendance of about a thousand.
>
> Next year's Accelerate <https://www.datastax.com/accelerate> will be May
> 11-13 in San Diego, and the call for papers is now open!  We'd love to hear
> about your successes, custom extensions, and lessons learned with Apache
> Cassandra or DataStax.  The direct link to submitting a proposal is here
> <https://sessionize.com/datastax-accelerate-san-diego/>, and more
> background with suggestions for first-time speakers is here
> <https://www.datastax.com/blog/2019/11/datastax-accelerate-20-call-papers-now-open>
> .
>
> The CFP closes Jan 22.  Hope to hear from you soon!
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


[Announce] DataStax Support for Apache Cassandra, New Tools

2019-12-17 Thread Jonathan Ellis
Hi all,

Today DataStax is pleased to announce Luna
<https://www.datastax.com/services/datastax-luna>: support for Apache
Cassandra versions 2.1, 2.2, 3.0, and 3.11. The short version is that with
Luna, we're making our expertise available to Apache Cassandra users as a
subscription-based support plan with public pricing that you can buy
directly through our website. The full announcement is here
<https://www.datastax.com/press-release/introducing-datastax-luna-enterprise-support-apache-cassandra>.

Additionally, as part of our ongoing commitment to Cassandra, we're also
announcing the availability of DataStax Bulk Loader
<https://downloads.datastax.com/#bulk-loader> and DataStax Apache Kafka
Connector <https://downloads.datastax.com/#akc> as free downloads, making
loading and unloading data from Cassandra faster and easier.  Details of
this release are here
<https://www.datastax.com/blog/2019/12/tools-for-apache-cassandra>.

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


DataStax Accelerate CFP

2019-11-11 Thread Jonathan Ellis
This spring DataStax kicked off Accelerate, a new conference carrying on
the spirit and tradition of the seven Cassandra Summits that we sponsored
and organized in the past.  We had a great set of talks (check out ungated,
full session videos here
<https://www.youtube.com/playlist?list=PLm-EPIkBI3YpJbuKUGDlZVNHzT0umcBSl>)
and even more great conversations with an attendance of about a thousand.

Next year's Accelerate <https://www.datastax.com/accelerate> will be May
11-13 in San Diego, and the call for papers is now open!  We'd love to hear
about your successes, custom extensions, and lessons learned with Apache
Cassandra or DataStax.  The direct link to submitting a proposal is here
<https://sessionize.com/datastax-accelerate-san-diego/>, and more
background with suggestions for first-time speakers is here
<https://www.datastax.com/blog/2019/11/datastax-accelerate-20-call-papers-now-open>
.

The CFP closes Jan 22.  Hope to hear from you soon!

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Released an ACID-compliant transaction library on top of Cassandra

2018-10-16 Thread Jonathan Ellis
Which was followed up by
https://www.researchgate.net/profile/Akon_Dey/publication/282156834_Scalable_Distributed_Transactions_across_Heterogeneous_Stores/links/56058b9608ae5e8e3f32b98d.pdf

On Tue, Oct 16, 2018 at 1:02 PM Jonathan Ellis  wrote:

> It looks like it's based on this:
> http://www.vldb.org/pvldb/vol6/p1434-dey.pdf
>
> On Tue, Oct 16, 2018 at 11:37 AM Ariel Weisberg  wrote:
>
>> Hi,
>>
>> Yes this does sound great. Does this rely on Cassandra's internal SERIAL
>> consistency and CAS functionality or is that implemented at a higher level?
>>
>> Regards,
>> Ariel
>>
>> On Tue, Oct 16, 2018, at 12:31 PM, Jeff Jirsa wrote:
>> > This is great!
>> >
>> > --
>> > Jeff Jirsa
>> >
>> >
>> > > On Oct 16, 2018, at 5:47 PM, Hiroyuki Yamada 
>> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > # Sorry, I accidentally emailed the following to dev@, so I am
>> > > re-sending it here.
>> > >
>> > > We have been working on an ACID-compliant transaction library on top
>> > > of Cassandra called Scalar DB, and are pleased to announce the
>> > > release of the v1.0 RC in open source.
>> > >
>> > > https://github.com/scalar-labs/scalardb/
>> > >
>> > > Scalar DB is a library that provides a distributed storage abstraction
>> > > and client-coordinated distributed transactions on that storage,
>> > > making a non-ACID distributed database/storage ACID-compliant.
>> > > Cassandra is the first supported database implementation.
>> > >
>> > > It has been tested intensively internally and has passed Jepsen
>> > > testing (see the jepsen directory for more detail). If you are
>> > > looking for ACID transaction capability on top of Cassandra,
>> > > please take a look and give us feedback or a contribution.
>> > >
>> > > Best regards,
>> > > Hiroyuki Yamada
>> > >
>>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Released an ACID-compliant transaction library on top of Cassandra

2018-10-16 Thread Jonathan Ellis
It looks like it's based on this:
http://www.vldb.org/pvldb/vol6/p1434-dey.pdf

On Tue, Oct 16, 2018 at 11:37 AM Ariel Weisberg  wrote:

> Hi,
>
> Yes this does sound great. Does this rely on Cassandra's internal SERIAL
> consistency and CAS functionality or is that implemented at a higher level?
>
> Regards,
> Ariel
>
> On Tue, Oct 16, 2018, at 12:31 PM, Jeff Jirsa wrote:
> > This is great!
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On Oct 16, 2018, at 5:47 PM, Hiroyuki Yamada 
> wrote:
> > >
> > > Hi all,
> > >
> > > # Sorry, I accidentally emailed the following to dev@, so I am
> > > re-sending it here.
> > >
> > > We have been working on an ACID-compliant transaction library on top
> > > of Cassandra called Scalar DB, and are pleased to announce the
> > > release of the v1.0 RC in open source.
> > >
> > > https://github.com/scalar-labs/scalardb/
> > >
> > > Scalar DB is a library that provides a distributed storage abstraction
> > > and client-coordinated distributed transactions on that storage,
> > > making a non-ACID distributed database/storage ACID-compliant.
> > > Cassandra is the first supported database implementation.
> > >
> > > It has been tested intensively internally and has passed Jepsen
> > > testing (see the jepsen directory for more detail). If you are
> > > looking for ACID transaction capability on top of Cassandra,
> > > please take a look and give us feedback or a contribution.
> > >
> > > Best regards,
> > > Hiroyuki Yamada
> > >
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Reminder: don't listen on public addresses

2017-01-20 Thread Jonathan Ellis
MongoDB has been in the news for hackers deleting unsecured databases and
demanding money to return the data.

Now copycats are starting to look at other targets too like the thousands
of unsecured Cassandra databases.

Preventing this is very simple: don't allow Cassandra to listen on public
interfaces.

Of course additional security measures are useful as defense in depth, but
bottom line if the bad guys can't connect to your cluster they can't harm
it.
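
For reference, a minimal sketch of the relevant cassandra.yaml settings
(addresses illustrative, for a private network):

    listen_address: 10.0.0.5   # inter-node traffic: private interface only
    rpc_address: 10.0.0.5      # client traffic: never 0.0.0.0 on a public host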

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Disabling all caching in Cassandra

2016-06-22 Thread Jonathan Ellis
[Moving to users list]

The most important thing will be to reduce your JVM heap size.  Cassandra
will automatically reduce pool sizes as you do that.  Disabling key cache
and row cache will help you get that even smaller.
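
Concretely, a sketch of the knobs involved, assuming Cassandra 3.0's option
names (values illustrative):

    # cassandra.yaml -- node-level caches
    key_cache_size_in_mb: 0
    row_cache_size_in_mb: 0        # 0 disables the row cache
    counter_cache_size_in_mb: 0

    # cassandra-env.sh -- cap the heap itself
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="800M"

The per-table caching = {'keys': 'NONE', 'rows_per_partition': 'NONE'} shown
below covers the same ground at table granularity.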

On Tue, Jun 21, 2016 at 5:21 AM, Sumit Anvekar <sumit.anve...@gmail.com>
wrote:

> Hello,
> We are using Cassandra version 3.0.7, and of late we see that 90% of
> memory is occupied even though the hard drive is hardly used. We have a cluster
> of 5 nodes with 15 GB memory, 4 cores, 200 GB SSD.
>
> We tried all kinds of configurations through both YAML and table-based
> properties but none seem to help. Memory usage constantly increases,
> almost in direct proportion to the data.
>
> What we are trying to do is utilize as little memory as possible, and we are
> okay with reduced read performance. The application is write-intensive. To do
> this, our idea was to disable all possible caches, to avoid keeping
> anything unnecessary in memory.
>
> Find attached our yaml and table configuration.
>
> CREATE KEYSPACE if not exists test_ks WITH replication = {'class':
> 'SimpleStrategy', 'replication_factor': '1'};
> CREATE TABLE if not exists test_ks.test_cf (id bigint PRIMARY KEY,key_str
> text,value1 int,value2 int,update_ts bigint) WITH bloom_filter_fp_chance =
> 1 AND comment = '' AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'} AND compression =
> {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance =
> 1.0 AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND
> gc_grace_seconds = 864000 AND max_index_interval = 10240 AND
> memtable_flush_period_in_ms = 360 AND min_index_interval = 10240 AND
> read_repair_chance = 0.0 AND speculative_retry = '99PERCENTILE' AND caching
> = {'keys': 'NONE', 'rows_per_partition': 'NONE'};
>
> Has anyone tried such a configuration before? Please let us know if
> disabling caching would help us in this situation, and if so, how we can
> disable caching completely.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Announcement: Thrift API deprecation

2016-01-04 Thread Jonathan Ellis
Thrift has been officially frozen for almost two years and unofficially for
longer.   Meanwhile, maintaining Thrift support through changes like 8099
has been a substantial investment.

Thus, we are officially deprecating Thrift now and removing support in 4.0,
i.e. Nov 2016 if tick-tock goes as planned.

(I note that some users have been unable to completely migrate away from
Thrift because CQL doesn’t quite provide feature parity.  The last such
outstanding issue is mixing static and dynamic Thrift “columns” in a single
table.  We have an issue open to address this [1] and should have it
committed for 3.4.  In the meantime, I thought it best to give people more
notice rather than less.)

[1] https://issues.apache.org/jira/browse/CASSANDRA-10857

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Cassandra users survey

2015-10-20 Thread Jonathan Ellis
Thanks for all the responses!

The results (minus suggestions and emails) are available here:
https://docs.google.com/spreadsheets/d/1FegCArZgj2DNAjNkcXi1n2Y1Kfvf6cdZedkMPYQdvC0/edit?usp=sharing

I've included charts on separate sheets for each question, but
unfortunately I couldn't figure out how to help Google make sense of any of
the data where the form allowed multiple or free-form responses.

Some things that jump out at me:

- 3/4 of responses use only CQL.
- 3% have more than 1000 tables in the schema. On an absolute scale this is
low but still more than I expected.
- 60% are deployed across more than one datacenter
- I should have broken down the node count responses into more detail;
roughly 50% each in 1-10 and 10-100.  I should also include an "are you in
production?" question next time.
- More responses of both "less than 32 GB ram/node" and "128 GB or more"
than I expected.
- Including the "both" responses, a majority of users are deploying SSD now.

On Wed, Sep 30, 2015 at 1:18 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> With 3.0 approaching, the Apache Cassandra team would appreciate your
> feedback as we work on the project roadmap for future releases.
>
> I've put together a brief survey here:
> https://docs.google.com/forms/d/1TEG0umQAmiH3RXjNYdzNrKoBCl1x7zurMroMzAFeG2Y/viewform?usp=send_form
>
> Please take a few minutes to fill it out!
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Cassandra users survey

2015-10-07 Thread Jonathan Ellis
I think what would be most useful would be to pick your largest cluster,
and answer based on that.  If you have multiple applications in the
cluster, then the sum; otherwise, just one.

On Thu, Oct 1, 2015 at 9:50 PM, Jim Ancona <j...@anconafamily.com> wrote:

> Hi Jonathan,
>
> The survey asks about "your application." We have multiple applications
> using Cassandra. Are you looking for information about each application
> separately, or the sum of all of them?
>
> Jim
>
> On Wed, Sep 30, 2015 at 2:18 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> With 3.0 approaching, the Apache Cassandra team would appreciate your
>> feedback as we work on the project roadmap for future releases.
>>
>> I've put together a brief survey here:
>> https://docs.google.com/forms/d/1TEG0umQAmiH3RXjNYdzNrKoBCl1x7zurMroMzAFeG2Y/viewform?usp=send_form
>>
>> Please take a few minutes to fill it out!
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder, http://www.datastax.com
>> @spyced
>>
>>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Cassandra users survey

2015-09-30 Thread Jonathan Ellis
With 3.0 approaching, the Apache Cassandra team would appreciate your
feedback as we work on the project roadmap for future releases.

I've put together a brief survey here:
https://docs.google.com/forms/d/1TEG0umQAmiH3RXjNYdzNrKoBCl1x7zurMroMzAFeG2Y/viewform?usp=send_form

Please take a few minutes to fill it out!

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Cassandra 2.2, 3.0, and beyond

2015-06-11 Thread Jonathan Ellis
3.1 is EOL as soon as 3.3 (the next bug fix release) comes out.

On Thu, Jun 11, 2015 at 4:10 AM, Stefan Podkowinski 
stefan.podkowin...@1und1.de wrote:

  We are also extending our backwards compatibility policy to cover all
 3.x releases: you will be able to upgrade seamlessly from 3.1 to 3.7, for
 instance, including cross-version repair.

 What will be the EOL policy for releases after 3.0? Given your example,
 will 3.1 still see bugfixes at this point when I decide to upgrade to 3.7?




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Cassandra 2.2, 3.0, and beyond

2015-06-11 Thread Jonathan Ellis
As soon as 8099 is done.

On Thu, Jun 11, 2015 at 11:53 AM, Pierre Devops pierredev...@gmail.com
wrote:

 Hi,

 3.x beta release date ?

 2015-06-11 16:21 GMT+02:00 Jonathan Ellis jbel...@gmail.com:

 3.1 is EOL as soon as 3.3 (the next bug fix release) comes out.

 On Thu, Jun 11, 2015 at 4:10 AM, Stefan Podkowinski 
 stefan.podkowin...@1und1.de wrote:

  We are also extending our backwards compatibility policy to cover all
 3.x releases: you will be able to upgrade seamlessly from 3.1 to 3.7, for
 instance, including cross-version repair.

 What will be the EOL policy for releases after 3.0? Given your example,
 will 3.1 still see bugfixes at this point when I decide to upgrade to 3.7?




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Cassandra 2.2, 3.0, and beyond

2015-06-11 Thread Jonathan Ellis
We've started using the docs-impacting label
https://issues.apache.org/jira/issues/?jql=labels%20%3D%20docs-impacting%20AND%20project%20%3D%20CASSANDRA
to make it easier for the technical writers to keep up, but otherwise we're
not planning any major changes.

On Thu, Jun 11, 2015 at 4:50 AM, Daniel Compton 
daniel.compton.li...@gmail.com wrote:

 Hi Jonathan

 Does documentation fit into the new monthly releases and definition of
 done as well, or is that part of another process? I didn't see any mention
 of it in the docs, though I may have missed it.

 On Thu, 11 Jun 2015 at 9:10 pm Stefan Podkowinski 
 stefan.podkowin...@1und1.de wrote:

  We are also extending our backwards compatibility policy to cover all
 3.x releases: you will be able to upgrade seamlessly from 3.1 to 3.7, for
 instance, including cross-version repair.

 What will be the EOL policy for releases after 3.0? Given your example,
 will 3.1 still see bugfixes at this point when I decide to upgrade to 3.7?

 --
 --
 Daniel




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Cassandra 2.2, 3.0, and beyond

2015-06-10 Thread Jonathan Ellis
As you know, we've split our post-2.1 release into two pieces, with 2.2 to
be released in July (rc1 out Monday: http://cassandra.apache.org/download/)
and 3.0 in September.

2.2 will include Windows support, commitlog compression
<https://issues.apache.org/jira/browse/CASSANDRA-6809>, JSON support
<https://issues.apache.org/jira/browse/CASSANDRA-7970>, role-based
authorization
<http://www.datastax.com/dev/blog/role-based-access-control-in-cassandra>,
bootstrap-aware leveled compaction
<https://issues.apache.org/jira/browse/CASSANDRA-7460>, and user-defined
functions
<http://christopher-batey.blogspot.com/2015/05/cassandra-aggregates-min-max-avg-group.html>.
3.0 will include a major storage engine rewrite
<https://issues.apache.org/jira/browse/CASSANDRA-8099> and materialized
views <https://issues.apache.org/jira/browse/CASSANDRA-6477>.

We're splitting things up this way because we don't want to block the
features that are already complete while waiting for 8099 (the new storage
engine).  Releasing them now as 2.2 reduces the risk for users (2.2 has a
lot in common with 2.1) and allows us to stabilize that independently of
the upheaval from 8099.

After 3.0, we'll take this even further: we will release 3.x versions
monthly.  Even releases will include both bugfixes and new features; odd
releases will be bugfix-only.  You may have heard this referred to as
"tick-tock" releases, after Intel's policy of changing process and
architecture independently
<http://www.intel.com/content/www/us/en/silicon-innovations/intel-tick-tock-model-general.html>.

The primary goal is to improve release quality.  Our current major dot zero
releases require another five or six months to make them stable enough for
production.  This is directly related to how we pile features in for 9 to
12 months and release all at once.  The interactions between the new
features are complex and not always obvious.  2.1 was no exception, despite
DataStax hiring a full time test engineering team specifically for Apache
Cassandra.

We need to try something different.  Tick-tock releases will dramatically
reduce the number of features in each version, which will necessarily
improve our ability to quickly track down any regressions.  And pausing
every other month to focus on bug fixes will help ensure that we don't
accumulate issues faster than we can fix them.

Tick-tock will also prevent situations like the one we are in now with 8099
delaying everything else.  Users will get to test new features almost
immediately.

To get there, we are investing significant effort in making trunk "always
releasable", with the goal that each release, or at least each odd-numbered
bugfix release, should be usable in production.  We've extended our
continuous integration server to make it easy for contributors to run tests
against feature branches
<http://www.datastax.com/dev/blog/cassandra-testing-improvements-for-developer-convenience-and-confidence>
before merging to trunk, and we're working on more test infrastructure
<https://docs.google.com/document/d/1Seku0vPwChbnH3uYYxon0UO-b6LDtSqluZiH--sWWi0>
and procedures
<https://docs.google.com/document/d/1ptr47UQ56N80jqL_O6AlE67b0STyn_cVp2k5DTv-OMc>
to improve release quality.  You can see how this is coming along in our
May retrospective
<https://docs.google.com/document/d/1GtuYRocdr9luNdwmm8wE84uC5Wr6TvewFbQtqoAFVeU/edit>.

We are also extending our backwards compatibility policy to cover all 3.x
releases: you will be able to upgrade seamlessly from 3.1 to 3.7, for
instance, including cross-version repair.  We will not introduce any extra
upgrade requirements or remove deprecated features until 4.0, no sooner
than a year after 3.0.

Under normal conditions, we will not release 3.x.y stability releases for
x > 0.  That is, we will have a traditional 3.0.y stability series, but the
odd-numbered bugfix-only releases will fill that role for the tick-tock
series -- recognizing that occasionally we will need to be flexible enough
to release an emergency fix in the case of a critical bug or security
vulnerability.

We do recognize that it will take some time for tick-tock releases to
deliver production-level stability, which is why we will continue to
deliver 2.2.y and 3.0.y bugfix releases.  (But if we do demonstrate that
tick-tock can deliver the stability we want, there will be no need for a
4.0.y bugfix series, only 4.x tick-tock.)

After 2.2.0 is released, 2.0 will reach end-of-life as planned.  After
3.0.0 is released, 2.1 will also reach end of life.  This is earlier than
expected, but 2.2 will be very close to as stable as 2.1 and users will be
well served by upgrading.  We will maintain the 2.2 stability series until
4.0 is released, and 3.0 for six months after that.

Thanks for reading this far, and I look forward to hearing how 2.2rc1 works
for you!
-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


HSHA Thrift server corruption in Cassandra 2.0.0 - 2.0.5

2014-03-08 Thread Jonathan Ellis
The hsha (half-synchronous, half-asynchronous) Thrift server was
rewritten on top of Disruptor for Cassandra 2.0 [1] to unlock
substantial performance benefits over the old hsha.  Unfortunately,
the rewrite introduced a bug that can cause incorrect data to be sent
from the coordinator to replicas.  I apologize that it took so long
for us to realize what was causing the compaction errors reported as
far back as November.

Who is affected: anyone running the hsha server in a 2.0.x release for x < 6.

Who is NOT affected: anyone using the native protocol or the default
sync Thrift server.

2.0.6 has a fix and is expected to be released Monday; you can grab
the pre-release build from [3], or apply the patch from [4] yourself.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5582
[2] https://issues.apache.org/jira/browse/CASSANDRA-6285
[3] http://people.apache.org/~slebresne/
[4] 
https://issues.apache.org/jira/secure/attachment/12632583/CASSANDRA-6285-disruptor-heap.patch

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: GCInspector GC for ConcurrentMarkSweep running every 15 seconds

2014-02-18 Thread Jonathan Ellis
Sounds like you have CMSInitiatingOccupancyFraction set close to 60.
You can raise that and/or figure out how to use less heap.
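
For reference, a sketch of the relevant lines in cassandra-env.sh (values
illustrative):

    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

Raising the fraction gives CMS more headroom before it starts collecting;
using less heap attacks the underlying pressure.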

On Mon, Feb 17, 2014 at 5:06 PM, John Pyeatt john.pye...@singlewire.com wrote:
 I have a 6 node cluster running on AWS. We are using m1.large instances with
 heap size set to 3G.

 5 of the 6 nodes seem quite healthy. The 6th one however is running
 GCInspector GC for ConcurrentMarkSweep every 15 seconds or so. There is
 nothing going on on this box. No repairs and almost no user activity. But
 the CPU is almost continuously at 50% or more.

 The only messages in the log at all are these:
  INFO 2014-02-17 22:58:53,429 [ScheduledTasks:1] GCInspector GC for
 ConcurrentMarkSweep: 213 ms for 1 collections, 1964940024 used; max is
 3200253952
  INFO 2014-02-17 22:59:07,431 [ScheduledTasks:1] GCInspector GC for
 ConcurrentMarkSweep: 250 ms for 1 collections, 1983269488 used; max is
 3200253952
  INFO 2014-02-17 22:59:21,522 [ScheduledTasks:1] GCInspector GC for
 ConcurrentMarkSweep: 280 ms for 1 collections, 1998214480 used; max is
 3200253952
  INFO 2014-02-17 22:59:36,527 [ScheduledTasks:1] GCInspector GC for
 ConcurrentMarkSweep: 305 ms for 1 collections, 2013065592 used; max is
 3200253952
  INFO 2014-02-17 22:59:50,529 [ScheduledTasks:1] GCInspector GC for
 ConcurrentMarkSweep: 334 ms for 1 collections, 2028069232 used; max is
 3200253952

 We don't see any of these messages on the other nodes in the cluster.

 We are seeing similar behaviour for both our production and QA clusters.
 Production is running cassandra 1.2.9 and QA is running 1.2.13.

 Here are some of the cassandra settings that I would think might be
 relevant.

 flush_largest_memtables_at: 0.75
 reduce_cache_sizes_at: 0.85
 reduce_cache_capacity_to: 0.6
 in_memory_compaction_limit_in_mb: 64

 Does anyone have any ideas why we are seeing this so selectively on one box?

 Any cures???
 --
 John Pyeatt
 Singlewire Software, LLC
 www.singlewire.com
 --
 608.661.1184
 john.pye...@singlewire.com



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Introducing farsandra: A different way to integration test with c*

2014-01-22 Thread Jonathan Ellis
Nice work, Ed.  Personally, I do find it more productive to write
system tests in Python (dtest builds on ccm to provide a number of
utilities that cut down on the boilerplate [1]), but I can understand
that others will feel differently and more testing can only improve
Cassandra.

Thanks!

[1] https://github.com/riptano/cassandra-dtest

On Wed, Jan 22, 2014 at 7:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
 The repo:
 https://github.com/edwardcapriolo/farsandra

 The code:
 Farsandra fs = new Farsandra();
 fs.withVersion("2.0.4");
 fs.withCleanInstanceOnStart(true);
 fs.withInstanceName("1");
 fs.withCreateConfigurationFiles(true);
 fs.withHost("localhost");
 fs.withSeeds(Arrays.asList("localhost"));
 fs.start();

 The story:
 For a while I have been developing applications that use Apache Cassandra as
 their data store. Personally I am more of an end-to-end test person than a
 mock test person. For years I have relied heavily on Hector's embedded
 cassandra to bring up Cassandra in a sane way inside a java project.

 The concept of Farsandra is to keep Cassandra close (in end to end tests and
 not mocked away) but keep your classpath closer (running cassandra embedded
 should be seamless and not mess with your client classpath).

 Recently there has been much fragmentation with Hector, Astyanax, CQL, and
 multiple Cassandra releases. Bringing up an embedded test is much harder
 than it needs to be.

 Cassandra's core methods (get, put, slice) over thrift have been
 wire-compatible from version 0.7 to current. However, Java libraries for
 thrift and things like guava differ across the Cassandra versions. This
 causes a large number of issues when trying to use your favourite client
 with one or more versions of Cassandra. (Sometimes a thrift mismatch
 kills the entire integration and you CAN'T test anything.)

 Farsandra is much like https://github.com/pcmanus/ccm in that it launches
 Cassandra instances remotely inside a sub-process. Farsandra is done in java
 not python, making it easier to use with java development.

 I will not go and say Farsandra solves all problems. In fact it has its own
 challenges (building yaml configurations across versions, fetching binary
 cassandra from the internet), but it opens up new opportunities to develop
 complicated multi-node testing scenarios which are impossible with embedded
 cassandra, whose code is not re-entrant!

 Have fun.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Upgrading Cassandra from 1.2.11 to 2.0

2013-11-21 Thread Jonathan Ellis
You can't just drop in Apache Cassandra over DSE since it adds custom
replication strategies like this one.

On Thu, Nov 21, 2013 at 9:38 AM, Santosh Shet
santosh.s...@vista-one-solutions.com wrote:
 Hi,



 We are facing a problem while upgrading the Cassandra that ships in
 DSE 3.2 from version 1.2.11 to 2.0.

 Below is the error log we are getting while starting Cassandra.



 java.lang.RuntimeException:
 org.apache.cassandra.exceptions.ConfigurationException: Unable to find
 replication strategy class 'org.apache.cassandra.locator.EverywhereStrategy'

 at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:274)

 at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:289)

 at org.apache.cassandra.db.DefsTables.loadFromKeyspace(DefsTables.java:130)

 at
 org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:508)

 at
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:237)

 at
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)

 at
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)

 Caused by: org.apache.cassandra.exceptions.ConfigurationException: Unable to
 find replication strategy class
 'org.apache.cassandra.locator.EverywhereStrategy'

 at org.apache.cassandra.utils.FBUtilities.classForName(FBUtilities.java:469)

 at
 org.apache.cassandra.locator.AbstractReplicationStrategy.getClass(AbstractReplicationStrategy.java:290)

 at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:266)

 ... 6 more





 Thanks,

 Santosh Shet

 Software Engineer | VistaOne Solutions

 Direct India : +91 80 30273829 | Mobile India : +91 8105720582

 Skype : santushet





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Wiki popularity

2013-11-19 Thread Jonathan Ellis
We've started counting visits to the wiki pages so we can use that
information to prioritize which pages to improve.  Here's what that
looks like, for the past ~24h:

1,431 wiki.apache.org/cassandra/GettingStarted
366   wiki.apache.org/cassandra/FAQ
284   wiki.apache.org/cassandra/Operations
238   wiki.apache.org/cassandra/FrontPage
209   wiki.apache.org/cassandra/HadoopSupport
209   wiki.apache.org/cassandra/NodeTool
206   wiki.apache.org/cassandra/DebianPackaging
168   wiki.apache.org/cassandra/CassandraCli
159   wiki.apache.org/cassandra/ArchitectureOverview
149   wiki.apache.org/cassandra/ClientOptions
135   wiki.apache.org/cassandra/DataModel
117   wiki.apache.org/cassandra/API
90wiki.apache.org/cassandra/CassandraLimitations
85wiki.apache.org/cassandra/SecondaryIndexes
74wiki.apache.org/cassandra/StorageConfiguration
71wiki.apache.org/cassandra/MemtableSSTable
66wiki.apache.org/cassandra/Administration%20Tools
61wiki.apache.org/cassandra/RunningCassandra

(GettingStarted is by far the most viewed, which is not surprising
since it's linked from the cassandra.a.o front page.)

If you'd like to help improve any of these, and aren't already on the
wiki contributors whitelist, please contact me.  We had to add the
whitelist to stop spam.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Failure detection false positives and vnodes

2013-11-05 Thread Jonathan Ellis
We've been working on tracking down the causes of nodes in the cluster
incorrectly marking other, healthy nodes down.  We've identified three
scenarios.  The first two deal with the Gossip thread blocking while
processing a state change, preventing subsequent heartbeats from being
processed:

1. Write activity + cluster membership changes (CASSANDRA-6297).  The
Gossip stage would block while flushing system.peers, which could get
backed up behind flushes of user tables.  By default, there is one flush
thread per configured data directory.  (Thus, increasing
memtable_flush_writers in cassandra.yaml can be an effective
workaround, especially if you are on SSDs where the increased
contention will be low.)

2. Cluster membership changes with many keyspaces configured
(CASSANDRA-6244).  Computing the ranges to be transferred between
nodes is linear with respect to the number of keyspaces (since that is
where replication options are configured).  I suspect that enabling
vnodes will exacerbate this as well.

We're still analyzing the third:

3. Large (hundreds to thousands of node) clusters with vnodes enabled
show FD false positives even without cluster membership changes
(CASSANDRA-6127).

Fixes for (1) and (2) are committed and will be in 1.2.12 and 2.0.3.
We can reproduce (3) and hope to have a resolution soon.  In the
meantime, caution is advised when deploying vnode-enabled clusters,
since other pressures on the system could make this a problem with
smaller clusters as well.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: CQL Thrift

2013-08-30 Thread Jonathan Ellis
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
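
The short version of that post: CQL3 models Thrift-style dynamic columns with
clustering columns. A minimal sketch (table and column names illustrative):

    CREATE TABLE data (
        key text,
        column_name text,
        value text,
        PRIMARY KEY (key, column_name)
    );

    -- each INSERT adds one "dynamic column" to the partition for 'key'
    INSERT INTO data (key, column_name, value)
    VALUES ('row1', 'arbitrary-name', '42');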


On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote:


 my biased perspective: I find the sweet spot is thrift for insert/update and
 CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data types
 in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to work
 with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if I connect via cqlsh and explore the user table, I can see that the
 columns first_name and last_name are not part of the table structure
 anymore. Here is the output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage> select * from user;

  user_id
 -
  @mevivs





 I understand that CQL3 and thrift interoperability is an issue, but this
 looks to me like a very basic scenario.



 Any suggestions? Or can anybody explain the reason behind this?

 -Vivek









-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Dynamic Columns Question Cassandra 1.2.5, Datastax Java Driver 1.0

2013-06-06 Thread Jonathan Ellis
This is becoming something of a FAQ, so I wrote an more in-depth
answer: 
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
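
For the specific error below: CQL3 requires declaring a column before
inserting into it, so the fix is a one-statement schema change. A minimal
sketch against the example schema:

    ALTER TABLE user ADD middlename text;

    INSERT INTO user (firstname, lastname, middlename)
    VALUES ('joe', 'shmoe', 'lester');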

On Thu, Jun 6, 2013 at 8:02 AM, Joe Greenawalt joe.greenaw...@gmail.com wrote:
 Hi,
 I'm having some problems figuring out how to append a dynamic column to a
 column family using the datastax java driver 1.0 and CQL3 on Cassandra
 1.2.5.  Below is what I'm trying:

 cqlsh:simplex> create table user (firstname text primary key, lastname
 text);
 cqlsh:simplex> insert into user (firstname, lastname) values
 ('joe','shmoe');
 cqlsh:simplex> select * from user;

  firstname | lastname
 ---+--
joe |shmoe

 cqlsh:simplex> insert into user (firstname, lastname, middlename) values
 ('joe','shmoe','lester');
 Bad Request: Unknown identifier middlename
 cqlsh:simplex> insert into user (firstname, lastname, middlename) values
 ('john','shmoe','lester');
 Bad Request: Unknown identifier middlename

 I'm assuming you can do this based on previous thrift-based clients
 like pycassa, and also by reading this:

 The Cassandra data model is a dynamic schema, column-oriented data model.
 This means that, unlike a relational database, you do not need to model all
 of the columns required by your application up front, as each row is not
 required to have the same set of columns. Columns and their metadata can be
 added by your application as they are needed without incurring downtime to
 your application.

 here: http://www.datastax.com/docs/1.2/ddl/index

 Is it a limitation of CQL3 and its connection vs. thrift?
 Or more likely am I just doing something wrong?

 Thanks,
 Joe



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Cassandra performance decreases drastically with increase in data size.

2013-05-29 Thread Jonathan Ellis
Sounds like you're spending all your time in GC, which you can verify
by checking what GCInspector and StatusLogger say in the log.

Fix is increase your heap size or upgrade to 1.2:
http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

On Wed, May 29, 2013 at 11:32 PM, srmore comom...@gmail.com wrote:
 Hello,
 I am observing that my performance is drastically decreasing when my data
 size grows. I have a 3 node cluster with 64 GB of ram and my data size is
 around 400GB on all the nodes. I also see that when I re-start Cassandra the
 performance goes back to normal and then again starts decreasing after some
 time.

 Some hunting landed me on this page
 http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks
 about the large data sets and explains that it might be because I am going
 through multiple layers of OS cache, but does not tell me how to tune it.

 So, my question is, are there any optimizations that I can do to handle
 these large datasets?

 and why does my performance go back to normal when I restart Cassandra ?

 Thanks !



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Using CQL to insert a column to a row dynamically

2013-05-27 Thread Jonathan Ellis
On Mon, May 27, 2013 at 9:28 AM, Matthew Hillsborough
matthew.hillsboro...@gmail.com wrote:
 I am trying to understand some fundamentals in Cassandra, I was under the
 impression that one of the advantages a developer can take in designing a
 data model is by dynamically adding columns to a row identified by a key.
 That means I can model my data so that if it makes sense, a key can be
 something such as a user_id from a relational database, and I can for
 example, create arbitrary amounts of columns that relate to that user.

Fundamentally?  No.  Experience has shown that having schema to say "email
column is text, and birth date column is a timestamp" is very
useful as projects and teams grow.

That said, if you really don't know what kinds of attributes might
apply (generally because they are user-generated) you can use a Map.
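
A minimal sketch of the Map approach (table and column names illustrative):

    CREATE TABLE users (
        user_id text PRIMARY KEY,
        attributes map<text, text>   -- user-generated attribute names
    );

    UPDATE users SET attributes['favorite_color'] = 'blue'
    WHERE user_id = 'jsmith';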

 Wouldn't this type of model make more sense to just stuff into a relational
 database?

There's nothing wrong with the relational model per se (subject to the
usual explanation about needing to denormalize to scale).  Cassandra
is about making applications scale, not throwing the SQL baby out with
the bathwater for the sake of being different.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Commit Log Magic

2013-05-23 Thread Jonathan Ellis
Sstables must be sorted by token, or we can't compact efficiently.
Since writes usually do not arrive in token order, we stage them first
in a memtable.
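
The append itself is cheap because it is sequential I/O, and the
durability/latency trade-off is tunable; a sketch of the usual
cassandra.yaml defaults:

    commitlog_sync: periodic            # ack writes without waiting for fsync
    commitlog_sync_period_in_ms: 10000  # fsync the log every 10 seconds

A keyspace can also opt out of the commit log entirely with durable_writes =
false, at the cost of losing unflushed data in a crash.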

(cc user@)

On Thu, May 23, 2013 at 8:44 AM, Ansar Rafique ansa...@hotmail.com wrote:
 Hi Jonathan,

 I am Ansar Rafique and I asked you a few questions 2 weeks ago about Cassandra
 implementation. I was watching your presentation where you suggested the
 page below.

 http://nosql.mypopescu.com/post/27684111441/cassandra-and-solid-state-drives

 I have a question and I have tried to find the answer but haven't really
 gotten a satisfactory response yet. My question is: why does Cassandra use
 the commit log for durability instead of writing directly to the SSTable?
 Cassandra achieves high write throughput because it stores data first in a
 memtable and then flushes it to disk. Sounds good, but remember Cassandra
 also writes to the commit log for durability. I made sure, and it's written
 that the write to the memtable and commit log is synchronous, which means it
 will write first to the commit log, wait until that completes, and then
 start writing to the memtable, or vice versa. Writing a transaction to the
 commit log requires an I/O operation, which means for each insert we need an
 I/O :( for writing data to the commit log, and later more I/Os to flush the
 data again to disk. Isn't writing to the commit log overhead? Isn't it
 better to write data directly to disk instead of the commit log?

 Remember, I/O operations are expensive and a reduction in I/Os means an
 improvement in performance. If we look at an RDBMS, it stores data in a
 commit log as well as on disk. Fair enough, but if we don't insert data in
 a commit log, its performance should be the same as Cassandra's, because it
 performs I/O to insert data on disk and Cassandra also performs I/O to
 insert data into the commit log. Is the commit log less expensive? I didn't
 really understand the magic :) Would you like to elaborate on it more?

 Thank you in advance for your time. Looking forward to hearing from you.

 Regards,
 Ansar Rafique







-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: CQL3 question

2013-05-13 Thread Jonathan Ellis
using GET or LIST from the cli will do what you want

it's a bad idea to have One Big Partition, since partitions by nature
are not spread across multiple machines.  in general you'll want to
keep partitions under ~ 1M cells or ~100K CQL3 rows.
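
A common way to stay under those limits is to add a bucket to the partition
key; a minimal sketch (schema illustrative):

    CREATE TABLE events (
        key text,
        bucket int,           -- e.g. day number, or hash(key) % N
        seq timeuuid,
        value text,
        PRIMARY KEY ((key, bucket), seq)
    );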

On Sun, May 12, 2013 at 12:53 AM, Sam Mandes eng.salaman...@gmail.com wrote:
 Hello Jonathan,

 I read your blog post
 http://www.datastax.com/dev/blog/cql3-for-cassandra-experts and enjoyed
 it very much. I am new to the NoSQL world; I came from the SQL world.

 I noticed that Cassandra is pushing CQL3 more, it's even recommended to use
 CQL3 for new projects instead of the Thrift API. I believe Cassandra is
 going to drop Thrift one day. I know that one can use the compact storage to
 allow backward compatibility. And I know that CQL3 uses the new binary
 protocol instead of Thrift now. I believe they both use the same storage
 engine. (I still do not understand why they are incompatible!)

 Thus, I was wondering if there is a way I can view the tables
 created with CQL3 in a lower-level view like I used with Thrift? I mean,
 can I view the tables as simply CFs, see how rows are exactly stored, just
 something to expose the internal representation?

 I've another question: when using compact storage and creating a table with
 a composite primary key, Cassandra uses a single row with multiple columns.
 But if I have lots of items and the column limit is 2 billion, how can this
 be avoided? I do not understand how CQL3 unpacking helps in this situation.

 Sorry for any inconvenience :)

 Thanks a lot,
 Sam



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Ten talks you shouldn’t miss at the Cassandra Summit

2013-05-08 Thread Jonathan Ellis
The Cassandra Summit is just over a month away!  I wrote up my
thoughts on the talks I'm most excited for here:
http://www.datastax.com/dev/blog/ten-talks-you-shouldnt-miss-at-the-cassandra-summit

Don't forget to register with the code SFSummit25 for a 25% discount:
http://datastax.regsvc.com/E2

(Want to go, but your company won't pay?  Let me know off-list and
I'll see what I can do.)

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Backup and restore between different node-sized clusters

2013-05-08 Thread Jonathan Ellis
You want to use sstableloader when the cluster sizes are different; it
will stream things to the right places in the new one.
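
Usage is a sketch like the following: point it at a directory whose last two
path components are keyspace/table (hosts and path illustrative):

    sstableloader -d 10.0.0.1,10.0.0.2 /path/to/snapshot/mykeyspace/mytable/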

On Wed, May 8, 2013 at 6:03 PM, Ron Siemens rsiem...@greatergood.com wrote:

 I have a 3-node cluster in production and a single-node development cluster.  
 I tested snapshotting a column family from the 3-node production cluster, 
 grouping the files together, and restoring onto my single node development 
 system.  That worked fine.  Can I go the other direction?  It's not easy for 
 me to test in that direction: I'll get the chance at some point but would 
 like to hear if you've done this.

 If I just put the snapshot from the single node cluster on one of the nodes 
 from the 3-node cluster, and do a JMX loadNewSSTables on that node, will the 
 data load correctly into the 3-nodes?  Or is something more complex involved?

 FYI, I'm following the instructions below, but only doing per column family 
 backup and restore.

 http://www.datastax.com/docs/1.2/operations/backup_restore

 Thanks,
 Ron




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: index_interval

2013-05-08 Thread Jonathan Ellis
index_interval won't be going away, but you won't need to change it as
often in 2.0: https://issues.apache.org/jira/browse/CASSANDRA-5521
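
For reference, a sketch of the setting as it lives in the 1.2 cassandra.yaml
(128 is the shipped default):

    index_interval: 128   # larger = smaller index summaries (less RAM),
                          # at the cost of slower lookups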

On Mon, May 6, 2013 at 12:27 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
 I heard a rumor that index_interval is going away?  What is the replacement 
 for this?  (We have been having to play with this setting a lot lately: too
 big and it gets slow, yet too small and cassandra uses way too much RAM… We are
 still trying to find the right balance with this setting).

 Thanks,
 Dean



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Cassandra NYC event followup

2013-05-02 Thread Jonathan Ellis
The videos from the NYC* Big Data Tech Day are all up.  I blogged
about my favorites here:
http://www.datastax.com/dev/blog/my-top-five-talks-from-nyc-big-data-tech-day

Good to meet the NYC community again!

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: ordered partitioner

2013-04-22 Thread Jonathan Ellis
Not in general, no.  There are places, like indexing, that need to use
a local partitioner rather than the global one.

Which uses of the DK constructor looked erroneous to you?

On Mon, Apr 22, 2013 at 10:54 AM, Desimpel, Ignace
ignace.desim...@nuance.com wrote:
 Hi,



 I was trying to implement my own ordered partitioner and got into problems.

 The current DecoratedKey is using a ByteBufferUtil.compareUnsigned for
 comparing the key. I was thinking of having a signed comparison, so I
 thought of making my own DecoratedKey, Token and Partitioner. That way I
 would have complete control…

 So I made a partitioner with a function decorateKey(…) returning
 MyDecoratedKey instead of DecoratedKey.

 But when making my own MyDecoratedKey, the database gets into trouble when
 adding a keyspace, because some code in Cassandra uses the
 ‘new DecoratedKey(…)’ statement directly and does not use the partitioner
 function decorateKey(…).



 Would it be logical to always call the partitioner's decorateKey function,
 so that creating one's own partitioner and key decoration is possible?



 Ignace Desimpel





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Cassandra Summit 2013

2013-04-12 Thread Jonathan Ellis
Hi all,

Last year's Summit saw fantastic talks [1] and over 800 attendees.
The feedback was enthusiastic; the most commonly requested improvement
was to extend it to two days.

We're pleased to deliver just that for 2013!  This year's Cassandra
Summit will be at Fort Mason in San Francisco, California from June
11th - 12th, with 45+ sessions covering Cassandra use cases,
development tips and tricks, war stories, how-tos, and more.

The popular "meet the experts" room will also return.  Engineers and
committers from companies such as Spotify, eBay, Netflix, Comcast,
BlueMountain Capital, and DataStax will be there excited to share
their Cassandra experiences.

The schedule of talks is about 90% final.  To view it and register,
visit http://www.datastax.com/company/news-and-events/events/cassandrasummit2013
and use the code SFSummit25 for 25% off.

See you there!

[1] 
http://www.datastax.com/company/news-and-events/events/cassandrasummit2012/presentations

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Vnodes - HUNDRED of MapReduce jobs

2013-03-29 Thread Jonathan Ellis
I still don't see the hole in the following reasoning:

- Input splits are 64k by default.  At this size, map processing time
dominates job creation.
- Therefore, if job creation time dominates, you have a toy data set
(< 64K * 256 vnodes = 16 MB)

Adding complexity to our inputformat to improve performance for this
niche does not sound like a good idea to me.

On Thu, Mar 28, 2013 at 8:40 AM, cem cayiro...@gmail.com wrote:
 Hi Alicia ,

 The Cassandra input format creates as many mappers as there are vnodes.
 It is a known issue. You need to lower the number of vnodes :(

 I have a simple solution for that and am ready to write a patch. Should I
 create a ticket about it? I don't know the procedure for that.

  Regards,
 Cem


 On Thu, Mar 28, 2013 at 2:30 PM, Alicia Leong lccali...@gmail.com wrote:

 Hi All,

 I have 3 nodes of Cassandra 1.2.3 and edited the cassandra.yaml for vnodes.

 When I execute an M/R job, the console shows HUNDREDS of Map tasks.

 May I know, is this normal with vnodes?  If yes, this has slowed the M/R
 job's completion.


 Thanks





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Vnodes - HUNDRED of MapReduce jobs

2013-03-29 Thread Jonathan Ellis
My point is that if you have over 16MB of data per node, you're going
to get thousands of map tasks (that is: hundreds per node) with or
without vnodes.

On Fri, Mar 29, 2013 at 9:42 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
 Every map reduce task typically has a minimum Xmx of 256MB memory. See
 mapred.child.java.opts...
 So if you have a 10 node cluster with 256 vnodes... You will need to spawn
 2,560 map tasks to complete a job.
 And a 10 node hadoop cluster with 5 map slots a node... You have 50 map
 slots.

 Wouldn't it be better if the input format spawned 10 map tasks instead of
 2,560?


 On Fri, Mar 29, 2013 at 10:28 AM, Jonathan Ellis jbel...@gmail.com wrote:

 I still don't see the hole in the following reasoning:

 - Input splits are 64k by default.  At this size, map processing time
 dominates job creation.
 - Therefore, if job creation time dominates, you have a toy data set
  (< 64K * 256 vnodes = 16 MB)

 Adding complexity to our inputformat to improve performance for this
 niche does not sound like a good idea to me.

 On Thu, Mar 28, 2013 at 8:40 AM, cem cayiro...@gmail.com wrote:
  Hi Alicia ,
 
  The Cassandra input format creates as many mappers as there are vnodes.
  It is a known issue. You need to lower the number of vnodes :(
 
  I have a simple solution for that and am ready to write a patch. Should I
  create a ticket about it? I don't know the procedure for that.
 
   Regards,
  Cem
 
 
  On Thu, Mar 28, 2013 at 2:30 PM, Alicia Leong lccali...@gmail.com
  wrote:
 
  Hi All,
 
  I have 3 nodes of Cassandra 1.2.3 and edited the cassandra.yaml for
  vnodes.
 
  When I execute an M/R job, the console shows HUNDREDS of Map tasks.
 
  May I know, is this normal with vnodes?  If yes, this has slowed the M/R
  job's completion.
 
 
  Thanks
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: virtual nodes + map reduce = too many mappers

2013-02-16 Thread Jonathan Ellis
Wouldn't you have more than 256 splits anyway, given a normal amount of data?

(Default split size is 64k rows.)

On Fri, Feb 15, 2013 at 7:01 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
 Seems like the hadoop Input format should combine the splits that are
 on the same node into the same map task, like Hadoop's
 CombinedInputFormat can. I am not sure who recommends vnodes as the
 default, because this is now the second problem (that I know of) of
 this class where vnodes have extra overhead:
 https://issues.apache.org/jira/browse/CASSANDRA-5161

 This seems to be the standard operating practice in c* now: enable
 things in the default configuration, like new partitioners and newer
 features like vnodes, even though they are not heavily tested in the
 wild or well understood, then deal with the fallout.


 On Fri, Feb 15, 2013 at 11:52 AM, cem cayiro...@gmail.com wrote:
 Hi All,

 I have just started to use virtual nodes. I set the number of virtual nodes
 to 256 as recommended.

 The problem that I have is when I run a mapreduce job it creates node * 256
  mappers. It creates node * 256 splits. This affects the performance since
  the range queries have a lot of overhead.

  Any suggestions to improve the performance? It seems like I need to lower the
 number of virtual nodes.

 Best Regards,
 Cem





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [RELEASE] Apache Cassandra 1.2 released

2013-01-07 Thread Jonathan Ellis
I'm presenting a webinar on what's new in 1.2 this Wednesday:
http://learn.datastax.com/WebinarWhatsNewin1.2_Registration.html

See you there!

On Wed, Jan 2, 2013 at 9:00 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 The Cassandra team wishes you a very happy new year 2013, and is very
 pleased
 to announce the release of Apache Cassandra version 1.2.0. Cassandra 1.2.0
 is a
 new major release for the Apache Cassandra distributed database. This
 version
 adds numerous improvements[1,2] including (but not restricted to):
 - Virtual nodes[4]
 - The final version of CQL3 (featuring many improvements)
 - Atomic batches[5]
 - Request tracing[6]
 - Numerous performance improvements[7]
 - A new binary protocol for CQL3[8]
 - Improved configuration options[9]
 - And much more...

 Please make sure to carefully read the release notes[2] before upgrading.

 Both source and binary distributions of Cassandra 1.2.0 can be downloaded
 at:

  http://cassandra.apache.org/download/

 Or you can use the debian package available from the project APT
 repository[3]
 (you will need to use the 12x series).

 The Cassandra Team

 [1]: http://goo.gl/JmKp3 (CHANGES.txt)
 [2]: http://goo.gl/47bFz (NEWS.txt)
 [3]: http://wiki.apache.org/cassandra/DebianPackaging
 [4]: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
 [5]: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
 [6]: http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
 [7]:
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
 [8]: http://www.datastax.com/dev/blog/binary-protocol
 [9]: http://www.datastax.com/dev/blog/configuration-changes-in-cassandra-1-2



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Read during digest mismatch

2012-11-11 Thread Jonathan Ellis
Correct.  Which is one reason there is a separate setting for
cross-datacenter read repair, by the way.
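(In 1.1+, those are the per-CF read_repair_chance and
dclocal_read_repair_chance attributes; a sketch via cassandra-cli, values
illustrative:

    update column family MyCF with read_repair_chance = 0.1
        and dclocal_read_repair_chance = 0.0;)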

On Thu, Nov 8, 2012 at 4:43 PM, sankalp kohli kohlisank...@gmail.com wrote:
 Hi,
 Lets say I am reading with consistency TWO and my replication is 3. The
 read is eligible for global read repair. It will send a request to get data
 from one node and a digest request to two.
 If there is a digest mismatch, what I am reading from the code looks like it
 will get the data from all three nodes and do a resolve of the data before
 returning to the client.

 Is it correct, or am I reading the code wrong?

 Also if this is correct, look like if the third node is in other DC, the
 read will slow down even when the consistency was TWO?

 Thanks,
 Sankalp





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Hinted Handoff runs every ten minutes

2012-11-11 Thread Jonathan Ellis
How many hint sstables are there?  What does sstable2json show?
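(For example, something along these lines; the data path and sstable
generation are illustrative:

    ls /var/lib/cassandra/data/system/HintsColumnFamily-*-Data.db
    sstable2json /var/lib/cassandra/data/system/HintsColumnFamily-hc-1-Data.db)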

On Thu, Nov 8, 2012 at 3:23 PM, Mike Heffner m...@librato.com wrote:
 Is there a ticket open for this for 1.1.6?

 We also noticed this after upgrading from 1.1.3 to 1.1.6. Every node runs a
 0 row hinted handoff every 10 minutes. N-1 nodes hint to the same node,
 while that node hints to another node.


 On Tue, Oct 30, 2012 at 1:35 PM, Vegard Berget p...@fantasista.no wrote:

 Hi,

 I have the exact same problem with 1.1.6.  HintsColumnFamily consists of
 one row (Rowkey 00, nothing more).   The problem started after upgrading
 from 1.1.4 to 1.1.6.  Every ten minutes HintedHandoffManager starts and
 finishes  after sending 0 rows.

 .vegard,



 - Original Message -
 From:
 user@cassandra.apache.org

 To:
 user@cassandra.apache.org
 Cc:

 Sent:
 Mon, 29 Oct 2012 23:45:30 +0100

 Subject:
 Re: Hinted Handoff runs every ten minutes


  On 29.10.2012 23:24, Stephen Pierce wrote:
  I'm running 1.1.5; the bug says it's fixed in 1.0.9/1.1.0.
 
  How can I check to see why it keeps running HintedHandoff?
  you have a tombstone in system.HintsColumnFamily; use the list command in
  cassandra-cli to check




 --

   Mike Heffner m...@librato.com
   Librato, Inc.





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Fw: Fwd: Compound primary key: Insert after delete

2012-10-22 Thread Jonathan Ellis
Mixing the two isn't really recommended because of just this kind of
difficulty, but if you must, I would develop against 1.2 since it will
actually validate that the CT encoding you've done manually is valid.
1.1 will just fail silently.

On Mon, Oct 22, 2012 at 6:57 AM, Vivek Mishra vivek.mis...@yahoo.com wrote:
 Hi,

  I am building support for Composite/Compound keys in Kundera and am
  currently running into a number of problems in my POC to access it via Thrift.

  I am planning to use the thrift API for insert/update/delete, and for queries
  I will go the CQL way.


 Issues:
  CompositeTypeRunner.java (see attached): Simple program to perform CRUD; it
  is not inserting against the deleted row key, and the thrift API is also
  returning the column name as an empty string.

  OtherCompositeTypeRunner.java (see attached): Program to demonstrate an issue
  with a boolean compound primary key. Column family creation via CQL is
  working fine, but insert via thrift gives an Unconfigured column family
  error even though the column family is there!

  This is what I have tried with cassandra 1.1.6 as well.

  Please have a look and share if I am doing anything wrong. I asked the same
  on the user group but no luck.


 -Vivek




 - Forwarded Message -
 From: Vivek Mishra mishra.v...@gmail.com
 To: vivek.mis...@yahoo.com
 Sent: Monday, October 22, 2012 5:17 PM
 Subject: Fwd: Compound primary key: Insert after delete



 -- Forwarded message --
 From: Vivek Mishra mishra.v...@gmail.com
 Date: Mon, Oct 22, 2012 at 1:08 PM
 Subject: Re: Compound primary key: Insert after delete
 To: user@cassandra.apache.org


 Well. Last 2 lines of code are deleting 1 record and inserting 2 records,
  the first one is the deleted one and the second a new record. Output from the command line:

 [default@unknown] use bigdata;
 Authenticated to keyspace: bigdata
 [default@bigdata] list test1;
 Using default limit of 100
 Using default column limit of 100
 ---
 RowKey: 2
 = (column=3:address, value=4, timestamp=1350884575938)
 ---
 RowKey: 1

 2 Rows Returned.


 -Vivek

 On Mon, Oct 22, 2012 at 1:01 PM, aaron morton aa...@thelastpickle.com
 wrote:

  How is it not working?

  Can you replicate the problem with the CLI?
 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 22/10/2012, at 7:17 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 code attached. Somehow it is not working with 1.1.5.

 -Vivek

 On Mon, Oct 22, 2012 at 5:20 AM, aaron morton aa...@thelastpickle.com
 wrote:

 Yes AFAIK.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 20/10/2012, at 12:15 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 Is it possible to reuse same compound primary key after delete? I guess it
 works fine for non composite keys.

 -Vivek



 CompositeTypeRunner.java









-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: potential data loss in Cassandra 1.1.0 .. 1.1.4

2012-10-18 Thread Jonathan Ellis
On Thu, Oct 18, 2012 at 7:30 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 Hi Jonathan.

 We are currently running the datastax AMI on amazon. Cassandra is in version
 1.1.2.

 I guess that the datastax repo (deb http://debian.datastax.com/community
  stable main) will be updated directly to 1.1.6?

Yes.

 Could you ask your team to add this specific warning in your documentation
  like here: http://www.datastax.com/docs/1.1/install/expand_ami (we used to
  update to the last stable release before expanding) or here:
 http://www.datastax.com/docs/1.1/install/upgrading or in any other place
 where this could be useful ?

Good idea, I'll get that noted.  Thanks!

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


potential data loss in Cassandra 1.1.0 .. 1.1.4

2012-10-17 Thread Jonathan Ellis
I wanted to call out a particularly important bug for those who aren't
in the habit of reading CHANGES.

Summary: the bug was fixed in 1.1.5, with a follow-on fix in 1.1.6
that only affects users of 1.1.0 .. 1.1.4.  Thus, if you upgraded from
1.0.x or earlier directly to 1.1.5, you're okay as far as this is
concerned.  But if you used an earlier 1.1 release, you should upgrade
to 1.1.6.

Explanation:

A rewrite of the commitlog code for 1.1.0 used Java's nanotime api to
generate commitlog segment IDs.  This could cause data loss in the
event of a power failure, since we assume commitlog IDs are strictly
increasing in our replay logic.  Simplified, the replay logic looks like this:

1. Take the most recent flush time X for each columnfamily
2. Replay all activity in the commitlog that occurred after X

The problem is that nanotime gets effectively a new random seed after
a reboot.  If the new seed is substantially below the old one, any new
commitlog segments will never be after the pre-reboot flush
timestamps.  Subsequently, restarting Cassandra will not replay any
unflushed updates.

We fixed the nanotime problem in 1.1.5 (CASSANDRA-4601).  But, we
didn't realize the implications for replay timestamps until later
(CASSANDRA-4782).  To fix these retroactively, 1.1.6 sets the flush
time of pre-1.1.6 sstables to zero.  Thus, the first startup of 1.1.6
will result in replaying the entire commitlog, including data that may
have already been flushed.

Replaying already-flushed data a second time is harmless -- except for
counters.  So, to avoid replaying flushed counter data, we recommend
performing drain when shutting down the pre-1.1.6 C* prior to upgrade.
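
In code terms, the replay rule above boils down to something like this
(a minimal Java sketch with assumed names, not the actual Cassandra source):

    // Replay a mutation only if its commitlog segment was created after the
    // columnfamily's last flush.  After a reboot, a nanotime-derived segmentId
    // can be *smaller* than every recorded flush time, so this test wrongly
    // skips unflushed mutations -- hence the bug described above.
    static boolean shouldReplay(long segmentId, long lastFlushTimeOfCf) {
        return segmentId > lastFlushTimeOfCf;
    }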

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Any way to put a hard limit on memory cap for Cassandra ?

2012-10-03 Thread Jonathan Ellis
There are three places that Cassandra will use non-heap memory:

One is JVM overhead like permgen.  This is a normal part of running
Java-based services and will be very stable and predictable.

Another is the off-heap row cache.  By default no row caching is done,
you have to explicitly enable it per-columnfamily.  You can also
control the maximum cache size in cassandra.yaml.

Finally, Cassandra mmap's all its data files by default.  This is a
frequent source of misunderstanding, because mmaping doesn't mean the
memory is used in the normal sense, just that it's mapped into
Cassandra's address space so it can be read most efficiently.  See
http://wiki.apache.org/cassandra/FAQ#mmap for more details.
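
One way to see the mmap effect for yourself on Linux (the pid lookup is
illustrative):

    pmap -x $(pgrep -f CassandraDaemon) | grep Data.db

The mapped size of the Data.db files can be huge, but only the resident (RSS)
column reflects memory actually in use, and none of it counts against the JVM
heap.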

Note that only the JVM memory itself (heap + overhead) is locked by
JNA.  Disabling JNA will only expose you to a very bad experience
should the OS decide to swap out part of the JVM.  Best practice of
course is to disable swap entirely, but JNA is there as a fall back
because many people do not do this correctly.

Directing followups to the Cassandra user mailing list.

On Wed, Oct 3, 2012 at 3:33 AM, Thomas Yu t...@ruckuswireless.com wrote:
 Hi Jonathan,

  I've tried to find information regarding how I can put a hard limit on
  real memory usage by the Cassandra process, and would appreciate any pointers
  from you on this front.

  I'm using Cassandra 1.0.11, and had been using the ms and mx JVM options to
  try to limit the heap usage to 750M of memory. However, I find that the actual
  usage of the Cassandra process is around 1G, and I understand that is related
  to JNA and locked memory (likely rooted in the PermGen) in mmap.

  However, what I really want to understand is whether there's any way I can
  put a hard limit on the real memory usage of Cassandra. Do I have to
  disable JNA in order to achieve that? Or otherwise, can I estimate
  that the PermGen will be stable enough that I can fairly expect it
  won't exceed the 250M that I observed in the behavior of my
  application? What about later releases of Cassandra (e.g. 1.1 or 1.2)? Is
  there any option to help on this front?

 Thanks in advance for any pointers that you can provide to help me understand 
 this issue.

 Best Regards,

 -Thomas




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-11 Thread Jonathan Ellis
Relatedly, I'd love to learn how to reliably reproduce full GC pauses
on C* 1.1+.

On Mon, Sep 10, 2012 at 12:37 PM, Oleg Dulin oleg.du...@gmail.com wrote:
 I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7.

 It is my feeble attempt to reduce Full GC pauses.

 Has anyone had any experience with this ? Anyone tried it ?

 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Dynamic Column Families in CQLSH v3

2012-08-29 Thread Jonathan Ellis
To elaborate, we don't know yet how to expose DCT in CQL3.  If you can
give more background on what you're using DCT for, that would help.

(If we're lucky, it's also possible that you don't actually need DCT
-- Collections in 1.2 is done entirely with classic CT under the
hood.)
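
For reference, the usual CQL3 shape for app-controlled column names is a
clustering column; a sketch of the equivalent of a dynamic CF with a LongType
comparator (table and column names assumed):

    CREATE TABLE data (
        key varchar,
        column bigint,
        value blob,
        PRIMARY KEY (key, column)
    ) WITH COMPACT STORAGE;

Each (column, value) pair in such a row maps onto one classic column in the
underlying wide row.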

On Mon, Aug 27, 2012 at 5:56 PM, aaron morton aa...@thelastpickle.com wrote:
 It's not possible to have Dynamic Columns in CQL 3. The CF definition must
 specify the column names you expect to store.

 The COMPACT STORAGE
 (http://www.datastax.com/docs/1.1/references/cql/CREATE_COLUMNFAMILY) clause
 of the Create CF statement means can have column names that are part dynamic
 part static. But if you want to have CF's where the app code controls the
 column names you need to create the CF using the CLI and stick with the
 Thrift API. (because SELECT in CQL 3 does not support arbitrary column
 slicing.)

 Background
 http://www.mail-archive.com/user@cassandra.apache.org/msg23636.html

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 24/08/2012, at 2:24 PM, Erik Onnen eon...@gmail.com wrote:

 Hello All,

 Attempting to create what the Datastax 1.1 documentation calls a
 Dynamic Column Family
 (http://www.datastax.com/docs/1.1/ddl/column_family#dynamic-column-families)
 via CQLSH.

 This works in v2 of the shell:

 create table data ( key varchar PRIMARY KEY) WITH comparator=LongType;

 When defined this way via v2 shell, I can successfully switch to v3
 shell and query the CF fine.

 The same syntax in v3 yields:

 Bad Request: comparator is not a valid keyword argument for CREATE TABLE

 The 1.1 documentation indicates that comparator is a valid option for
 at least ALTER TABLE:

 http://www.datastax.com/docs/1.1/configuration/storage_configuration#comparator

 This leads me to believe that the correct way to create a dynamic
 column family is to create a table with no named columns and alter the
 table later but that also does not work:

 create table data (key varchar PRIMARY KEY);

 yields:

 Bad Request: No definition found that is not part of the PRIMARY KEY

 So, my question is, how do I create a Dynamic Column Family via the CQLSH
 v3?

 Thanks!
 -erik





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Problem while configuring key and row cache?

2012-08-21 Thread Jonathan Ellis
setcachecapacity is obsolete in 1.1+.  Looks like we missed removing
it from nodetool.  See
http://www.datastax.com/dev/blog/caching-in-cassandra-1-1 for
background.
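
The replacement knobs are the global sizes in cassandra.yaml plus a per-CF
caching attribute; a sketch, values illustrative:

    # cassandra.yaml
    key_cache_size_in_mb: 100
    row_cache_size_in_mb: 200

    # cassandra-cli: which caches the CF participates in
    update column family Users with caching = 'keys_only';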

(Moving to users@.)

On Tue, Aug 21, 2012 at 8:19 AM, Amit Handa amithand...@gmail.com wrote:
  I started exploring apache cassandra 1.1.3. I am facing a problem with how to
  improve the performance of cassandra using caching configurations.
 I tried setting following configurations:

 ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 25 0
 ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 0 25
 ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 25 25
 ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 444 444


  But when I check whether this particular configuration has really been
  applied, using the command:
 ./nodetool -h 107.108.189.212 cfstats

  it shows the following results for keyspace DemoUser and column family
  Users:
  Keyspace: DemoUser
 Read Count: 21914
 Read Latency: 0.08268495026010769 ms.
 Write Count: 87656
 Write Latency: 0.06009481381765082 ms.
 Pending Tasks: 0
 Column Family: Users
 SSTable count: 1
 Space used (live): 1573335
 Space used (total): 1573335
 Number of Keys (estimate): 22016
 Memtable Columns Count: 0
 Memtable Data Size: 0
 Memtable Switch Count: 1
 Read Count: 21914
 Read Latency: 0.083 ms.
 Write Count: 87656
 Write Latency: 0.060 ms.
 Pending Tasks: 0
  Bloom Filter False Positives: 0
 Bloom Filter False Ratio: 0.0
 Bloom Filter Space Used: 41104
 Compacted row minimum size: 150
 Compacted row maximum size: 179
  Compacted row mean size: 179

  I am unable to see the effect of the above setcachecapacity command. Let me
  know how I can configure the cache capacity and check its effect.

 With Regards,
 Amit



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: RE Restore snapshot

2012-08-07 Thread Jonathan Ellis
Yes.
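
(i.e., after copying the sstables into each node's data directory, something
like this, with keyspace/CF names illustrative:

    nodetool -h 127.0.0.1 refresh MyKeyspace MyColumnFamily)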

On Thu, Aug 2, 2012 at 5:33 AM, Radim Kolar h...@filez.com wrote:

 1) I assume that I have to call the loadNewSSTables() on each node?

 this is same as nodetool refresh?



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Fwd: Call for Papers for ApacheCon Europe 2012 now open!

2012-07-24 Thread Jonathan Ellis
There are Big Data and NoSQL tracks where Cassandra talks would be appropriate.


-- Forwarded message --
From: Nick Burch nick.bu...@alfresco.com
Date: Thu, Jul 19, 2012 at 1:14 PM
Subject: Call for Papers for ApacheCon Europe 2012 now open!
To: committ...@apache.org


Hi All

We're pleased to announce that the Call for Papers for ApacheCon
Europe 2012 is finally open!

(For those who don't already know, ApacheCon Europe will be taking
place between the 5th and the 9th of November this year, in Sinsheim,
Germany.)

If you'd like to submit a talk proposal, please visit the conference
website at http://www.apachecon.eu/ and sign up for a new account.
Once you've signed up, use your dashboard to enter your speaker bio,
then submit your talk proposal(s). There's more information on the CFP
page on the conference website.

We welcome talk proposals from all projects, from right across the
breadth of projects at the foundation! To make things easier for talk
selection and scheduling, we'd ask that you tag your proposal with the
track that it most closely fits within. The details of the tracks, and
what projects they expect to cover, are available at
http://www.apachecon.eu/tracks/.

(If your project/group of projects was intending to submit a track,
and missed the deadline, then please get in touch with us on
apachecon-disc...@apache.org  straight away, so we can work out if
it's possible to squeeze you in...)

The CFP will close on Friday 3rd August, so you've a little over two weeks
to send in your talk proposal. Don't put it off! We'll look forward to
seeing some great ones shortly!

Thanks
Nick
(On behalf of the Conferences committee)


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Cassandra Summit 2012

2012-07-13 Thread Jonathan Ellis
Hi all,

The 2012 Cassandra Summit will be in San Jose on August 8.  The 2011
Summit sold out with almost 500 attendees; this year we found a bigger
venue to accommodate 700+.  It's fantastic to see the Cassandra
community grow like this!

The 2012 Summit will have *four* talk tracks, plus the popular "Ask
the Experts" breakout room where DataStax engineers will take any
question, all day.  Accepted talks are posted at
http://www.datastax.com/events/cassandrasummit2012#Sessions, and
speaker bios at
http://www.datastax.com/events/cassandrasummit2012#Speakers.  More
abstracts will be posted as they are confirmed.

Learn more and register at
http://www.datastax.com/events/cassandrasummit2012.  Use the
cassandra-list-20 code when registering and save 20%!

P.S. Brandon Williams and I will be conducting a developer training
course immediately before the Summit.  More information at
http://www.datastax.com/services/training

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


2012 Cassandra MVP nominations

2012-07-13 Thread Jonathan Ellis
DataStax would like to recognize individuals who go above and beyond
in their contributions to Apache Cassandra.  To formalize this a
little bit, we're creating an MVP program, the first of which will be
announced at the Cassandra summit [1] in August.

To make this program a success, we need your help to nominate either
yourself or another you think merits consideration.  We're looking for
people who take the initiative organizing user groups, who explain
Cassandra in talks, blogs, Twitter, or other forums, or who answer
questions on the mailing list, IRC, StackOverflow, etc.

Please take five minutes and submit your nomination today at [2].
Nominations will be open throughout the next week.  Those selected
will be notified in advance.

[1] http://www.datastax.com/events/cassandrasummit2012
[2] http://www.surveymonkey.com/s/WVBZGHR

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Java heap space on Cassandra start up version 1.0.10

2012-07-10 Thread Jonathan Ellis
 java.lang.OutOfMemoryError: Java heap space
 at 
 org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:222)
 at 
 org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:204)
 at 
 org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:194)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:155)
 at 
 org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:224)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Service exit with a return value of 100



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Memtable tuning in 1.0 and higher

2012-07-02 Thread Jonathan Ellis
I'm afraid not. It's too much change for an oldstable release series,
and the bulk of the change is to AtomicSortedColumns which doesn't
exist in 1.0, so even if we wanted to take a "maybe it's okay if we
release it first in 1.1.3 and then backport" approach, it wouldn't
improve our safety margin since you'd basically need to rewrite the
patch.

On Sun, Jul 1, 2012 at 6:40 AM, Joost Van De Wijgerd jwijg...@gmail.com wrote:
 Hi Jonathan,

 Looks good, any chance of porting this fix to the 1.0 branch?

 Kind regards

 Joost

 Sent from my iPhone


 On 1 jul. 2012, at 09:25, Jonathan Ellis jbel...@gmail.com wrote:

 On Thu, Jun 28, 2012 at 1:39 PM, Joost van de Wijgerd
 jwijg...@gmail.com wrote:
 the currentThoughput is increased even before the data is merged into the
 memtable so it is actually measuring the throughput afaik.

 You're right.  I've attached a patch to
 https://issues.apache.org/jira/browse/CASSANDRA-4399 to fix this.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Failed to solve Digest mismatch

2012-07-01 Thread Jonathan Ellis
 removed index entry for cleaned-up value DecoratedKey(32,
 3332):ColumnFamily(queue.idxPartitionId
 [7878323239537570657254616e67307878:true:4@1340870382109001,])
 DEBUG [MutationStage:10] 2012-06-28 15:59:42,193 KeysIndex.java (line 103)
 removed index entry for cleaned-up value
 DecoratedKey(3898026790553046681950927403065,
 31333430383730333531373839):ColumnFamily(queue.idxRecvTime
 [7878323239537570657254616e67307878:true:4@1340870382109003,])
 DEBUG [MutationStage:10] 2012-06-28 15:59:42,193 KeysIndex.java (line 103)
 removed index entry for cleaned-up value
 DecoratedKey(3898026790552830793920833138736,
 31333430383431363030303030):ColumnFamily(queue.idxRecvTimeRange
 [7878323239537570657254616e67307878:true:4@1340870382109010,])
 DEBUG [MutationStage:10] 2012-06-28 15:59:42,193 KeysIndex.java (line 103)
 removed index entry for cleaned-up value DecoratedKey(test,
 74657374):ColumnFamily(queue.idxServiceProvider
 [7878323239537570657254616e67307878:true:4@1340870382109007,])
 DEBUG [MutationStage:10] 2012-06-28 15:59:42,193 RowMutationVerbHandler.java
 (line 56) RowMutation(keyspace='drc',
 key='7878323239537570657254616e67307878', modifications=[ColumnFamily(queue
 -deleted at 1340870382185000- [])]) applied.  Sending response to
 6553@/192.168.0.3
 DEBUG [ReadStage:17] 2012-06-28 15:59:42,198 CollationController.java (line
 77) collectTimeOrderedData
 DEBUG [ReadStage:17] 2012-06-28 15:59:42,199 ReadVerbHandler.java (line 58)
 Read key 7878323239537570657254616e67307878; sending response to
 6556@/192.168.0.3

 BRs
 //Ares



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Ball is rolling on High Performance Cassandra Cookbook second edition

2012-07-01 Thread Jonathan Ellis
On Wed, Jun 27, 2012 at 5:11 PM, Aaron Turner synfina...@gmail.com wrote:
 Honestly, I think using the same terms as a RDBMS does
 makes users think they're exactly the same thing and have the same
 properties... which is close enough in some cases, but dangerous in
 others.

The point is that thinking in terms of the storage engine is difficult
and unnecessary.  You can represent that data relationally, which is
the Right Thing to do both because people are familiar with that world
and because it decouples model from representation, which lets us
change the latter if necessary.

http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: items removed from 1.1.0 cfstats output

2012-07-01 Thread Jonathan Ellis
They were removed because in 1.1 caches are global and not per-cf:
http://www.datastax.com/dev/blog/caching-in-cassandra-1-1
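
(The global equivalents are reported per node by nodetool info, e.g.:

    nodetool -h 127.0.0.1 info

which includes key cache and row cache size/capacity/hit-rate lines.)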

On Fri, Jun 29, 2012 at 5:45 AM, Bill b...@dehora.net wrote:
 Were

 Key cache capacity:
 Key cache size:
 Key cache hit rate:
 Row cache:

 removed from cfstats in 1.1.0? I can see them in 1.0.8 but not 1.1.0. If so,
 was wondering why, as they're fairly useful :)

 Bill



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: upgrade issue

2012-07-01 Thread Jonathan Ellis
$ConstructMapping.constructJavaBean2ndStep(Constructor.java:240)
         ... 11 more
 null; Can't construct a java object for tag:yaml.org,2002:org.
 apache.cassandra.config.Config; exception=Cannot create
 property=commitlog_rotation_threshold_in_mb for JavaBean=org.apache.
 cassandra.config.Config@4dd36dfe; Unable to find property
 'commitlog_rotation_threshold_in_mb' on class: org.apache.cassandra.
 config.Config
 Invalid yaml; unable to start server.  See log for stacktrace.


 Thanks  Regards

 Adeel Akbar



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Question on pending tasks in compaction manager

2012-07-01 Thread Jonathan Ellis
"Pending compactions" is just an estimate of how many compactions
Cassandra thinks it will take to get to a fully-compacted state;
there are no actual tasks enqueued anywhere.

You could enable debug logging on org.apache.cassandra.db.compaction,
and force a compaction with nodetool to see why no compactions happen
when the estimate says there is still work to do.
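
A sketch of both steps (host, keyspace, and CF names illustrative;
conf/log4j-server.properties is re-read periodically, so no restart is
needed):

    # conf/log4j-server.properties
    log4j.logger.org.apache.cassandra.db.compaction=DEBUG

    # then force a compaction
    nodetool -h 127.0.0.1 compact MyKeyspace MyCF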

On Fri, Jun 29, 2012 at 4:27 AM, Martin McGovern
martin.mcgov...@gmail.com wrote:
 Hi All,

 Could someone explain why the compaction manager stops compacting when it
 has a number of pending tasks?

 I have a test cluster that I am using to stress test IO throughput, i.e.
 find out what a safe load for our hardware is. Over a 16 hour period my node
 cluster completes approximately 49,000 tasks per node. After stopping my
  test, compaction continues for a few minutes, then stops. There are ~7,000
 tasks still pending. No more tasks will be executed until I start another
 test and the 7000 pending will never be executed.

 I'm using leveled compaction with 5MB SS tables and my tests have a 50:50
 read:write ratio. Each value is a 10K byte array with random content.

 Thanks,
 Martin



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: jscv CPU Consumption

2012-07-01 Thread Jonathan Ellis
Sounds like http://wiki.apache.org/cassandra/FAQ#ubuntu_ec2_hangs to me.

On Fri, Jun 29, 2012 at 1:45 AM, Olivier Mallassi omalla...@octo.com wrote:
 Hi all

 We have a 12 servers clusters (8 cores by machines..).
 OS is Ubuntu 10.04.2.

  On one of the machines (only one), and without any load (no inserts, no
  reads), we have a huge CPU load even though there is no activity (no
  compaction in progress, etc...).
  A top on the machine shows us the process jscv is using all the available
  CPUs.

 Is that link to JNA? do you have any ideas?

 Cheers

 --
 
 Olivier Mallassi
 OCTO Technology
 
 50, Avenue des Champs-Elysées
 75008 Paris

 Mobile: (33) 6 28 70 26 61
 Tél: (33) 1 58 56 10 00
 Fax: (33) 1 58 56 10 01

 http://www.octo.com
 Octo Talks! http://blog.octo.com





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Memtable tuning in 1.0 and higher

2012-07-01 Thread Jonathan Ellis
On Thu, Jun 28, 2012 at 1:39 PM, Joost van de Wijgerd
jwijg...@gmail.com wrote:
 the currentThoughput is increased even before the data is merged into the
 memtable so it is actually measuring the throughput afaik.

You're right.  I've attached a patch to
https://issues.apache.org/jira/browse/CASSANDRA-4399 to fix this.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Distinct with Cql

2012-06-28 Thread Jonathan Ellis
No.

(Moving to user list.)
On Jun 28, 2012 8:17 AM, Fábio Caldas fabio.cal...@gmail.com wrote:

  Is it possible to use DISTINCT in CQL?

 --
 Atenciosamente,
 Fábio Caldas



Re: Memtable tuning in 1.0 and higher

2012-06-28 Thread Jonathan Ellis
[moving to user list]

1.0 doesn't care about throughput or op count anymore, only whether
the total memory used by the *current* data in the memtables has
reached the global limit.  So, it automatically doesn't count
historical data that's been overwritten in the current memtable.

So, you may want to increase the memory allocated to memtables... or
you may be seeing flushes forced by the commitlog size cap, which you
can also adjust.
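
(Both knobs live in cassandra.yaml; a sketch with illustrative values:

    memtable_total_space_in_mb: 2048
    commitlog_total_space_in_mb: 4096)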

But, the bottom line is I'd consider flushing every 5-6 minutes to be
quite healthy; since the "time flushing : time not flushing" ratio is
quite small, reducing it further is going to give
you negligible benefit (in exchange for longer replay times.)

On Thu, Jun 28, 2012 at 5:09 AM, Joost van de Wijgerd
jwijg...@gmail.com wrote:
 Hi,

 I work for eBuddy, We've been using Cassandra in production since 0.6
 (using 0.7 and 1.0, skipped 0.8) and use it for several Use Cases. One of
 our uses is to persist our sessions.

 Some background, in our case sessions are long lived, we have a mobile
 messaging platform where sessions are essentially eternal. We use cassandra
 as a system of record for our session so in case of scale out or fail over
 we can quickly load the session state again. We use protocolbuffers to
 serialize
 our data into a byte buffer and then store this as a column value in a
 (wide) row. We use a partition based approach to scale and each partition
 has its own
 row in cassandra. Each session is mapped to a partition and stored in a
 column in this row.

 Every time there is a change in the session (i.e. message add, acked etc)
 we schedule the session to be flushed to cassandra. Every x seconds we flush
 the dirty sessions. So there are a serious number of (over)writes going on
 and not that many reads (unless there is a failover situation or we scale
 out). This
 is using one of the strengths of cassandra.

 In versions 0.6 and 0.7 it was possible to control the memtable settings on
 a CF basis. So for this particular CF we would set the throughput really
 high since there
 are a huge number of overwrites. In the same cluster we have other CFs that
 have a different load pattern.

 Since we moved to version 1.0 however, it has become almost impossible to
 tune our system for this (mixed) workload. Since we now have only two knobs
 to turn (the size
 of the commit log and the total memtable size) and you have introduced the
  liveRatio calculation. While this works ok for most workloads, our
 persistent session store
 is really hurt by the fact that the liveRatio cannot be lower than 1.0

 We generally have an actual liveRatio of 0.025 on this CF due to the huge
 number of overwrites. We are now artificially tuning up the total memtable
 size but this interferes
  with our other CFs which have a different workload. Due to this, our
 performance has degraded quite a bit since on our 0.7 version we had our
 session CF tuned so that
 it would flush only once an hour, thus absorbing way more overwrites, thus
 having to do less compactions and on a failover scenario most request could
 be served straight
 from the memtable (since we are doing since column reads there). Currently
 we flush every 5 to 6 minutes under moderate load, so 10 times worse. This
 is with the s
 same heap setting etc.

 Would you guys consider allowing lower values than 1.0 for the liveRatio
 calculation? This would help us a lot. Perhaps make it a flag so it can be
 turned on and off? Ideally
 I would like the possibility back to tune on a CF by CF basis, this could
 be a special setting that needs to be enabled for power users. The default
 being what's there now.

  Also, in the current version the liveRatio can never adjust downwards; I
 see you guys have already made a fix for this in 1.1 but I have not seen it
 on the 1.0 branch.

 Let me know what you think

 Kind regards,

 Joost

 --
 Joost van de Wijgerd
 joost.van.de.wijgerd@Skype
 http://www.linkedin.com/in/jwijgerd



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Rules for Major Compaction

2012-06-19 Thread Jonathan Ellis
On Tue, Jun 19, 2012 at 2:30 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
 Your final two sentences are good ground rules. In our case we have
 some column families that have high churn, for example a gc_grace
 period of 4 days but the data is re-written completely every day.
 Write activity over time will eventually cause tombstone removal but
 we can expedite the process by forcing a major at night. Because the
 tables are not really growing the **warning** below does not apply.

Note that Cassandra 1.2 will automatically compact sstables that have
more than a configurable amount of expired data (default 20%).  So you
won't have to force a major for this use case anymore.
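
(It's exposed as a compaction sub-option; a sketch against 1.2, table name
and threshold illustrative:

    ALTER TABLE churny_cf WITH compaction =
        {'class': 'SizeTieredCompactionStrategy', 'tombstone_threshold': 0.20};)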

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Row caching in Cassandra 1.1 by column family

2012-06-19 Thread Jonathan Ellis
rows_cached is actually obsolete in 1.1.  New hotness explained here:
http://www.datastax.com/dev/blog/caching-in-cassandra-1-1

On Mon, Jun 18, 2012 at 7:43 PM, Chris Burroughs
chris.burrou...@gmail.com wrote:
 Check out the rows_cached CF attribute.

 On 06/18/2012 06:01 PM, Oleg Dulin wrote:
 Dear distinguished colleagues:

 I don't want all of my CFs cached, but one in particular I do.

 How can I configure that ?

 Thanks,
 Oleg





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: cassandra secondary index with

2012-06-19 Thread Jonathan Ellis
Because without it, this will get you *worse* performance than just doing a seq scan would.

Details as to why this is, are here:
http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes

On Tue, Jun 19, 2012 at 2:48 PM, Yuhan Zhang yzh...@onescreen.com wrote:
  To answer my own question:

  There should be at least one equals expression in the indexed query to
  combine with a gte.
  So I just added a trivial column that stays constant for the equals
  comparison, and it works.

  Not sure why this requirement exists.

 Thank you.

 Yuhan


 On Tue, Jun 19, 2012 at 12:23 PM, Yuhan Zhang yzh...@onescreen.com wrote:

 Hi all,

 I'm trying to search by the secondary index of cassandra with greater
  than or equal, but hit an exception stating:
 me.prettyprint.hector.api.exceptions.HInvalidRequestException:
 InvalidRequestException(why:No indexed columns present in index clause with
 operator EQ)

  However, the same column family with the same column works when the search
  expression is an equals. I'm using the Hector java client.
 The secondary index type has been set to: {column_name: sport,
 validation_class: DoubleType, index_type:KEYS }

 here's the code reaching the exception:

  public QueryResult<OrderedRows<String, String, Double>>
  getIndexedSlicesGTE(String columnFamily, String columnName, double value,
  String... columns) {
          Keyspace keyspace = getKeyspace();
          StringSerializer se = CassandraStorage.getStringExtractor();

          IndexedSlicesQuery<String, String, Double> indexedSlicesQuery =
  createIndexedSlicesQuery(keyspace, se, se, DoubleSerializer.get());
          indexedSlicesQuery.setColumnFamily(columnFamily);
          indexedSlicesQuery.setStartKey("");
          if (columns != null)
              indexedSlicesQuery.setColumnNames(columns);
          else {
              indexedSlicesQuery.setRange("", "", true, MAX_RECORD_NUMBER);
          }

          indexedSlicesQuery.setRowCount(CassandraStorage.MAX_RECORD_NUMBER);
          indexedSlicesQuery.addGteExpression(columnName, value);
  // this doesn't work :(
          //indexedSlicesQuery.addEqualsExpression(columnName, value);    //
  this works!
          QueryResult<OrderedRows<String, String, Double>> result =
  indexedSlicesQuery.execute();

          return result;
      }


 Is there any column_meta setting that is required in order to make GTE
  comparison work on a secondary index?

 Thank you.

 Yuhan Zhang






 --
 Yuhan Zhang
 Application Developer
 OneScreen Inc.
 yzh...@onescreen.com
 www.onescreen.com

 The information contained in this e-mail is for the exclusive use of the
 intended recipient(s) and may be confidential, proprietary, and/or legally
 privileged. Inadvertent disclosure of this message does not constitute a
 waiver of any privilege.  If you receive this message in error, please do
 not directly or indirectly print, copy, retransmit, disseminate, or
 otherwise use the information. In addition, please delete this e-mail and
 all copies and notify the sender.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: FYI: Java 7u4 on Linux requires higher stack size

2012-05-25 Thread Jonathan Ellis
Thanks, we're investigating in
https://issues.apache.org/jira/browse/CASSANDRA-4275.

On Fri, May 25, 2012 at 10:31 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:

 Hello all,

 We've started to test Oracle Java 7u4 (currently we're on 7u3) on Linux to
 try G1 GC.

 Cassandra can't start on 7u4 with exception:

 The stack size specified is too small, Specify at least 160k

 Cannot create Java VM

 Changing -Xss128k to -Xss160k in cassandra-env.sh allowed Cassandra to
 start, but when a Thrift client disconnects, the Cassandra log fills with
 exceptions:

 ERROR 17:08:56,300 Fatal exception in thread Thread[Thrift:13,5,main]

 java.lang.StackOverflowError

 at java.net.SocketInputStream.socketRead0(Native Method)

 at java.net.SocketInputStream.read(Unknown Source)

 at java.net.SocketInputStream.read(Unknown Source)

 at java.io.BufferedInputStream.fill(Unknown Source)

 at java.io.BufferedInputStream.read1(Unknown Source)

 at java.io.BufferedInputStream.read(Unknown Source)

 at
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
 

 at
 org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)

 at
 org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
 

 at
 org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
 

 at
 org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)

 at
 org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
 

 at
 org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
 

 at
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
 

 at
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
 

 at
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 

 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source)

 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)

 at java.lang.Thread.run(Unknown Source)

 Increasing the stack size from 160k to 192k eliminated such exceptions.

 Just wanted you to know, if someone tries to migrate to Java 7u4.


Best regards / Pagarbiai
 *Viktor Jevdokimov*
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider (http://twitter.com/#%21/adforminsider)
 What is Adform: watch this short video http://vimeo.com/adform/display
 http://www.adform.com

 Disclaimer: The information contained in this message and attachments is
 intended solely for the attention and use of the named addressee and may be
 confidential. If you are not the intended recipient, you are reminded that
 the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: supercolumns with TTL columns not being compacted correctly

2012-05-22 Thread Jonathan Ellis
Additionally, it will always take at least two compaction passes to
purge an expired column: one to turn it into a tombstone, and a second
(after gcgs) to remove it.

On Tue, May 22, 2012 at 9:21 AM, Yuki Morishita mor.y...@gmail.com wrote:
 Data will not be deleted when those keys appear in other sstables outside of
 compaction. This is to prevent obsolete data from appearing again.

 yuki

 On Tuesday, May 22, 2012 at 7:37 AM, Pieter Callewaert wrote:

 Hi Samal,



 Thanks for your time looking into this.



 I force the compaction by using forceUserDefinedCompaction on only that
 particular sstable. This guarantees me the new sstable being written only
 contains the data from the old sstable.

 The data in the sstable is more than 31 days old and gc_grace is 0, but
 still the data from the sstable is being written to the new one, while I am
 100% sure all the data is invalid.



 Kind regards,

 Pieter Callewaert



 From: samal [mailto:samalgo...@gmail.com]
 Sent: dinsdag 22 mei 2012 14:33
 To: user@cassandra.apache.org
 Subject: Re: supercolumns with TTL columns not being compacted correctly



 Data will remain till next compaction but won't be available. Compaction
 will delete old sstable create new one.

 On 22-May-2012 5:47 PM, Pieter Callewaert pieter.callewa...@be-mobile.be
 wrote:

 Hi,



 I’ve had my suspicions for some months, but now I think I am sure about it.

 Data is being written by the SSTableSimpleUnsortedWriter and loaded by the
 sstableloader.

 The data should be alive for 31 days, so I use the following logic:



 int ttl = 2678400;

 long timestamp = System.currentTimeMillis() * 1000;

 long expirationTimestampMS = (long) ((timestamp / 1000) + ((long) ttl *
 1000));



 And using this to write it:



 sstableWriter.newRow(bytes(entry.id));

 sstableWriter.newSuperColumn(bytes(superColumn));

 sstableWriter.addExpiringColumn(nameTT, bytes(entry.aggregatedTTMs),
 timestamp, ttl, expirationTimestampMS);

 sstableWriter.addExpiringColumn(nameCov, bytes(entry.observationCoverage),
 timestamp, ttl, expirationTimestampMS);

 sstableWriter.addExpiringColumn(nameSpd, bytes(entry.speed), timestamp, ttl,
 expirationTimestampMS);



 This works perfectly, data can be queried until 31 days are passed, then no
 results are given, as expected.

 But the data is still on disk until the sstables are being recompacted:



 One of our nodes (we got 6 total) has the following sstables:

 [cassandra@bemobile-cass3 ~]$ ls -hal /data/MapData007/HOS-* | grep G

 -rw-rw-r--. 1 cassandra cassandra 103G May  3 03:19
 /data/MapData007/HOS-hc-125620-Data.db

 -rw-rw-r--. 1 cassandra cassandra 103G May 12 21:17
 /data/MapData007/HOS-hc-163141-Data.db

 -rw-rw-r--. 1 cassandra cassandra  25G May 15 06:17
 /data/MapData007/HOS-hc-172106-Data.db

 -rw-rw-r--. 1 cassandra cassandra  25G May 17 19:50
 /data/MapData007/HOS-hc-181902-Data.db

 -rw-rw-r--. 1 cassandra cassandra  21G May 21 07:37
 /data/MapData007/HOS-hc-191448-Data.db

 -rw-rw-r--. 1 cassandra cassandra 6.5G May 21 17:41
 /data/MapData007/HOS-hc-193842-Data.db

 -rw-rw-r--. 1 cassandra cassandra 5.8G May 22 11:03
 /data/MapData007/HOS-hc-196210-Data.db

 -rw-rw-r--. 1 cassandra cassandra 1.4G May 22 13:20
 /data/MapData007/HOS-hc-196779-Data.db

 -rw-rw-r--. 1 cassandra cassandra 401G Apr 16 08:33
 /data/MapData007/HOS-hc-58572-Data.db

 -rw-rw-r--. 1 cassandra cassandra 169G Apr 16 17:59
 /data/MapData007/HOS-hc-61630-Data.db

 -rw-rw-r--. 1 cassandra cassandra 173G Apr 17 03:46
 /data/MapData007/HOS-hc-63857-Data.db

 -rw-rw-r--. 1 cassandra cassandra 105G Apr 23 06:41
 /data/MapData007/HOS-hc-87900-Data.db



 As you can see, the following files should be invalid:

 /data/MapData007/HOS-hc-58572-Data.db

 /data/MapData007/HOS-hc-61630-Data.db

 /data/MapData007/HOS-hc-63857-Data.db



 Because they were all written more than a month ago. gc_grace is 0 so this
 should also not be a problem.



 As a test, I use forceUserSpecifiedCompaction on the HOS-hc-61630-Data.db.

 Expected behavior should be an empty file is being written because all data
 in the sstable should be invalid:



 Compactionstats is giving:

 compaction type   keyspace     column family   bytes compacted   bytes total     progress
 Compaction        MapData007   HOS             11518215662       532355279724    2.16%



 And when I ls the directory I find this:

 -rw-rw-r--. 1 cassandra cassandra 3.9G May 22 14:12
 /data/MapData007/HOS-tmp-hc-196898-Data.db



 The sstable is being 1-on-1 copied to a new one. What am I missing here?

 TTL works perfectly, but is it causing a problem because it is in a super
 column, and so never deleted from disk?



 Kind regards

 Pieter Callewaert | Web  IT engineer

  Be-Mobile NV | TouringMobilis

  Technologiepark 12b - 9052 Ghent - Belgium

 Tel + 32 9 330 51 80 | Fax + 32 9 330 51 81 |  Cell + 32 473 777 121







-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra

Re: supercolumns with TTL columns not being compacted correctly

2012-05-22 Thread Jonathan Ellis
Correction: the first compaction after expiration + gcgs can remove
it, even if it hasn't been turned into a tombstone previously.
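
Worked through with the numbers from this thread (ttl = 2678400 s = 31 days,
gc_grace = 0): a column written into HOS-hc-61630 on Apr 16 expires on May 17,
so a compaction that includes that sstable on May 22 is allowed to drop the
data outright, with no intermediate tombstone pass required.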

On Tue, May 22, 2012 at 9:37 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Additionally, it will always take at least two compaction passes to
 purge an expired column: one to turn it into a tombstone, and a second
 (after gcgs) to remove it.

 On Tue, May 22, 2012 at 9:21 AM, Yuki Morishita mor.y...@gmail.com wrote:
 Data will not be deleted when those keys appear in other sstables outside of
 compaction. This is to prevent obsolete data from appearing again.

 yuki

 On Tuesday, May 22, 2012 at 7:37 AM, Pieter Callewaert wrote:

 Hi Samal,



 Thanks for your time looking into this.



 I force the compaction by using forceUserDefinedCompaction on only that
 particular sstable. This guarantees me the new sstable being written only
 contains the data from the old sstable.

 The data in the sstable is more than 31 days old and gc_grace is 0, but
 still the data from the sstable is being written to the new one, while I am
 100% sure all the data is invalid.



 Kind regards,

 Pieter Callewaert



 From: samal [mailto:samalgo...@gmail.com]
 Sent: dinsdag 22 mei 2012 14:33
 To: user@cassandra.apache.org
 Subject: Re: supercolumns with TTL columns not being compacted correctly



 Data will remain till next compaction but won't be available. Compaction
 will delete old sstable create new one.

 On 22-May-2012 5:47 PM, Pieter Callewaert pieter.callewa...@be-mobile.be
 wrote:

 Hi,



 I’ve had my suspicions for some months, but now I think I am sure about it.

 Data is being written by the SSTableSimpleUnsortedWriter and loaded by the
 sstableloader.

 The data should be alive for 31 days, so I use the following logic:



 int ttl = 2678400;

 long timestamp = System.currentTimeMillis() * 1000;

 long expirationTimestampMS = (long) ((timestamp / 1000) + ((long) ttl *
 1000));



 And using this to write it:



 sstableWriter.newRow(bytes(entry.id));

 sstableWriter.newSuperColumn(bytes(superColumn));

 sstableWriter.addExpiringColumn(nameTT, bytes(entry.aggregatedTTMs),
 timestamp, ttl, expirationTimestampMS);

 sstableWriter.addExpiringColumn(nameCov, bytes(entry.observationCoverage),
 timestamp, ttl, expirationTimestampMS);

 sstableWriter.addExpiringColumn(nameSpd, bytes(entry.speed), timestamp, ttl,
 expirationTimestampMS);



 This works perfectly, data can be queried until 31 days are passed, then no
 results are given, as expected.

 But the data is still on disk until the sstables are being recompacted:



 One of our nodes (we got 6 total) has the following sstables:

 [cassandra@bemobile-cass3 ~]$ ls -hal /data/MapData007/HOS-* | grep G

 -rw-rw-r--. 1 cassandra cassandra 103G May  3 03:19
 /data/MapData007/HOS-hc-125620-Data.db

 -rw-rw-r--. 1 cassandra cassandra 103G May 12 21:17
 /data/MapData007/HOS-hc-163141-Data.db

 -rw-rw-r--. 1 cassandra cassandra  25G May 15 06:17
 /data/MapData007/HOS-hc-172106-Data.db

 -rw-rw-r--. 1 cassandra cassandra  25G May 17 19:50
 /data/MapData007/HOS-hc-181902-Data.db

 -rw-rw-r--. 1 cassandra cassandra  21G May 21 07:37
 /data/MapData007/HOS-hc-191448-Data.db

 -rw-rw-r--. 1 cassandra cassandra 6.5G May 21 17:41
 /data/MapData007/HOS-hc-193842-Data.db

 -rw-rw-r--. 1 cassandra cassandra 5.8G May 22 11:03
 /data/MapData007/HOS-hc-196210-Data.db

 -rw-rw-r--. 1 cassandra cassandra 1.4G May 22 13:20
 /data/MapData007/HOS-hc-196779-Data.db

 -rw-rw-r--. 1 cassandra cassandra 401G Apr 16 08:33
 /data/MapData007/HOS-hc-58572-Data.db

 -rw-rw-r--. 1 cassandra cassandra 169G Apr 16 17:59
 /data/MapData007/HOS-hc-61630-Data.db

 -rw-rw-r--. 1 cassandra cassandra 173G Apr 17 03:46
 /data/MapData007/HOS-hc-63857-Data.db

 -rw-rw-r--. 1 cassandra cassandra 105G Apr 23 06:41
 /data/MapData007/HOS-hc-87900-Data.db



 As you can see, the following files should be invalid:

 /data/MapData007/HOS-hc-58572-Data.db

 /data/MapData007/HOS-hc-61630-Data.db

 /data/MapData007/HOS-hc-63857-Data.db



 Because they were all written more than a month ago. gc_grace is 0 so this
 should also not be a problem.



 As a test, I use forceUserSpecifiedCompaction on the HOS-hc-61630-Data.db.

 Expected behavior should be an empty file is being written because all data
 in the sstable should be invalid:



 Compactionstats is giving:

 compaction type   keyspace     column family   bytes compacted   bytes total     progress
 Compaction        MapData007   HOS             11518215662       532355279724    2.16%



 And when I ls the directory I find this:

 -rw-rw-r--. 1 cassandra cassandra 3.9G May 22 14:12
 /data/MapData007/HOS-tmp-hc-196898-Data.db



 The sstable is being 1-on-1 copied to a new one. What am I missing here?

 TTL works perfectly, but is it causing a problem because it is in a super
 column, and so never deleted from disk?



 Kind regards

 Pieter Callewaert | Web  IT engineer

  Be-Mobile NV | TouringMobilis

  Technologiepark 12b - 9052

Re: need some clarification on recommended memory size

2012-05-19 Thread Jonathan Ellis
So, you're doing about 20 ops/s where each op consists of reading 2
metadata columns, then reading ~250 columns of ~2K each.  Is that right?

Is your test client multithreaded?  Is it on a separate machine from
the Cassandra server?

What is your bottleneck?
http://spyced.blogspot.com/2010/01/linux-performance-basics.html

On Thu, May 17, 2012 at 1:08 PM, Yiming Sun yiming@gmail.com wrote:
 Hi Aaron,

 Thank you for guiding us by breaking down the issue.  Please see my answers
 embedded

 Is this a single client ?

 Yes

 How many columns is it asking for ?

 the client knows a list of all row keys, and it randomly picks 100, and
 loops 100 times.  It first reads a metadata column to figure out how many
 columns to read, and it then reads these columns

 What sort of query are you sending, slice or named columns?

  Currently all queries are slice queries.  So the first slice query reads the
  metadata column (actually 2 metadata columns: one is for the number of columns
  to read, the other for other information which is not needed for the purpose
  of the performance test, but I kept it in there to make it similar to the real
  situation).  It then generates the column name array and sends the second
  slice query.

 The timing for the queries is completely isolated, and excludes the time
 spent generating column name array etc.


  From the client side how long is a single read taking ?

  I am not 100% sure what you are asking... do you mean how long it
  takes for SliceQuery.execute()?  The averages we are getting are between
  50-70 ms, and nodetool reports similar latency, differing by 5-10 ms at most.


 What is the write workload like?  it sounds like it's write once read
 many.

  Indeed it is like a WORM environment. For the performance test, we don't have
  any writes.

 memory speed  network speed

  Yes.  Right now, our data is only a sample of about 250K rows, so the default
 200,000 key cache hits above 90%.  But we soon will be hosting the real deal
 with about 3M rows, so I am not sure our memory size will be able to keep up
 with it.

 In any case, Aaron, please let us know if you have any
 suggestions/comments/insights.  Thanks!

 -- Y.


 On Thu, May 17, 2012 at 1:04 AM, aaron morton aa...@thelastpickle.com
 wrote:

 The read rate that I have been seeing is about 3MB/sec, and that is
 reading the raw bytes... using string serializer the rate is even lower,
 about 2.2MB/sec.

 Can we break this down a bit:

 Is this a single client ?
 How many columns is it asking for ?
 What sort of query are you sending, slice or named columns?
 From the client side how long is a single read taking ?
 What is the write workload like?  it sounds like it's write once read
 many.

 Use nodetool cfstats to see what the read latency is on a single node.
 (see http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/) Is there
 much difference between this and the latency from the client perspective ?



 Using JNA may help, but a blog article seems to say it only increases
 performance by 13%, which is not very significant when the base performance
 is in single-digit MB/s.

 There are other reasons to have JNA installed: more efficient snapshots
 and advising the OS when file operations should not be cached.

  Our environment is virtualized, and the disks are actually a SAN over
 Fibre Channel, so I don't know if that has an impact on performance as well.

 memory speed > network speed

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Exception when truncate

2012-05-19 Thread Jonathan Ellis
Sounds like you have a permissions problem.  Cassandra creates a
subdirectory for each snapshot.
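
A quick check, assuming Cassandra runs as a dedicated cassandra user (as in
the directory listings earlier in this digest): make sure that user owns the
whole data directory tree, e.g. chown -R cassandra:cassandra
/home/cassandra/1.0.0/data, so the snapshot subdirectories can be created.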

On Thu, May 17, 2012 at 4:57 AM, ruslan usifov ruslan.usi...@gmail.com wrote:
 Hello

 I have the following situation on our test server:

 From cassandra-cli I try to run

 truncate purchase_history;

 3 times I got:

 [default@township_6waves] truncate purchase_history;
 null
 UnavailableException()
        at 
 org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20212)
        at 
 org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:1077)
        at 
 org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:1052)
        at 
 org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1445)
        at 
 org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272)
        at 
 org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:220)
        at org.apache.cassandra.cli.CliMain.main(CliMain.java:348)


 So it looks like truncate goes very slowly and takes longer than
 rpc_timeout_in_ms: 1 (this can happen because we have very slow
 disks on the test machine)

 But in the Cassandra system log I see the following exception:


 ERROR [MutationStage:7022] 2012-05-17 12:19:14,356
 AbstractCassandraDaemon.java (line 139) Fatal exception in thread
 Thread[MutationStage:7022,5,main]
 java.io.IOError: java.io.IOException: unable to mkdirs
 /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history
        at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
        at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
        at 
 org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
        at 
 org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)
        at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: unable to mkdirs
 /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history
        at 
 org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:140)
        at 
 org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:131)
        at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1409)
        ... 7 more


 Also I see that the 1337242754356-purchase_history directory already exists
 in the snapshot dir, so I think that the snapshot names Cassandra generates
 are not unique.

 PS: We use Cassandra 1.0.10 on Ubuntu 10.04 LTS



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Migration from cassandra 0.8.6 to 1.1.0

2012-05-19 Thread Jonathan Ellis
1.1 will migrate your data to the new directory structure, but it needs the
0.8 schema to do that.  Then you can drop the unwanted keyspace
post-upgrade.

On Fri, May 18, 2012 at 11:58 AM, Harshvardhan Ojha 
harshvardhan.o...@makemytrip.com wrote:

  Hi All,

 I am trying to migrate from Cassandra version 0.8.6 to 1.1.0.

 I had two keyspaces and wanted to keep only one, so I deleted the system
 keyspace and ran the schema again for the keyspace I kept.

 After running the schema for the keyspace, I noticed that new folders are
 created for every column family inside the keyspace folder.

 So the data is not available in Cassandra 1.1.0.

 Is it a new feature to create a folder for each column family in a keyspace?

 How can I get all the data from the old keyspace in the new version? Any
 suggestion would be highly appreciated.

 Harshvardhan Ojha | Software Developer - Technology Development
 MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase 1, Gurgaon,
 Haryana - 122 016, India




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Migrating a column family from one cluster to another

2012-05-19 Thread Jonathan Ellis
Better: use bin/sstableloader, which will copy exactly the right
ranges of data to the new cluster.
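
(A sketch of the invocation, assuming the 1.0-era convention where the
directory name must match the keyspace: run bin/sstableloader
/path/to/MyKeyspace/ from a host that is not itself a cluster member, and it
will stream each sstable's data to the nodes that own the matching ranges.)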

On Fri, May 18, 2012 at 3:39 PM, Rob Coli rc...@palominodb.com wrote:
 On Thu, May 17, 2012 at 9:37 AM, Bryan Fernandez bfernande...@gmail.com 
 wrote:
 What would be the recommended
 approach to migrating a few column families from a six node cluster to a
 three node cluster?

 The easiest way (if you are not using counters) is :

 1) make sure all filenames of sstables are unique [1]
 2) copy all sstablefiles from the 6 nodes to all 3 nodes
 3) run a cleanup compaction on the 3 nodes

 =Rob
 [1] https://issues.apache.org/jira/browse/CASSANDRA-1983

 --
 =Robert Coli
 AIMGTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: while compaction occur EOFException

2012-05-19 Thread Jonathan Ellis
Looks like sstable corruption to me.  Bad memory can often cause this.

You should upgrade to the latest 0.7 release and run nodetool scrub.
I don't think the 0.7.3 scrub was very robust.

On Thu, May 17, 2012 at 1:36 AM, Preston Cheung zhangyf2...@gmail.com wrote:
 While doing compaction, Cassandra hit an EOFException, and it seems that
 the compaction failed.

 I wonder whether my sstables are corrupt or whether it is a bug? Thanks for
 any help!

 Our cassandra is 0.7.3.
 CentOS 5.4
 jdk1.7.0

 This is the log:

 INFO [CompactionExecutor:1] 2012-05-17 10:42:18,095 CompactionManager.java
 (line 452) Compacting
 [SSTableReader(path='/data00/data/picasso/value-f-63129-Dat
 a.db'),SSTableReader(path='/data01/data/picasso/value-f-63893-Data.db'),SSTableReader(path='/data01/data/picasso/value-f-63989-Data.db'),SSTableReader(path='
 /data00/data/picasso/value-f-63691-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-61779-Data.db'),SSTableReader(path='/data00/data/picasso/value-
 f-61916-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-61875-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-63296-Data.db'),SSTableRe
 ader(path='/data00/data/picasso/value-f-62139-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-63821-Data.db')]
 ERROR [CompactionExecutor:1] 2012-05-17 10:42:24,306
 AbstractCassandraDaemon.java (line 114) Fatal exception in thread
 Thread[CompactionExecutor:1,1,main]
 java.io.IOError: java.io.EOFException
     at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:117)
     at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:67)
     at
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:179)
     at
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144)
     at
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136)
     at
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:39)
     at
 org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
     at
 org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
     at
 org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
     at
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
     at
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
     at
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
     at
 org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
     at
 org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
     at
 org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:505)
     at
 org.apache.cassandra.db.CompactionManager$4.call(CompactionManager.java:256)
     at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
     at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
     at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
     at java.lang.Thread.run(Thread.java:722)
 Caused by: java.io.EOFException
     at
 org.apache.cassandra.io.sstable.IndexHelper.skipIndex(IndexHelper.java:65)
     at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:109)
     ... 20 more

 thx
 --
 by Preston Cheung




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Snapshot failing on JSON files in 1.1.0

2012-05-19 Thread Jonathan Ellis
/blotter/twitter_users/snapshots/1337115022389/twitter_users.json
  -rw-r--r-- 1 root root 38778 May 15 20:50
 
  /var/lib/cassandra/data/blotter/twitter_users/snapshots/1337115022389/twitter_users.json
 
 
  We are using Leveled Compaction on the twitter_users CF, which I assume is
  creating the JSON files.
 
  [root@cassandra-n6 blotter]# ls -al
  /var/lib/cassandra/data/blotter/twitter_users/*.json
  -rw-r--r-- 1 root root 38779 May 15 20:51
  /var/lib/cassandra/data/blotter/twitter_users/twitter_users.json
  -rw-r--r-- 1 root root 38779 May 15 20:51
  /var/lib/cassandra/data/blotter/twitter_users/twitter_users-old.json
  -rw-r--r-- 1 root root  1040 May 15 20:51
 
  /var/lib/cassandra/data/blotter/twitter_users/twitter_users.twitter_user_attributes_screenname_idx.json
  -rw-r--r-- 1 root root  1046 May 15 20:50
 
  /var/lib/cassandra/data/blotter/twitter_users/twitter_users.twitter_user_attributes_screenname_idx-old.json
 
 
  The other column families which are not using Leveled Compaction seem to
  have their snapshots created successfully.
 
  Any ideas other than turning off Leveled Compaction?
 
 
  Thanks,
 
  Brian
 
 
 
 





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: CQL 3.0 Features

2012-05-19 Thread Jonathan Ellis
In the meantime, Sylvain just posted this:
http://www.datastax.com/dev/blog/cql3-evolutions

On Wed, May 16, 2012 at 11:45 AM, paul cannon p...@datastax.com wrote:
 Sylvain has a draft on https://issues.apache.org/jira/browse/CASSANDRA-3779
 , and that should be an official cassandra project doc real soon now.  If
 you're asking about Datastax's reference docs for CQL 3, they will probably
 be released once Datastax Enterprise or Datastax Community is released with
 Cassandra 1.1.

 p


 On Wed, May 16, 2012 at 10:57 AM, Roland Mechler rmech...@sencha.com
 wrote:

 http://www.datastax.com/dev/blog/whats-new-in-cql-3-0

 It's my understanding that the actual reference documentation for 3.0
 should be ready soon. Anyone know when?

 -Roland


 On Wed, May 16, 2012 at 12:04 AM, Tamil selvan R.S tamil.3...@gmail.com
 wrote:

 Hi,
  Is there a tutorial or reference on CQL 3.0 features? On the Cassandra
 download site the reference is still pointing to 2.0.
  Specifically, composite types.
 Regards,
 Tamil.s






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: stream data using bulkoutputformat

2012-05-04 Thread Jonathan Ellis
We're working on this over at
https://issues.apache.org/jira/browse/CASSANDRA-4208

On Fri, May 4, 2012 at 4:56 PM, Shawna Qian shaw...@yahoo-inc.com wrote:
 Hi Group:

 I am following this great example of using BulkOutputFormat to stream
 data from Hadoop to Cassandra:
 http://shareitexploreit.blogspot.com/2012/03/bulkloadto-cassandra-with-hado
 op.html. It works perfectly when my keyspace has one CF.

 But in my case, I have 2 column families defined in the keyspace, and I
 want to stream data to both of them in the same mapper.  It seems like
 ConfigHelper can only set one output column family. Is there a way
 that I can set multiple column families in one keyspace and output data to
 all the CFs?


 Thx
 Shawna




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: AssertionError: originally calculated column size ...

2012-04-30 Thread Jonathan Ellis
On Mon, Apr 30, 2012 at 2:11 PM, Patrik Modesto
patrik.mode...@gmail.com wrote:
 I think the problem is somehow connected to an IntegerType secondary
 index.

Could be, but my money is on the supercolumns in the HH data model.

Can you create a jira ticket?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: JNA + Cassandra security

2012-04-30 Thread Jonathan Ellis
On Mon, Apr 30, 2012 at 7:49 PM, Cord MacLeod cordmacl...@gmail.com wrote:
 Hello group,

 I'm a new Cassandra and Java user, so I'm still trying to get my head around a
 few things.  If you've disabled swap on a machine, what is the reason to use
 JNA?

Faster snapshots, giving hints to the page cache with fadvise.

 A second question: doesn't JNA break Java's inherent security mechanisms
 by allowing direct system calls outside of the JVM?  Are there any
 concerns around this?

We're not trying to sandbox anything here; there's lots of places
where we explicitly allow arbitrary Java code to be injected into
Cassandra.  You don't need native code to do dangerous things with
that!

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: incremental_backups

2012-04-30 Thread Jonathan Ellis
Incremental snapshots contain only new data, so they are *much* smaller.
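
(For reference: incremental backups are enabled with incremental_backups: true
in cassandra.yaml; Cassandra then hard-links each newly flushed sstable into a
backups/ directory under the keyspace data directory, so only new data is
captured.)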

On Mon, Apr 30, 2012 at 12:39 AM, Tamar Fraenkel ta...@tok-media.com wrote:

 Hi!
 I wonder what the advantages are of doing an incremental snapshot over a
 non-incremental one?
 Are the snapshots smaller in size? Are there any other implications?
 Thanks,

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Cql 3 wide rows filter expressions in where clause

2012-04-30 Thread Jonathan Ellis
That should work.  I don't see anything obviously wrong with your
query, other than the trivial (ascii values need to be quoted).
Assuming that's not the problem, please file a ticket if you have a
failing test case.
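
For example, with the scf schema quoted below, the ascii literals would need
quotes: select * from scf where k='1' and x='2' and z='2' order by o ASC;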

On Fri, Apr 20, 2012 at 11:59 PM, Nagaraj J nagaraj.pe...@gmail.com wrote:
 Hi

 CQL 3 for wide rows is very promising. I was wondering whether there is
 support for filtering wide rows by additional filter expressions in the where
 clause (columns other than those which are part of the composite).

 Ex.
 Suppose I have a sparse CF:

 create columnfamily scf( k ascii, o ascii, x ascii, y ascii, z ascii,
 PRIMARY KEY(k, o));

 is it possible to have a query

 select * from scf where k=1 and x=2 and z=2 order by o ASC;

 I tried this with 1.1-rc and it doesn't work as expected. I also looked at
 cql_tests.py in https://issues.apache.org/jira/browse/CASSANDRA-2474; there
 is no mention of this.

 Am I missing something here?

 Thanks in advance
 Nagaraj

 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cql-3-wide-rows-filter-expressions-in-where-clause-tp7486344p7486344.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: size tiered compaction - improvement

2012-04-18 Thread Jonathan Ellis
It's not that simple, unless you have an append-only workload.  (See
discussion on
https://issues.apache.org/jira/browse/CASSANDRA-3974.)

On Wed, Apr 18, 2012 at 4:57 AM, Radim Kolar h...@filez.com wrote:

 Any compaction pass over A will first convert the TTL data into
 tombstones.

 Then, any subsequent pass that includes A *and all other sstables
 containing rows with the same key* will drop the tombstones.

 That's why I proposed attaching the TTL to the entire CF. Tombstones would
 not be needed.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Resident size growth

2012-04-18 Thread Jonathan Ellis
On Wed, Apr 18, 2012 at 12:44 PM, Rob Coli rc...@palominodb.com wrote:
 On Tue, Apr 10, 2012 at 8:40 AM, ruslan usifov ruslan.usi...@gmail.com 
 wrote:
 mmap doesn't depend on jna

 FWIW, this confusion is as a result of the use of *mlockall*, which is
 used to prevent mmapped files from being swapped, which does depend on
 JNA.

mlockall does depend on JNA, but we only lock the JVM itself in
memory.  The OS is free to page data files in and out as needed.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released

2012-04-17 Thread Jonathan Ellis
64bit is recommended where that's available.

If you actually did have a 32bit machine or VM, then you should
dramatically reduce the commitlog space cap to the minimum of 128MB so
it doesn't need to mmap so much.
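
(The cap in question is commitlog_total_space_in_mb in cassandra.yaml, e.g.
commitlog_total_space_in_mb: 128.)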

On Tue, Apr 17, 2012 at 1:45 PM, Bryce Godfrey
bryce.godf...@azaleos.com wrote:
 Sorry, I found the issue.  The server I was using had 32bit java installed.

 -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@datastax.com]
 Sent: Monday, April 16, 2012 11:39 PM
 To: user@cassandra.apache.org
 Subject: Re: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released

 On Mon, Apr 16, 2012 at 10:45 PM, Bryce Godfrey bryce.godf...@azaleos.com 
 wrote:
 I keep running into this in my testing (on a Windows box). Is this just an
 OOM for RAM?

 How much RAM do you have? Do you use completely standard settings? Do you 
 also OOM if you try the same test with Cassandra 1.0.9?

 --
 Sylvain


 ERROR [COMMIT-LOG-ALLOCATOR] 2012-04-16 13:36:18,790
 AbstractCassandraDaemon.java (line 134) Exception in thread
 Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.io.IOError: java.io.IOException: Map failed
        at
 org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSeg
 ment.java:127)
        at
 org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(Commit
 LogSegment.java:80)
        at
 org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegmen
 t(CommitLogAllocator.java:244)
        at
 org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(Commit
 LogAllocator.java:49)
        at
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(Com
 mitLogAllocator.java:104)
        at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30
 )
        at java.lang.Thread.run(Unknown Source) Caused by:
 java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(Unknown Source)
        at
 org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSeg
 ment.java:119)
        ... 6 more
 Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        ... 8 more
  INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961
 CassandraDaemon.java (line 218) Stop listening to thrift clients
  INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961
 MessagingService.java (line 539) Waiting for messaging service to
 quiesce
  INFO [ACCEPT-/10.47.1.15] 2012-04-16 13:36:18,977 MessagingService.java 
 (line 695) MessagingService shutting down server thread.

 -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@datastax.com]
 Sent: Friday, April 13, 2012 9:41 AM
 To: user@cassandra.apache.org
 Subject: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released

 The Cassandra team is pleased to announce the release of the first release 
 candidate for the future Apache Cassandra 1.1.

 Please first note that this is a release candidate, *not* the final release 
 yet.

 All help in testing this release candidate will be greatly appreciated. 
 Please report any problem you may encounter[3,4] and have a look at the 
 change log[1] and the release notes[2] to see where Cassandra 1.1 differs 
 from the previous series.

 Apache Cassandra 1.1.0-rc1[5] is available as usual from the cassandra 
 website (http://cassandra.apache.org/download/) and a debian package is 
 available using the 11x branch (see 
 http://wiki.apache.org/cassandra/DebianPackaging).

 Thank you for your help in testing and have fun with it.

 [1]: http://goo.gl/XwH7J (CHANGES.txt)
 [2]: http://goo.gl/JocLX (NEWS.txt)
 [3]: https://issues.apache.org/jira/browse/CASSANDRA
 [4]: user@cassandra.apache.org
 [5]:
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=re
 fs/tags/cassandra-1.1.0-rc1



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: size tiered compaction - improvement

2012-04-17 Thread Jonathan Ellis
On Sat, Apr 14, 2012 at 3:27 AM, Radim Kolar h...@filez.com wrote:
 forceUserDefinedCompaction would be more useful if you could do compaction
 on 2 tables.

You absolutely can.  That's what the user defined part is: you give
it the exact list of sstables you want compacted.
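
For anyone looking for the mechanics: forceUserDefinedCompaction is an
operation on the CompactionManager MBean, so it can be driven from jconsole or
from a few lines of JMX client code. A minimal sketch in Java, assuming the
default JMX port 7199 and placeholder keyspace/sstable names:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class UserDefinedCompaction
{
    public static void main(String[] args) throws Exception
    {
        // connect to the node's JMX port (7199 by default)
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName(
            "org.apache.cassandra.db:type=CompactionManager");
        // keyspace name, then a comma-separated list of Data.db files
        // (placeholders here) to compact together into one sstable
        mbs.invoke(name, "forceUserDefinedCompaction",
                   new Object[] { "MyKeyspace", "cf-hc-1-Data.db,cf-hc-2-Data.db" },
                   new String[] { "java.lang.String", "java.lang.String" });
        connector.close();
    }
}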

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: size tiered compaction - improvement

2012-04-17 Thread Jonathan Ellis
On Sat, Apr 14, 2012 at 4:08 AM, Igor i...@4friends.od.ua wrote:
 Assume I insert all my data with TTL=2 weeks, and say we have sstable A which
 was created a week ago at time T, so I know that right now it contains:

 1) some data that was inserted no later than T and may not be expired yet
 2) some amount of data that was already close to expiration due to TTL at
 time T, but still had no chance to be wiped out, because up to the current
 moment size-tiered compaction did not involve A in any compactions.

 A large amount of the data from 2) expired in the week after time T and has
 probably passed the gc_grace period, so it should be wiped by any compaction
 on table A.

Any compaction pass over A will first convert the TTL data into tombstones.

Then, any subsequent pass that includes A *and all other sstables
containing rows with the same key* will drop the tombstones.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Off-heap row cache and mmapped sstables

2012-04-17 Thread Jonathan Ellis
Absolutely.  Best practice is still to disable swap entirely on server
machines; mlockall is just our best attempt to at least keep your JVM
from swapping if you've forgotten this.
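
(On Linux that means running swapoff -a and removing any swap entries from
/etc/fstab so swap stays off across reboots.)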

On Thu, Apr 12, 2012 at 11:15 AM, Omid Aladini omidalad...@gmail.com wrote:
 Hi,

 Cassandra issues an mlockall [1] before mmap-ing sstables to prevent
 the kernel from paging out heap space in favor of memory-mapped
 sstables. I was wondering, what happens to the off-heap row cache
 (saved or unsaved)? Is it possible that the kernel pages out off-heap
 row cache in favor of resident mmap-ed sstable pages?

 Thanks,
 Omid

 [1] http://pubs.opengroup.org/onlinepubs/007908799/xsh/mlockall.html



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Long start-up times

2012-04-17 Thread Jonathan Ellis
On Sun, Apr 15, 2012 at 2:47 PM, sj.climber sj.clim...@gmail.com wrote:
 Also, I see in 1.0.9 there's a fix for a potentially related issue (see
 https://issues.apache.org/jira/browse/CASSANDRA-4023).  Any thoughts on
 this?

My thought is, upgrading is a no-brainer if that's a pain point for you. :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: swap grows

2012-04-17 Thread Jonathan Ellis
Swappiness is actually a fairly weak hint to linux:

http://www.linuxvox.com/2009/10/what-is-the-linux-kernel-parameter-vm-swappiness

On Sat, Apr 14, 2012 at 1:39 PM, aaron morton aa...@thelastpickle.com wrote:
 From https://help.ubuntu.com/community/SwapFaq
 
 swappiness=0 tells the kernel to avoid swapping processes out of physical
 memory for as long as possible
 

 If you have swap enabled at some point the OS may swap out pages, even if
 swappiness is 0 and you have free memory. Disable swap entirely if you want
 to avoid this.


 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/04/2012, at 1:37 AM, R. Verlangen wrote:

 Maybe it has got something to do with swappiness; it's something you can
 configure. More info here:
 https://www.linux.com/news/software/applications/8208-all-about-linux-swap-space

 2012/4/14 ruslan usifov ruslan.usi...@gmail.com

 I know :-) but this is not an answer :-(. I found that the other nodes also
 have about 3GB of free memory (the node with JAVA_HEAP=6GB has 3GB free as
 well) but they have JAVA_HEAP=5G, so this looks like some sysctl
 (/proc/sys/vm???) ratio (about 10% = 3 / 24 * 100); I don't know which one.
 Can anybody explain this situation?

 2012/4/14 R. Verlangen ro...@us2.nl

 Its recommended to disable swap entirely when you run Cassandra on a
 server.


 2012/4/14 ruslan usifov ruslan.usi...@gmail.com

 I forgot to say that the system has 24GB of physical memory


 2012/4/14 ruslan usifov ruslan.usi...@gmail.com

 Hello

 We have a 6-node cluster (Cassandra 0.8.10). On one node I increased the
 Java heap size to 6GB, and now swap has begun to grow on this node, even
 though the system has about 3GB of free memory:


 root@6wd003:~# free
              total       used       free     shared    buffers     cached
 Mem:      24733664   21702812    3030852          0       6792   13794724
 -/+ buffers/cache:    7901296   16832368
 Swap:      1998840       2352    1996488


 And swap space slowly grows, but I don't understand why.


 PS: We have JNA mlockall, and set vm.swappiness = 0
 PS: OS ubuntu 10.0.4(2.6.32-40-generic)






 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl





 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: java.nio.BufferOverflowException from cassandra server

2012-04-17 Thread Jonathan Ellis
If I were to take a wild guess, it would be that you're using a single
Thrift connection in multiple threads, which isn't supported.
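
If that's the case, the fix is one connection per thread. A minimal sketch in
Java (host, port, and keyspace are placeholders; the same
one-connection-per-thread structure applies with the Thrift C++ API):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class PerThreadClients
{
    public static void main(String[] args) throws Exception
    {
        for (int i = 0; i < 4; i++)
        {
            new Thread(new Runnable()
            {
                public void run()
                {
                    try
                    {
                        // each thread opens its own transport and client;
                        // sharing one connection corrupts the protocol stream
                        TTransport transport =
                            new TFramedTransport(new TSocket("127.0.0.1", 9160));
                        Cassandra.Client client =
                            new Cassandra.Client(new TBinaryProtocol(transport));
                        transport.open();
                        client.set_keyspace("MyKeyspace");
                        // ... issue reads/writes with this client only ...
                        transport.close();
                    }
                    catch (Exception e)
                    {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
    }
}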

On Mon, Apr 16, 2012 at 6:43 PM, Aniket Chakrabarti
chakr...@cse.ohio-state.edu wrote:
 Hi,

 I have set up a 4-node Cassandra cluster. I am using the Thrift C++ API to
 write a simple C++ application which creates 50% READ / 50% WRITE requests.
 Every time, near about the thousand-request mark, I get the following
 exception and my connection is broken:
 ===
 ERROR 17:30:27,647 Error occurred during processing of message.
 java.nio.BufferOverflowException
        at java.nio.charset.CoderResult.throwException(Unknown Source)
        at java.lang.StringCoding$StringEncoder.encode(Unknown Source)
        at java.lang.StringCoding.encode(Unknown Source)
        at java.lang.String.getBytes(Unknown Source)
        at
 org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:185)
        at
 org.apache.thrift.protocol.TBinaryProtocol.writeMessageBegin(TBinaryProtocol.java:92)
        at
 org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:3302)
        at
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
        at
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
 ==
 Some info about the config I am using:
 - It is a 4-node cluster with only 1 seed.
 - The consistency level is also set to ONE.
 - The max heap size and new heap size are set to 4G and 800M (I tried without
 setting them as well).
 - Java is run in interpreted mode (-Xint).
 - I'm using User Mode Linux.

 Any pointers to what I might be doing wrong will be very helpful.

 Thanks in advance,
 Aniket



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: size tiered compaction - improvement

2012-04-17 Thread Jonathan Ellis
On Tue, Apr 17, 2012 at 11:26 PM, Igor i...@4friends.od.ua wrote:
 You absolutely can.  That's what the user defined part is: you give
 it the exact list of sstables you want compacted.

 Does it mean that I can use a list (not just one) of sstables as the second
 parameter for userDefinedCompaction?

If you want them all compacted together into one big sstable, yes.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Why so many SSTables?

2012-04-10 Thread Jonathan Ellis
LCS explicitly tries to keep sstables under 5MB to minimize extra work
done by compacting data that didn't really overlap across different
levels.
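
(If fewer, larger files are wanted, the target size is exposed as the
sstable_size_in_mb compaction strategy option; the exact syntax varies by
version, so treat this cassandra-cli line as a sketch to verify:
update column family Documents with compaction_strategy_options =
{sstable_size_in_mb: 10};)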

On Tue, Apr 10, 2012 at 9:24 AM, Romain HARDOUIN
romain.hardo...@urssaf.fr wrote:

 Hi,

 We are surprised by the number of files generated by Cassandra.
 Our cluster consists of 9 nodes and each node handles about 35 GB.
 We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
 We have 30 CF.

 We've got roughly 45,000 files under the keyspace directory on each node:
 ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
 44372

 The biggest CF is spread over 38,000 files:
 ls -l Documents* | wc -l
 37870

 ls -l Documents*-Data.db | wc -l
 7586

 Many SSTable are about 4 MB:

 19 MB - 1 SSTable
 12 MB - 2 SSTables
 11 MB - 2 SSTables
 9.2 MB - 1 SSTable
 7.0 MB to 7.9 MB - 6 SSTables
 6.0 MB to 6.4 MB - 6 SSTables
 5.0 MB to 5.4 MB - 4 SSTables
 4.0 MB to 4.7 MB - 7139 SSTables
 3.0 MB to 3.9 MB - 258 SSTables
 2.0 MB to 2.9 MB - 35 SSTables
 1.0 MB to 1.9 MB - 13 SSTables
 87 KB to  994 KB - 87 SSTables
 0 KB - 32 SSTables

 FYI here is CF information:

 ColumnFamily: Documents
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: org.apache.cassandra.db.marshal.BytesType
   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
   Row cache size / save period in seconds / keys to save : 0.0/0/all
   Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
   Key cache size / save period in seconds: 20.0/14400
   GC grace seconds: 1728000
   Compaction min/max thresholds: 4/32
   Read repair chance: 1.0
   Replicate on write: true
   Column Metadata:
     Column Name: refUUID (7265664944)
       Validation Class: org.apache.cassandra.db.marshal.BytesType
       Index Name: refUUID_idx
       Index Type: KEYS
   Compaction Strategy:
 org.apache.cassandra.db.compaction.LeveledCompactionStrategy
   Compression Options:
     sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

 Is it a bug? If not, how can we tune Cassandra to avoid this?

 Regards,

 Romain



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Bulk loading errors with 1.0.8

2012-04-09 Thread Jonathan Ellis
On Thu, Apr 5, 2012 at 10:58 AM, Benoit Perroud ben...@noisette.ch wrote:
 ERROR [Thread-23] 2012-04-05 09:58:12,252 AbstractCassandraDaemon.java
 (line 139) Fatal exception in thread Thread[Thread-23,5,main]
 java.lang.RuntimeException: Insufficient disk space to flush
 7813594056494754913 bytes
        at 
 org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:635)
        at 
 org.apache.cassandra.streaming.StreamIn.getContextMapping(StreamIn.java:92)
        at 
 org.apache.cassandra.streaming.IncomingStreamReader.init(IncomingStreamReader.java:68)
        at 
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
        at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

 Here I'm not really sure I was able to generate 7 exabytes of data ;)

The bulk loader told the Cassandra node, "I have 7EB of data for you."
And the C* node threw this error.  So you need to troubleshoot the
bulk loader side.

If you feel lucky, we've done some work on streaming in 1.1 to make it
more robust, but I don't recognize this specific problem so I can't
say for sure if 1.1 would help.

 ERROR [Thread-46] 2012-04-05 09:58:14,453 AbstractCassandraDaemon.java
 (line 139) Fatal exception in thread Thread[Thread-46,5,main]
 java.lang.NullPointerException
        at 
 org.apache.cassandra.io.sstable.SSTable.getMinimalKey(SSTable.java:156)
        at 
 org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:334)
        at 
 org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302)
        at 
 org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:155)
        at 
 org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:89)
        at 
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
        at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

 This one sounds like a null key added to the SSTable at some point,
 but I'm rather confident I'm checking for key nullity.

The stacktrace indicates an error with the very first key in the
sstable, if that helps.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: leveled compaction - improve log message

2012-04-09 Thread Jonathan Ellis
CompactionExecutor doesn't have level information available to it; it
just compacts the sstables it's told to.  But if you enable debug
logging on LeveledManifest you'd see what you want.  (Compaction
candidates for L{} are {})
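
(Concretely, that means adding a line like
log4j.logger.org.apache.cassandra.db.compaction.LeveledManifest=DEBUG
to conf/log4j-server.properties.)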

2012/4/5 Radim Kolar h...@filez.com:
 It would be really helpful if leveled compaction printed the level in the log.

 Demo:

 INFO [CompactionExecutor:891] 2012-04-05 22:39:27,043 CompactionTask.java
 (line 113) Compacting ***LEVEL 1***
 [SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19690-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19688-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19691-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19700-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19686-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19696-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19687-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19695-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19689-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19694-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19693-Data.db')]

  INFO [CompactionExecutor:891] 2012-04-05 22:39:57,299 CompactionTask.java
 (line 221) *** LEVEL 1 *** Compacted to
 [/var/lib/cassandra/data/rapidshare/querycache-hc-19701-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19702-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19703-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19704-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19705-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19706-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19707-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19708-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19709-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19710-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19711-Data.db,].
  59,643,011 to 57,564,216 (~96% of original) bytes for 590,909 keys at
 1.814434MB/s.  Time: 30,256ms.





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: size tiered compaction - improvement

2012-04-03 Thread Jonathan Ellis
Twitter tried a timestamp-based compaction strategy in
https://issues.apache.org/jira/browse/CASSANDRA-2735.  The conclusion
was, this actually resulted in a lot more compactions than the
SizeTieredCompactionStrategy. The increase in IO was not acceptable
for our use and therefore stopped working on this patch.

2012/4/3 Radim Kolar h...@filez.com:
 There is a problem with the size-tiered compaction design: it compacts
 together tables of similar size.

 Sometimes some sstables will sit on disk forever (see the Feb 23 files below)
 because no other similarly sized tables were created, and probably never will
 be, since a flushed sstable is about 11-16 MB.

 next level about 90 MB
 then 5x 90 MB gets compacted to 400 MB sstable
 and 5x400 MB ~ 2 GB

 The problem is that the 400 MB sstable is too small to be compacted against
 these 3x 720 MB ones.

 -rw-r--r--  1 root  wheel   165M Feb 23 17:03 resultcache-hc-13086-Data.db
 -rw-r--r--  1 root  wheel   772M Feb 23 17:04 resultcache-hc-13087-Data.db
 -rw-r--r--  1 root  wheel   156M Feb 23 17:06 resultcache-hc-13091-Data.db
 -rw-r--r--  1 root  wheel   716M Feb 23 17:18 resultcache-hc-13096-Data.db
 -rw-r--r--  1 root  wheel   734M Feb 23 17:29 resultcache-hc-13101-Data.db
 -rw-r--r--  1 root  wheel   5.0G Mar 14 09:38 resultcache-hc-13923-Data.db
 -rw-r--r--  1 root  wheel   1.9G Mar 16 22:41 resultcache-hc-14084-Data.db
 -rw-r--r--  1 root  wheel   1.9G Mar 21 15:11 resultcache-hc-14460-Data.db
 -rw-r--r--  1 root  wheel   1.9G Mar 27 05:22 resultcache-hc-14694-Data.db
 -rw-r--r--  1 root  wheel   2.0G Mar 31 04:57 resultcache-hc-14851-Data.db
 -rw-r--r--  1 root  wheel   112M Mar 31 06:30 resultcache-hc-14922-Data.db
 -rw-r--r--  1 root  wheel   577M Apr  1 19:25 resultcache-hc-14943-Data.db

 The compaction strategy needs to compact sstables by timestamp too: older
 tables should have an increased chance of being compacted.
 For example, a table from today would be compacted with another table in the
 range (0.5-1.5) of its size, and this range would widen with sstable age;
 a 1-month-old table would have a range of, say, (0.2-1.8).



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Largest 'sensible' value

2012-04-03 Thread Jonathan Ellis
We use 2MB chunks for our CFS implementation of HDFS:
http://www.datastax.com/dev/blog/cassandra-file-system-design
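
A minimal sketch of the same chunking idea (the 2MB figure is from the CFS
post above; how you name the chunk columns is up to your data model):

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class Chunker
{
    // 2MB chunks, as used by CFS
    static final int CHUNK_SIZE = 2 * 1024 * 1024;

    // split a large blob into chunk-sized buffers, one column value each,
    // so no single column value becomes unreasonably large
    static List<ByteBuffer> chunks(byte[] blob)
    {
        List<ByteBuffer> out = new ArrayList<ByteBuffer>();
        for (int offset = 0; offset < blob.length; offset += CHUNK_SIZE)
        {
            int length = Math.min(CHUNK_SIZE, blob.length - offset);
            out.add(ByteBuffer.wrap(blob, offset, length));
        }
        return out;
    }
}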

On Mon, Apr 2, 2012 at 4:23 AM, Franc Carter franc.car...@sirca.org.au wrote:

 Hi,

 We are in the early stages of thinking about a project that needs to store
 data that will be accessed by Hadoop. One of the concerns we have is around
 the latency of HDFS, as our use case is not for reading all the data and
 hence we will need custom RecordReaders etc.

 I've seen a couple of comments that you shouldn't put large chunks into a
 value - however 'large' is not well defined for the range of people using
 these solutions ;-)

 Does anyone have a rough rule of thumb for how big a single value can be
 before we are outside sanity?

 thanks

 --

 Franc Carter | Systems architect | Sirca Ltd

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 9236 9118

 Level 9, 80 Clarence St, Sydney NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: column’s timestamp

2012-04-03 Thread Jonathan Ellis
That would work, with the caveat that you'd have to delete it and
re-insert if you want to preserve that relationship on update.

On Mon, Apr 2, 2012 at 12:18 PM, Pierre Chalamet pie...@chalamet.net wrote:
 Hi,

 What about using a timestamp as the column name and doing a get slice instead?


 --Original Message--
 From: Avi-h
 To: cassandra-u...@incubator.apache.org
 ReplyTo: user@cassandra.apache.org
 Subject: column’s timestamp
 Sent: Apr 2, 2012 18:24

 Is it possible to fetch a column based on the row key and the column’s
 timestamp only (not using the column’s name)?

 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/column-s-timestamp-tp7429905p7429905.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.


 - Pierre



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: really bad select performance

2012-04-03 Thread Jonathan Ellis
Secondary indexes can generate a lot of random i/o.  iostat -x can
confirm if that's your problem.

On Thu, Mar 29, 2012 at 5:52 PM, Chris Hart ch...@remilon.com wrote:
 Hi,

 I have the following cluster:

                                              136112946768375385385349842972707284580
 ip address  MountainView  RAC1  Up  Normal  1.86 GB  20.00%  0
 ip address  MountainView  RAC1  Up  Normal  2.17 GB  33.33%  56713727820156410577229101238628035242
 ip address  MountainView  RAC1  Up  Normal  2.41 GB  33.33%  113427455640312821154458202477256070485
 ip address  Rackspace     RAC1  Up  Normal  3.9 GB   13.33%  136112946768375385385349842972707284580

 The following query runs quickly on all nodes except 1 MountainView node:

  select * from Access_Log where row_loaded = 0 limit 1;

 There is a secondary index on row_loaded.  The query usually doesn't complete
 (but sometimes does) on the bad node and returns very quickly on all other
 nodes.  I've upped the rpc timeout to a full minute (rpc_timeout_in_ms:
 60000) in the yaml, but it still often doesn't complete in a minute.  It
 seems just as likely to complete, and takes about the same amount of time,
 whether the limit is 1, 100 or 1000.


 Thanks for any help,
 Chris



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: tombstones problem with 1.0.8

2012-04-03 Thread Jonathan Ellis
Removing expired columns actually requires two compaction passes: one
to turn the expired column into a tombstone; one to remove the
tombstone after gc_grace_seconds. (See
https://issues.apache.org/jira/browse/CASSANDRA-1537.)

Perhaps CASSANDRA-2786 was causing things to (erroneously) be cleaned
up early enough that this helped you out in 0.8.2?

On Wed, Mar 21, 2012 at 8:38 PM, Ross Black ross.w.bl...@gmail.com wrote:
 Hi,

 We recently moved from 0.8.2 to 1.0.8 and the behaviour seems to have
 changed so that tombstones are now not being deleted.

 Our application continually adds and removes columns from Cassandra.  We
 have set a short gc_grace time (3600) since our application would
 automatically delete zombies if they appear.
 Under 0.8.2, the tombstones remained at a relatively constant number.
 Under 1.0.8, the tombstones have been continually increasing so that they
 exceed the size of our real data (at this stage we have over 100G of
 tombstones).
 Even after running a full compact the new compacted SSTable contains a
 massive number of tombstones, many that are several weeks old.

 Have I missed some new configuration option to allow deletion of tombstones?

 I also noticed that one of the changes between 0.8.2 and 1.0.8 was
 https://issues.apache.org/jira/browse/CASSANDRA-2786 which changed code to
 avoid dropping tombstones when they might still be needed to shadow data in
 another sstable.
 Could this be having an impact since we continually add and remove columns
 even while a major compact is executing?


 Thanks,
 Ross




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Cannot start cassandra node anymore

2012-03-27 Thread Jonathan Ellis
Hi Carlo,

Can you post steps to reproduce over on
https://issues.apache.org/jira/browse/CASSANDRA-3819 ?  We have tried
and failed to cause this problem.

On Thu, Jan 26, 2012 at 6:24 AM, Carlo Pires carlopi...@gmail.com wrote:
 I found out this is related to a schema change. It happens *every time* I
 create, drop, and re-create a CF with composite types. As a workaround I:

 * never stop all nodes together

 To stop a node:
 * repair and compact the node before stopping it
 * stop and start it again
 * if it starts fine, good; if not, remove all data and restart the node (and
 wait...)


 2012/1/25 aaron morton aa...@thelastpickle.com

 There is something wrong with the way a composite type value was serialized.
 The length of a part on disk is not right.

 As a workaround, remove the log file, restart, and then repair the node.

 How it got like that is another question. What was the schema change?

 Cheers






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Final buffer length 4690 to accomodate data size of 2347 for RowMutation error caused node death

2012-03-07 Thread Jonathan Ellis
Thanks, Thomas.

Row cache/CLHCP confirms our suspected culprit.  We've committed a fix
for 1.0.9.

On Wed, Mar 7, 2012 at 11:08 AM, Thomas van Neerijnen
t...@bossastudios.com wrote:
 Sorry for the delay in replying.

 I'd like to stress that I've been working on this cluster for many months,
 and this was the first and so far last time I got this error, so I couldn't
 guess how to duplicate it. Sorry I can't be more help.

 Anyway, here are the details requested:
 Row caching is enabled, at the time the error occurred using
 ConcurrentLinkedHashCacheProvider.
 It's the Apache packaged version with JNA pulled in as a dependency when I
 installed, so yes.
 We're using Hector 1.0.1.
 I'm not sure what was happening at the time the error occurred, although the
 empty super columns are expected, assuming my understanding of super column
 deletion is correct, which is to say that if I delete a super column from a
 row it'll tombstone it and delete the data.
 The schema for PlayerCity is as follows:

 create column family PlayerCity
   with column_type = 'Super'
   and comparator = 'UTF8Type'
   and subcomparator = 'BytesType'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'BytesType'
   and rows_cached = 400.0
   and row_cache_save_period = 0
   and row_cache_keys_to_save = 2147483647
   and keys_cached = 20.0
   and key_cache_save_period = 14400
   and read_repair_chance = 1.0
   and gc_grace = 864000
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
   and compaction_strategy =
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';


 On Fri, Feb 24, 2012 at 10:07 PM, Jonathan Ellis jbel...@gmail.com wrote:

 I've filed https://issues.apache.org/jira/browse/CASSANDRA-3957 as a
 bug.  Any further light you can shed here would be useful.  (Is row
 cache enabled?  Is JNA installed?)

 On Mon, Feb 20, 2012 at 5:43 AM, Thomas van Neerijnen
 t...@bossastudios.com wrote:
  Hi all
 
  I am running the Apache packaged Cassandra 1.0.7 on Ubuntu 11.10.
  It has been running fine for over a month; however, I encountered the below
  error yesterday, which almost immediately resulted in heap usage rising
  quickly to almost 100% and client requests timing out on the affected node.
  I gave up waiting for the init script to stop Cassandra and killed it myself
  after about 3 minutes, restarted it, and it has been fine since. Anyone seen
  this before?
 
  Here is the error in the output.log:
 
  ERROR 10:51:44,282 Fatal exception in thread
  Thread[COMMIT-LOG-WRITER,5,main]
  java.lang.AssertionError: Final buffer length 4690 to accomodate data
  size
  of 2347 (predicted 2344) for RowMutation(keyspace='Player',
 
  key='36336138643338652d366162302d343334392d383466302d356166643863353133356465',
  modifications=[ColumnFamily(PlayerCity [SuperColumn(owneditem_1019
  []),SuperColumn(owneditem_1024 []),SuperColumn(owneditem_1026
  []),SuperColumn(owneditem_1074 []),SuperColumn(owneditem_1077
  []),SuperColumn(owneditem_1084 []),SuperColumn(owneditem_1094
  []),SuperColumn(owneditem_1130 []),SuperColumn(owneditem_1136
  []),SuperColumn(owneditem_1141 []),SuperColumn(owneditem_1142
  []),SuperColumn(owneditem_1145 []),SuperColumn(owneditem_1218
 
  [636f6e6e6563746564:false:5@1329648704269002,63757272656e744865616c7468:false:3@1329648704269006,656e64436f6e737472756374696f6e54696d65:false:13@1329648704269007,6964:false:4@1329648704269000,6974656d4964:false:15@1329648704269001,6c61737444657374726f79656454696d65:false:1@1329648704269008,6c61737454696d65436f6c6c6563746564:false:13@1329648704269005,736b696e4964:false:7@1329648704269009,78:false:4@1329648704269003,79:false:3@1329648704269004,]),SuperColumn(owneditem_133
  []),SuperColumn(owneditem_134 []),SuperColumn(owneditem_135
  []),SuperColumn(owneditem_141 []),SuperColumn(owneditem_147
  []),SuperColumn(owneditem_154 []),SuperColumn(owneditem_159
  []),SuperColumn(owneditem_171 []),SuperColumn(owneditem_253
  []),SuperColumn(owneditem_422 []),SuperColumn(owneditem_438
  []),SuperColumn(owneditem_515 []),SuperColumn(owneditem_521
  []),SuperColumn(owneditem_523 []),SuperColumn(owneditem_525
  []),SuperColumn(owneditem_562 []),SuperColumn(owneditem_61
  []),SuperColumn(owneditem_634 []),SuperColumn(owneditem_636
  []),SuperColumn(owneditem_71 []),SuperColumn(owneditem_712
  []),SuperColumn(owneditem_720 []),SuperColumn(owneditem_728
  []),SuperColumn(owneditem_787 []),SuperColumn(owneditem_797
  []),SuperColumn(owneditem_798 []),SuperColumn(owneditem_838
  []),SuperColumn(owneditem_842 []),SuperColumn(owneditem_847
  []),SuperColumn(owneditem_849 []),SuperColumn(owneditem_851
  []),SuperColumn(owneditem_852 []),SuperColumn(owneditem_853
  []),SuperColumn(owneditem_854 []),SuperColumn(owneditem_857
  []),SuperColumn(owneditem_858 []),SuperColumn(owneditem_874
  []),SuperColumn(owneditem_884 []),SuperColumn(owneditem_886
