Re: Regarding JIRA

2015-06-01 Thread Dave Brosius
 

JIRA should be reserved for issues that you have some confidence are bugs in
cassandra, or items you want as feature requests. 

For general questions, try the cassandra mailing list:
user@cassandra.apache.org (to subscribe, mail
user-subscr...@cassandra.apache.org) 

or use IRC: #cassandra on freenode 

On 2015-06-01 15:31, Kiran mk wrote: 

 Hi, 
 
 I am using the Apache Cassandra community edition for learning and practice. 
 Can I raise doubts, issues, and requests for clarification as JIRA tickets 
 against Cassandra? 
 
 Will there be any charge for that? As far as I know, we can create a free 
 JIRA account. 
 
 Can anyone advise me on this?
 
 -- 
 
 Best Regards,
 Kiran.M.K.
 

Re: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper

2015-04-02 Thread Dave Brosius

This is what I meant by 'initial cause':

Caused by: java.lang.ClassNotFoundException: 
com.datastax.spark.connector.mapper.ColumnMapper


So it is, in fact, a classpath problem.

Here is the class in question 
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/mapper/ColumnMapper.scala


Maybe it would be worthwhile to put this at the top of your main method

System.out.println(System.getProperty("java.class.path"));

and show what that prints.

What version of Cassandra and what version of the cassandra-spark 
connector are you using, btw?






On 04/02/2015 11:16 PM, Tiwari, Tarun wrote:


Sorry I was unable to reply for a couple of days.

I checked the error again and can’t see any other initial cause. Here 
is the full error that is coming.


Exception in thread "main" java.lang.NoClassDefFoundError: 
com/datastax/spark/connector/mapper/ColumnMapper


at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)


at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)


at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: java.lang.ClassNotFoundException: 
com.datastax.spark.connector.mapper.ColumnMapper


at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

From: Dave Brosius [mailto:dbros...@mebigfatguy.com]
Sent: Tuesday, March 31, 2015 8:46 PM
To: user@cassandra.apache.org
Subject: Re: Getting NoClassDefFoundError for 
com/datastax/spark/connector/mapper/ColumnMapper


Is there an 'initial cause' listed under that exception you gave? 
NoClassDefFoundError is not exactly the same as ClassNotFoundException. 
It means that ColumnMapper couldn't run its static initializer; it could 
be because some other class couldn't be found, or it could be some other 
non-classloader-related error.


  


On 2015-03-31 10:42, Tiwari, Tarun wrote:

Hi Experts,

I am getting java.lang.NoClassDefFoundError:
com/datastax/spark/connector/mapper/ColumnMapper while running an
app to load data to a Cassandra table using the datastax spark connector

Is there something else I need to import in the program or
dependencies?

RUNTIME ERROR: Exception in thread "main"
java.lang.NoClassDefFoundError:
com/datastax/spark/connector/mapper/ColumnMapper

at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Below is my Scala program:

/*** ld_Cassandra_Table.scala ***/

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector
import com.datastax.spark.connector._

object ldCassandraTable {

  def main(args: Array[String]) {

    val fileName = args(0)
    val tblName = args(1)

    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "MASTER HOST")
      .setMaster("MASTER URL")
      .setAppName("LoadCassandraTableApp")

    val sc = new SparkContext(conf)

    sc.addJar("/home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar")

    val normalfill = sc.textFile(fileName).map(line => line.split('|'))

    normalfill.map(line => (line(0), line(1), line(2), line(3),
      line(4), line(5), line(6), line(7), line(8), line(9), line(10),
      line(11), line(12), line(13), line(14), line(15), line(16),
      line(17), line(18), line(19), line(20),
      line(21))).saveToCassandra("keyspace", tblName,
      SomeColumns("wfctotalid", "timesheetitemid", "employeeid",
        "durationsecsqty", "wageamt", "moneyamt", "applydtm",
        "laboracctid", "paycodeid", "startdtm", "stimezoneid",
        "adjstartdtm", "adjapplydtm", "enddtm", "homeaccountsw",
        "notpaidsw", "wfcjoborgid", "unapprovedsw", "durationdaysqty",
        "updatedtm", "totaledversion", "acctapprovalnum"))

    println("Records Loaded to %s".format(tblName))

    Thread.sleep(500)

    sc.stop()
  }
}

Below is the sbt file:

name := "POC"

version := "0.0.1"

scalaVersion := "2.10.4"

// additional libraries

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.1.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1" % "provided"
)

Re: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper

2015-03-31 Thread Dave Brosius
 

Is there an 'initial cause' listed under that exception you gave?
NoClassDefFoundError is not exactly the same as ClassNotFoundException.
It means that ColumnMapper couldn't run its static initializer; it could
be because some other class couldn't be found, or it could be some other
non-classloader-related error. 
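
A toy illustration of the static-initializer flavor of this error, with hypothetical classes that are not from the thread:

// First use of Flaky throws ExceptionInInitializerError (wrapping the real
// cause); every later use throws NoClassDefFoundError, because the class
// already failed to initialize.
class Flaky {
    static final int BROKEN = 1 / zero(); // static initializer fails
    static int zero() { return 0; }
}

public class Demo {
    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            try {
                System.out.println(Flaky.BROKEN);
            } catch (Throwable t) {
                // 1st pass: ExceptionInInitializerError, 2nd: NoClassDefFoundError
                System.out.println(t);
            }
        }
    }
}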

On 2015-03-31 10:42, Tiwari, Tarun wrote: 

 Hi Experts, 
 
 I am getting java.lang.NoClassDefFoundError: 
 com/datastax/spark/connector/mapper/ColumnMapper while running an app to load 
 data to a Cassandra table using the datastax spark connector 
 
 Is there something else I need to import in the program or dependencies? 
 
 RUNTIME ERROR: Exception in thread "main" java.lang.NoClassDefFoundError: 
 com/datastax/spark/connector/mapper/ColumnMapper 
 
 at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala) 
 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 
 BELOW IS MY SCALA PROGRAM 
 
 /*** ld_Cassandra_Table.scala ***/ 
 
 import org.apache.spark.SparkContext 
 import org.apache.spark.SparkContext._ 
 import org.apache.spark.SparkConf 
 import com.datastax.spark.connector 
 import com.datastax.spark.connector._ 
 
 object ldCassandraTable { 
 
   def main(args: Array[String]) { 
 
     val fileName = args(0) 
     val tblName = args(1) 
 
     val conf = new SparkConf(true).set("spark.cassandra.connection.host", 
       "MASTER HOST").setMaster("MASTER URL").setAppName("LoadCassandraTableApp") 
 
     val sc = new SparkContext(conf) 
 
     sc.addJar("/home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar") 
 
     val normalfill = sc.textFile(fileName).map(line => line.split('|')) 
 
     normalfill.map(line => (line(0), line(1), line(2), line(3), line(4), line(5), 
       line(6), line(7), line(8), line(9), line(10), line(11), line(12), line(13), 
       line(14), line(15), line(16), line(17), line(18), line(19), line(20), 
       line(21))).saveToCassandra("keyspace", tblName, SomeColumns("wfctotalid", 
       "timesheetitemid", "employeeid", "durationsecsqty", "wageamt", "moneyamt", 
       "applydtm", "laboracctid", "paycodeid", "startdtm", "stimezoneid", 
       "adjstartdtm", "adjapplydtm", "enddtm", "homeaccountsw", "notpaidsw", 
       "wfcjoborgid", "unapprovedsw", "durationdaysqty", "updatedtm", 
       "totaledversion", "acctapprovalnum")) 
 
     println("Records Loaded to %s".format(tblName)) 
 
     Thread.sleep(500) 
 
     sc.stop() 
   } 
 } 
 
 BELOW IS THE SBT FILE: 
 
 name := "POC" 
 
 version := "0.0.1" 
 
 scalaVersion := "2.10.4" 
 
 // additional libraries 
 
 libraryDependencies ++= Seq( 
   "org.apache.spark" %% "spark-core" % "1.1.1" % "provided", 
   "org.apache.spark" %% "spark-sql" % "1.1.1" % "provided", 
   "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1" % "provided" 
 ) 
 
 Regards, 
 
 TARUN TIWARI | Workforce Analytics-ETL | KRONOS INDIA 
 
 M: +91 9540 28 27 77 | Tel: +91 120 4015200 
 
 Kronos | Time & Attendance * Scheduling * Absence Management * HR & Payroll * 
 Hiring * Labor Analytics 
 
 JOIN KRONOS ON: KRONOS.COM [1] | FACEBOOK [2] | TWITTER [3] | LINKEDIN [4] 
 | YOUTUBE [5]
 

Links:
--
[1] http://www.kronos.com/
[2] http://www.kronos.com/facebook
[3] http://www.kronos.com/twitter
[4] http://www.kronos.com/linkedin
[5] http://www.kronos.com/youtube


Re: Storing bi-temporal data in Cassandra

2015-02-15 Thread Dave Brosius
As you point out, there's not really a node-based problem with your 
query from a performance point of view. This is a limitation of CQL in 
that CQL wants to slice one section of a partition's row (no matter how 
big the section is). In your case, you are asking to slice multiple 
sections of a partition's row, which currently isn't supported.


It seems silly perhaps that this is the case, as certainly in your 
example it would seem useful, and not too difficult; but the problem is 
that you could wind up with n-depth slicing of that partitioned row, 
given an arbitrary query syntax, if range queries on clustering keys 
were allowed anywhere.


At present, you can either duplicate the data using the other clustering 
key (transaction_time) as primary clusterer for this use case, or omit 
the 3rd criterion (transaction_time <= '') in the query, get all 
the range query results, and filter on the client.


hth,
dave


On 02/14/2015 06:05 PM, Raj N wrote:
I don't think that solves my problem. The question really is why 
can't we use ranges for both time columns when they are part of the 
primary key. They are in 1 row after all. Is this just a CQL limitation?


-Raj

On Sat, Feb 14, 2015 at 3:35 AM, DuyHai Doan doanduy...@gmail.com wrote:


I am trying to get the state as of a particular transaction_time

 -- In that case you should probably define your primary key in
another order for clustering columns

PRIMARY KEY (weatherstation_id,transaction_time,event_time)

Then, select * from temperatures where weatherstation_id = 'foo'
and event_time >= '2015-01-01 00:00:00' and event_time <
'2015-01-02 00:00:00' and transaction_time <= ''



On Sat, Feb 14, 2015 at 3:06 AM, Raj N raj.cassan...@gmail.com wrote:

Has anyone designed a bi-temporal table in Cassandra? Doesn't
look like I can do this using CQL for now. Taking the time
series example from well known modeling tutorials in Cassandra -

CREATE TABLE temperatures (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time),
) WITH CLUSTERING ORDER BY (event_time DESC);

If I add another column transaction_time

CREATE TABLE temperatures (
weatherstation_id text,
event_time timestamp,
transaction_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time,transaction_time),
) WITH CLUSTERING ORDER BY (event_time DESC, transaction_time
DESC);

If I try to run a query using the following CQL, it throws an
error -

select * from temperatures where weatherstation_id = 'foo' and
event_time >= '2015-01-01 00:00:00' and event_time <
'2015-01-02 00:00:00' and transaction_time < '2015-01-02 00:00:00'

It works if I use an equals clause for the event_time. I am
trying to get the state as of a particular transaction_time

-Raj







Re: Cassandra 2.1.2, Pig 0.14, Hadoop 2.6.0 does not work together

2015-01-22 Thread Dave Brosius

The method

com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;

should be available in guava from 15.0 on. So guava-16.0 should be fine.

Is it possible guava is being picked up from somewhere else? Do you 
have a global classpath variable?


you might want to do

URL u = YourClass.class.getResource("/com/google/common/collect/Sets.class");
System.out.println(u);

to see where you are loading guava from (it prints the jar that 
Sets.class was actually loaded from).


On 01/22/2015 04:12 AM, Pinak Pani wrote:
I am using Pig with Cassandra (Cassandra 2.1.2, Pig 0.14, Hadoop 2.6.0 
combo).


When I use CqlStorage() I get

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: 
org.apache.cassandra.exceptions.ConfigurationException: Unable to find 
inputformat class 'org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat'


When I use CqlNativeStorage() I get

java.lang.NoSuchMethodError: 
com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;


Pig classpath looks like this:

» echo $PIG_CLASSPATH

/home/naishe/apps/apache-cassandra-2.1.2/lib/airline-0.6.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/antlr-runtime-3.5.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/apache-cassandra-2.1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/apache-cassandra-clientutil-2.1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/apache-cassandra-thrift-2.1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/commons-cli-1.1.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/commons-codec-1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/commons-lang3-3.1.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/commons-math3-3.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/compress-lzf-0.8.4.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/concurrentlinkedhashmap-lru-1.4.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/disruptor-3.0.1.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/*guava-16.0.jar*:/home/naishe/apps/apache-cassandra-2.1.2/lib/high-scale-lib-1.0.6.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jackson-core-asl-1.9.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jackson-mapper-asl-1.9.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jamm-0.2.8.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/javax.inject.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jbcrypt-0.3m.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jline-1.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jna-4.0.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/json-simple-1.1.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/libthrift-0.9.1.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/logback-classic-1.1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/logback-core-1.1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/lz4-1.2.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/metrics-core-2.2.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/netty-all-4.0.23.Final.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/reporter-config-2.1.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/slf4j-api-1.7.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/snakeyaml-1.11.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/snappy-java-1.0.5.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/stream-2.5.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/stringtemplate-4.0.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/super-csv-2.1.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/thrift-server-0.3.7.jar::/home/naishe/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.2/cassandra-driver-core-2.1.2.jar:/home/naishe/.m2/repository/org/apache/cassandra/cassandra-all/2.1.2/cassandra-all-2.1.2.jar

I have read somewhere that it is due to version conflict with Guava 
library. So, I tried using Guava 11.0.2, that did not help. 
(http://stackoverflow.com/questions/27089126/nosuchmethoderror-sets-newconcurrenthashset-while-running-jar-using-hadoop#comment42687234_27089126)


Here is the Pig latin that I was trying to execute.

grunt> alice = LOAD 'cql://hadoop_test/lines' USING CqlNativeStorage();
2015-01-22 09:28:54,133 [main] INFO 
 org.apache.hadoop.conf.Configuration.deprecation - fs.default.name 
is deprecated. Instead, use fs.defaultFS
grunt> B = foreach alice generate flatten(TOKENIZE((chararray)$0)) as 
word;

grunt> C = group B by word;
grunt> D = foreach C generate COUNT(B) as word_count, group as word;
grunt> dump D;
2015-01-22 09:29:06,808 [main] INFO 
 org.apache.pig.tools.pigstats.ScriptState - Pig features used in the 
script: GROUP_BY

[ -- snip -- ]
2015-01-22 09:29:11,254 [LocalJobRunner Map Task Executor #0] INFO 
 org.apache.hadoop.mapred.MapTask - Map output collector class = 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2015-01-22 09:29:11,588 [LocalJobRunner Map Task Executor #0] INFO 
 org.apache.hadoop.mapred.MapTask - Starting flush of map output
2015-01-22 09:29:11,600 [Thread-22] INFO 
 org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2015-01-22 09:29:11,620 [Thread-22] WARN 
 

Re: Cassandra Wiki Immutable?

2014-08-19 Thread Dave Brosius

added, thanks.

On 08/18/2014 06:15 AM, Otis Gospodnetic wrote:

Hi,

What is the state of Cassandra Wiki -- http://wiki.apache.org/cassandra ?

I tried to update a few pages, but it looks like pages are immutable. 
 Do I need to have my Wiki username (OtisGospodnetic) added to some ACL?


Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/




Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Dave Brosius
 

We had a massive spam problem before we locked down the wiki, so
unfortunately that was the choice we had to make. But as stated, we can
add you to the contributors list. 

What is your Wiki user name? 

On 2014-07-23 07:33, Peter Lin wrote: 

 I've tried to contribute docs to the Cassandra wiki in the past, but there's an 
 obstacle.
 
 currently wiki.apache.org/cassandra [1] is locked down, so only committers can 
 edit it. I really wish that wasn't the case, since it wastes time. the 
 committers are busy writing code. Having to email a committer and ask them to 
 update it feels silly to me and kind of goes against openness. Back when I 
 was active with JMeter, we decided to leave it open so that anyone can edit 
 the docs.
 
 I can't be the only one that wants to help make the docs better, but gets 
 frustrated with the wiki being closed.
 
 On Wed, Jul 23, 2014 at 4:25 AM, spa...@gmail.com wrote:
 
 I would like to help out with the documentation of C*. How do I start? 
 
 On Wed, Jul 23, 2014 at 12:46 PM, Robert Stupp sn...@snazy.de wrote:
 
 Just a note: 
 If you have suggestions how to improve documentation on the datastax website, 
 write them an email to d...@datastax.com. They appreciate proposals :) 
 
 Am 23.07.2014 um 09:10 schrieb Mark Reddy mark.re...@boxever.com: 
 
 Hi Kevin, 
 The difference here is that the Apache Cassandra site is maintained by the 
 community whereas the DataStax site is maintained by paid employees with a 
 vested interest in producing documentation. 
 
 With DataStax having some comprehensive docs, I guess the desire for people 
 to maintain the Apache site has dwindled. However, if you are interested in 
 contributing to it and bringing it back up to standard you can, thus is the 
 freedom of open source. 
 
 Mark 
 
 On Wed, Jul 23, 2014 at 2:54 AM, Kevin Burton bur...@spinn3r.com wrote:
 
 This document: 
 
 https://wiki.apache.org/cassandra/Operations [2] 
 
 … for example. Is extremely outdated… does NOT reflect 2.x releases 
 certainly. Mentions commands that are long since removed/deprecated. 
 
 Instead of giving bad documentation, maybe remove this and mark it as 
 obsolete. 
 The datastax documentation… is… acceptable, I guess. My main criticism there 
 is that a lot of it is in their blog. 
 
 Kevin
 
 -- 
 
 Founder/CEO Spinn3r.com [3] 
 Location: SAN FRANCISCO, CA 
 blog: http://burtonator.wordpress.com [4] 
 … or check out my Google+ profile [5] 

 -- 
http://spawgi.wordpress.com [6]
 We can do it and do it better. 

Links:
--
[1] http://wiki.apache.org/cassandra
[2] https://wiki.apache.org/cassandra/Operations
[3] http://spinn3r.com/
[4] http://burtonator.wordpress.com/
[5] https://plus.google.com/102718274791889610666/posts
[6] http://spawgi.wordpress.com


Re: What % of cassandra developers are employed by Datastax?

2014-05-17 Thread Dave Brosius
The question assumes that it's likely that datastax employees become 
committers.


Actually, it's more likely that committers become datastax employees.

So this underlying tone that datastax only really 'wants' datastax 
employees to be cassandra committers is really misleading.


Why wouldn't a company want to hire people who have shown a desire and 
aptitude to work on products that they care about? It's just rational. 
And damn genius, actually.


I'm sure they'd be happy to have an influx of non-datastax committers. 
Patches welcome.


dave


On 05/17/2014 08:28 AM, Peter Lin wrote:


if you look at the new committers since 2012 they are mostly datastax


On Fri, May 16, 2014 at 9:14 PM, Kevin Burton bur...@spinn3r.com wrote:


so 30%… according to that data.


On Thu, May 15, 2014 at 4:59 PM, Michael Shuler
mich...@pbandjelly.org wrote:

On 05/14/2014 03:39 PM, Kevin Burton wrote:

I'm curious what % of cassandra developers are employed by
Datastax?


http://wiki.apache.org/cassandra/Committers

-- 
Kind regards,

Michael




-- 
Founder/CEO Spinn3r.com

Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength.
Corporations are people.






Re: initial token crashes cassandra

2014-05-17 Thread Dave Brosius
What Colin is saying is that the tool you used to create the token is 
not creating tokens usable for the Murmur3Partitioner. That tool is 
probably generating tokens for the (original) RandomPartitioner, which 
has a different range.
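
As a rough illustration of the difference in ranges, here is a small Java sketch; the node count and output formatting are mine, not from the thread:

import java.math.BigInteger;

public class TokenRanges {
    public static void main(String[] args) {
        int nodes = 2;
        // RandomPartitioner tokens live in [0, 2^127)
        BigInteger rpRange = BigInteger.valueOf(2).pow(127);
        // Murmur3Partitioner tokens live in [-2^63, 2^63)
        BigInteger m3Range = BigInteger.valueOf(2).pow(64);
        BigInteger m3Min = BigInteger.valueOf(2).pow(63).negate();
        for (int i = 0; i < nodes; i++) {
            BigInteger idx = BigInteger.valueOf(i);
            BigInteger cnt = BigInteger.valueOf(nodes);
            // evenly spaced tokens within each partitioner's range
            System.out.println("RandomPartitioner  token " + i + ": "
                + rpRange.multiply(idx).divide(cnt));
            System.out.println("Murmur3Partitioner token " + i + ": "
                + m3Range.multiply(idx).divide(cnt).add(m3Min));
        }
    }
}

For two nodes this prints 85070591730234615865843651857942052864 (2^126) as the second RandomPartitioner token, exactly the value in the yaml below, while the Murmur3 equivalent would be 0.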



On 05/17/2014 07:20 PM, Tim Dunphy wrote:

Hi and thanks for your response.

The puzzling thing is that yes I am using the murmur partition, yet I 
am still getting the error I just told you guys about:


[root@beta:/etc/alternatives/cassandrahome] #grep -i partition 
conf/cassandra.yaml | grep -v '#'

partitioner: org.apache.cassandra.dht.Murmur3Partitioner

Thanks
Tim


On Sat, May 17, 2014 at 3:23 PM, Colin colpcl...@gmail.com wrote:


You may have used the old random partitioner token generator.  Use
the murmur partitioner token generator instead.

-- 
Colin

320-221-9531


On May 17, 2014, at 1:15 PM, Tim Dunphy bluethu...@gmail.com wrote:


Hey all,

 I've set my initial_token in cassandra 2.0.7 using a python
script I found at the datastax wiki.

I've set the value like this:

initial_token: 85070591730234615865843651857942052864

And cassandra crashes when I try to start it:

[root@beta:/etc/alternatives/cassandrahome] #./bin/cassandra -f
 INFO 18:14:38,511 Logging initialized
 INFO 18:14:38,560 Loading settings from
file:/usr/local/apache-cassandra-2.0.7/conf/cassandra.yaml
 INFO 18:14:39,151 Data files directories: [/var/lib/cassandra/data]
 INFO 18:14:39,152 Commit log directory: /var/lib/cassandra/commitlog
 INFO 18:14:39,153 DiskAccessMode 'auto' determined to be mmap,
indexAccessMode is mmap
 INFO 18:14:39,153 disk_failure_policy is stop
 INFO 18:14:39,153 commit_failure_policy is stop
 INFO 18:14:39,161 Global memtable threshold is enabled at 251MB
 INFO 18:14:39,362 Not using multi-threaded compaction
ERROR 18:14:39,365 Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: For input
string: 85070591730234615865843651857942052864
at

org.apache.cassandra.dht.Murmur3Partitioner$1.validate(Murmur3Partitioner.java:178)
at

org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:440)
at

org.apache.cassandra.config.DatabaseDescriptor.clinit(DatabaseDescriptor.java:111)
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:153)
at

org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:471)
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:560)
For input string: 85070591730234615865843651857942052864
Fatal configuration error; unable to start. See log for stacktrace.

I really need to get replication going between 2 nodes. Can
someone clue me into why this may be crashing?

Thanks!
Tim

-- 
GPG me!!


gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B





--
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B






Re: Failed to mkdirs $HOME/.cassandra

2014-05-16 Thread Dave Brosius
 

For now you can edit the nodetool script itself by adding 

-Duser.home=/tmp 

as in 

"$JAVA" $JAVA_AGENT -cp "$CLASSPATH" \
     -Xmx32m \
     -Duser.home=/tmp \
     -Dlogback.configurationFile=logback-tools.xml \
     -Dstorage-config="$CASSANDRA_CONF" \
     org.apache.cassandra.tools.NodeTool -p $JMX_PORT $ARGS
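
For the cron use case, an /etc/crontab entry along these lines should then work; the schedule and paths here are purely illustrative:

# nightly flush as a low-privilege user; nodetool's history directory
# now resolves under /tmp thanks to the -Duser.home edit above
0 3 * * * nobody /usr/local/cassandra/bin/nodetool flush my_ks my_cf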

If you like, you can add an issue to JIRA. 

On 2014-05-09 18:42, Bryan Talbot wrote: 

 How should nodetool command be run as the user nobody? 
 
 The nodetool command fails with an exception if it cannot create a .cassandra 
 directory in the current user's home directory. 
 
 I'd like to schedule some nodetool commands to run with least privilege as 
 cron jobs. I'd like to run them as the nobody user -- which typically has 
 / as the home directory -- since that's what the user is typically used for 
 (minimum privileges). 
 
 None of the methods described in this JIRA actually seem to work (with 2.0.7 
 anyway) https://issues.apache.org/jira/browse/CASSANDRA-6475 [1] 
 
 Testing as a normal user with no write permissions to the home directory (to 
 simulate the nobody user) 
 
 [vagrant@local-dev ~]$ nodetool version 
 ReleaseVersion: 2.0.7 
 [vagrant@local-dev ~]$ rm -rf .cassandra/ 
 [vagrant@local-dev ~]$ chmod a-w . 
 
 [vagrant@local-dev ~]$ nodetool flush my_ks my_cf 
 Exception in thread "main" FSWriteError in /home/vagrant/.cassandra 
 at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305) 
 at 
 org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690)
  
 at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504) 
 at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204) 
 Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra 
 ... 4 more 
 
 [vagrant@local-dev ~]$ HOME=/tmp nodetool flush my_ks my_cf 
 Exception in thread "main" FSWriteError in /home/vagrant/.cassandra 
 at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305) 
 at 
 org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690)
  
 at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504) 
 at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204) 
 Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra 
 ... 4 more 
 
 [vagrant@local-dev ~]$ env HOME=/tmp nodetool flush my_ks my_cf 
 Exception in thread "main" FSWriteError in /home/vagrant/.cassandra 
 at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305) 
 at 
 org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690)
  
 at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504) 
 at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204) 
 Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra 
 ... 4 more 
 
 [vagrant@local-dev ~]$ env user.home=/tmp nodetool flush my_ks my_cf 
 Exception in thread "main" FSWriteError in /home/vagrant/.cassandra 
 at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305) 
 at 
 org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690)
  
 at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504) 
 at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204) 
 Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra 
 ... 4 more 
 
 [vagrant@local-dev ~]$ nodetool -Duser.home=/tmp flush my_ks my_cf 
 Unrecognized option: -Duser.home=/tmp 
 usage: java org.apache.cassandra.tools.NodeCmd --host <arg> <command> 
 ...
 

Links:
--
[1] https://issues.apache.org/jira/browse/CASSANDRA-6475


Re: java.lang.StackOverflowError with big IN list

2014-01-10 Thread Dave Brosius
In the meantime you can try upping the value of your -Xss setting in 
cassandra-env.sh to see if just a little push will take the problem away.
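
For example, in conf/cassandra-env.sh (the stock setting in many versions was -Xss180k; the bumped value below is just an illustration):

JVM_OPTS="$JVM_OPTS -Xss512k"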


On 01/10/2014 10:18 AM, Дмитрий Шохов wrote:

https://issues.apache.org/jira/browse/CASSANDRA-6567

Thank you!


2014/1/10 Benedict Elliott Smith belliottsm...@datastax.com


It must be a very large IN clause, which is probably not
advisable. But it shouldn't cause this error, and since it's an
easy fix to prevent it, if you file a JIRA I'll post a patch.


On 10 January 2014 13:08, Дмитрий Шохов sho...@gmail.com wrote:

Hello, I'm getting a stack overflow when running prepared queries
with an IN parameter and binding a big list to it. Is this a known
limitation, such that I must implement manual paging or change logic
to get around it, or is it maybe a bug...

java.lang.StackOverflowError
at

org.apache.cassandra.utils.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo(FastByteComparisons.java:110)
at

org.apache.cassandra.utils.FastByteComparisons.compareTo(FastByteComparisons.java:41)
at

org.apache.cassandra.utils.FBUtilities.compareUnsigned(FBUtilities.java:216)
at

org.apache.cassandra.utils.ByteBufferUtil.compareUnsigned(ByteBufferUtil.java:89)
at
org.apache.cassandra.db.marshal.LongType.compareLongs(LongType.java:54)
at
org.apache.cassandra.db.marshal.LongType.compare(LongType.java:36)
at
org.apache.cassandra.db.marshal.LongType.compare(LongType.java:28)
at

org.apache.cassandra.db.ArrayBackedSortedColumns.binarySearch(ArrayBackedSortedColumns.java:170)
at

org.apache.cassandra.db.ArrayBackedSortedColumns.binarySearch(ArrayBackedSortedColumns.java:152)
at

org.apache.cassandra.db.ArrayBackedSortedColumns.getColumn(ArrayBackedSortedColumns.java:89)
at

org.apache.cassandra.cql3.statements.SelectStatement$1$1.computeNext(SelectStatement.java:825)
at

org.apache.cassandra.cql3.statements.SelectStatement$1$1.computeNext(SelectStatement.java:826)
at

org.apache.cassandra.cql3.statements.SelectStatement$1$1.computeNext(SelectStatement.java:826)
at

org.apache.cassandra.cql3.statements.SelectStatement$1$1.computeNext(SelectStatement.java:826)
at ... (many more identical stack frames)

Cassandra 2.0.4 Java driver 2.0 rc2







Re: unsubscribe

2014-01-09 Thread Dave Brosius
just send that email to user-unsubscribe@cassandra.apache.org

if still confused check here http://hadonejob.com/img/full/12598654.jpg

- Original Message -
From: "Earl Ruby" <er...@webcdr.com>

Re: Why was Thrift defined obsolete?

2013-12-17 Thread Dave Brosius
Realize that there will be more and more new features that come along as 
cassandra matures. It is an overwhelming certainty that these features will be 
available thru the new native interface & CQL. The same level of certainty 
can't be given to Thrift. Certainly if you have existing applications running 
against Thrift, then there is no need to worry that Thrift will break or not 
perform optimally in the future. But going forward, there will be things that 
you won't be able to use thru Thrift that may solve problems for you. If you 
are starting now, the recommendation is to use the new native interface and 
CQL. Just saying...

- Original Message -
From: "Peter Lin" <wool...@gmail.com>

Re: Parse xml and store data in Map using xom parser

2013-12-08 Thread Dave Brosius
Not really a cassandra question, but it would seem your xml file isn't 
particularly well designed. It would seem you need to qualify your 
<test> entries with indices when put in the map, such as

put("test.1.C", "0");

put("test.2.C", "50");

before figuring out the cassandra angle, I'd rethink how that xml is 
designed, if that's within your control.
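
A minimal sketch of that approach with XOM; the class name and file name here are illustrative:

import java.io.File;
import java.util.HashMap;
import java.util.Map;

import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Elements;

public class SampleParser {
    public static void main(String[] args) throws Exception {
        Document doc = new Builder().build(new File("sample.xml"));
        Element root = doc.getRootElement();
        Map<String, String> xmlData = new HashMap<String, String>();
        xmlData.put("Max", root.getFirstChildElement("Max").getValue());
        Elements tests = root.getChildElements("test");
        for (int i = 0; i < tests.size(); i++) {
            Elements fields = tests.get(i).getChildElements();
            for (int j = 0; j < fields.size(); j++) {
                Element f = fields.get(j);
                // qualify each key with the index of its enclosing <test>
                xmlData.put("test." + (i + 1) + "." + f.getLocalName(), f.getValue());
            }
        }
        // {Max=0.25000, test.1.A=Percentage, test.1.C=0, test.2.C=50, ...}
        System.out.println(xmlData);
    }
}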




On 12/08/2013 10:10 AM, Santosh Shet wrote:


Hi,

I am trying to parse the XML file shown below using the XOM parser in 
Java and put each key/value pair into a Map. Later I am trying to insert 
this Map into Cassandra using a Mutator object. My XML file looks like 
this:


<sample>

<Max>0.25000</Max>

<test>
<A>Percentage</A>
<B>1</B>
<C>0</C>
<D>20</D>
<E>0.25</E>
</test>

<test>
<A>Percentage</A>
<B>1</B>
<C>50</C>
<D>75</D>
<E>0.15</E>
</test>

</sample>

Currently I have a HashMap<String, String> xmlData to hold the elements of the XML.

I am traversing each child element using getChildElements() and then 
retrieving the element name and element value and storing them inside the 
HashMap. But I am facing a problem with the second subchild 
element (the <test> marked in green) because it overwrites the values of 
the child elements I traversed in my last iteration (the <test> 
marked in blue).


Could somebody provide thoughts on how to store data in Cassandra in 
the above situation? Is there any better way to do it, or do I need to append 
counter+xpath? For example, if there are 2 child elements, append 
"test.1.A" for element A inside the first <test> and append 
"test.2.A" for the subchild of the other <test> element.


Thanks in advance.

Best,

Santosh Shet

Software Engineer | VistaOne Solutions

Direct India : +91 80 30273829 | Mobile India : +91 8105720582

Skype : santushet





Re: com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided

2013-12-07 Thread Dave Brosius
BoundStatement query = prBatchInsert.bind(userId, 
attributes.values().toArray(new String[attributes.size()]));




On 12/07/2013 03:59 PM, Techy Teck wrote:
I am trying to insert into Cassandra database using Datastax Java 
driver. But everytime I am getting below exception at 
`prBatchInsert.bind` line-


com.datastax.driver.core.exceptions.InvalidTypeException: Invalid 
type for value 1 of CQL type text, expecting class java.lang.String 
but class [Ljava.lang.Object; provided


Below is my method which accepts `userId` as the input and 
`attributes` as the `Map` which contains `key` as my `Column Name` and 
value as the actual value of that column


public void upsertAttributes(final String userId, final Map<String, 
String> attributes, final String columnFamily) {

    try {
        Set<String> keys = attributes.keySet();
        StringBuilder sqlPart1 = new StringBuilder(); 
        //StringBuilder.append() is faster than concatenating Strings in a loop
        StringBuilder sqlPart2 = new StringBuilder();

        sqlPart1.append("INSERT INTO " + columnFamily + "(USER_ID ");
        sqlPart2.append(") VALUES ( ?");

        for (String k : keys) {
            sqlPart1.append(", " + k); //append each key
            sqlPart2.append(", ?");  //append an unknown value for each key
        }
        sqlPart2.append(") "); //Last parenthesis (and space?)
        String sql = sqlPart1.toString() + sqlPart2.toString();

        CassandraDatastaxConnection.getInstance();
        PreparedStatement prBatchInsert = 
            CassandraDatastaxConnection.getSession().prepare(sql);

        prBatchInsert.setConsistencyLevel(ConsistencyLevel.ONE);

        // this line is giving me an exception
        BoundStatement query = prBatchInsert.bind(userId, 
            attributes.values().toArray(new Object[attributes.size()]));
        //Vararg methods can take an array (might need to cast it to String[]?).

        CassandraDatastaxConnection.getSession().executeAsync(query);

    } catch (InvalidQueryException e) {
        LOG.error("Invalid Query Exception in 
            CassandraDatastaxClient::upsertAttributes " + e);
    } catch (Exception e) {
        LOG.error("Exception in 
            CassandraDatastaxClient::upsertAttributes " + e);
    }
}


What am I doing wrong here? Any thoughts?




Re: unsubscribe

2013-10-30 Thread Dave Brosius

Please send that same riveting text to user-unsubscr...@cassandra.apache.org


http://tinyurl.com/kdrwyrc


On 10/30/2013 02:49 PM, Leonid Ilyevsky wrote:

Unsubscribe

This email, along with any attachments, is confidential and may be legally privileged or 
otherwise protected from disclosure. Any unauthorized dissemination, copying or use of 
the contents of this email is strictly prohibited and may be in violation of law. If you 
are not the intended recipient, any disclosure, copying, forwarding or distribution of 
this email is strictly prohibited and this email and any attachments should be deleted 
immediately.  This email and any attachments do not constitute an offer to sell or a 
solicitation of an offer to purchase any interest in any investment vehicle sponsored by 
Moon Capital Management LP (Moon Capital). Moon Capital does not provide 
legal, accounting or tax advice. Any statement regarding legal, accounting or tax matters 
was not intended or written to be relied upon by any person as advice. Moon Capital does 
not waive confidentiality or privilege as a result of this email.





Re: Writing same key on two nodes using ONE consistency

2013-10-27 Thread Dave Brosius
Each node would forward the write request to the node responsible for 
holding that key (determined by the hash function), so both writes land 
on the same node and the one with the later timestamp wins.


On 10/26/2013 09:25 PM, Mohammad Hajjat wrote:

Hi,

Quick question about Cassandra.
If I write the same key (with two different values) to two different 
nodes with consistency of ONE. Assuming 'SimpleStrategy' and no 
replication.
Would each node receiving the request write that key in its local 
storage and return success (thus we end up with same key having two 
different values on the two nodes)? Or would each node forward the 
write request to the node responsible to hold that key (determined by 
the hash function)?


Thanks!
--
*Mohammad Hajjat*
*Ph.D. Student*
*Electrical and Computer Engineering*
*Purdue University*




Re: Cassandra book/tuturial

2013-10-27 Thread Dave Brosius
Unfortunately, as tech books tend to be, it's quite a bit out of date 
at this point.




On 10/27/2013 09:54 PM, Mohan L wrote:




On Sun, Oct 27, 2013 at 9:57 PM, Erwin Karbasi er...@optinity.com wrote:


Hey Guys,

What is the best book to learn Cassandra from scratch?

Thanks in advance,
Erwin


Hi,

Buy :

Cassandra: The Definitive Guide By Eben Hewitt : 
http://shop.oreilly.com/product/0636920010852.do


Thanks
Mohan L






Re: Composite keys and composite columns

2013-10-17 Thread Dave Brosius
The explanation of composite columns is muddied by verbiage, depending on 
whether you are talking about the thrift interface, which tends to talk 
about things in low-level terms, or CQL, which tends to talk about things 
in higher-level terms.


At a thrift/low level, a composite column (really now called a composite 
cell) is just a cell that has a name containing multiple parts 
packed into a ByteBuffer. These multiple parts are understood by 
cassandra for validation, sorting and slicing purposes.


At a CQL level, there are really just compound keys, where the first 
part of a compound key is the partition key; it alone decides where the 
data lives (what node). The rest of the keys are clustering keys, and 
affect CQL row sorting.


In CQL, then, columns that come after the clustering-key columns are 
grouped by those clustering keys. Under the covers, these extra columns 
have names that are prefixed by the multipart clustering name.
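
For example, a sketch with made-up table and column names:

CREATE TABLE events (
    user_id text,         -- partition key: alone decides which node holds the data
    event_time timestamp, -- clustering key: sorts CQL rows within the partition
    seq int,              -- further clustering key
    payload text,         -- regular column, stored under its clustering-key prefix
    PRIMARY KEY (user_id, event_time, seq)
);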


As for using column names as data: again, it depends on the interface 
(thrift/CQL) as to how to look at it. For instance, with thrift you can 
slice columns that start from some value and end with some value, and 
find the column names in between. The column names that show up probably 
mean something to you.


HTH,
dave



On 10/17/2013 07:51 PM, Hartzman, Leslie wrote:


Hi,

I'm looking for clarification on composite keys and composite columns. 
From what I've read with regards to composite keys, you have a 
collection of columns where of 'n' columns, the first n-1 form the 
composite primary key and the last column is the data for that 
composite key. Do I have this right?


What I've just read about composite columns is that there are static 
and dynamic composite column names, but dynamic should be avoided. If 
the column names can be created programmatically, what does the schema 
definition look like for this, or is it omitted since they're 
programmatically created? I'm assuming that these are the dynamic 
composite columns. So how are the static composite columns defined in 
the schema?


Also, if a column name is used as the value as well (composite or 
non-composite columns), how do you query that? If the value is empty 
and the column name IS the value, is the knowledge of what you're 
querying in the business logic due to the construct of that particular 
column family?


Thanks.

Les

[CONFIDENTIALITY AND PRIVACY NOTICE] Information transmitted by this 
email is proprietary to Medtronic and is intended for use only by the 
individual or entity to which it is addressed, and may contain 
information that is private, privileged, confidential or exempt from 
disclosure under applicable law. If you are not the intended recipient 
or it appears that this mail has been forwarded to you without proper 
authority, you are notified that any use or dissemination of this 
information in any manner is strictly prohibited. In such cases, 
please delete this mail from your records. To view this notice in 
other languages you can either select the following link or manually 
copy and paste the link into the address bar of a web browser: 
http://emaildisclaimer.medtronic.com






Re: Unsupported major.minor version 51.0

2013-09-17 Thread Dave Brosius

Cassandra 2.0 needs to run on JDK 7; class file version 51.0 is the Java 7 format, and the java -version output below shows 1.6.0_24.




On 09/17/2013 11:21 PM, Gary Zhao wrote:

Hello

I just saw this error. Anyone knows how to fix it?

[root@gary-vm1 apache-cassandra-2.0.0]# bin/cassandra -f
xss =  -ea -javaagent:bin/../lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4014M 
-Xmx4014M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
Exception in thread "main" java.lang.UnsupportedClassVersionError: 
org/apache/cassandra/service/CassandraDaemon : Unsupported major.minor 
version 51.0

at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class: 
org.apache.cassandra.service.CassandraDaemon.  Program will exit.

[root@gary-vm1 apache-cassandra-2.0.0]# java -version
java version 1.6.0_24
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Thanks
Gary




Re: Custom data type is not work at C* 2.0

2013-09-05 Thread Dave Brosius

I think your class is missing a required

public TypeSerializer<Void> getSerializer() {}

method.
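
A minimal sketch of that method for a Void type; I'm assuming EmptySerializer (the TypeSerializer used by the built-in EmptyType) fits here:

public TypeSerializer<Void> getSerializer() {
    // nothing to serialize for a Void/empty type
    return EmptySerializer.instance;
}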


This is what you need to derive from

https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/db/marshal/AbstractType.java;h=74fe446319c199433b47d3ae60fc4d644e86b653;hb=03045ca22b11b0e5fc85c4fabd83ce6121b5709b



On 09/04/2013 09:14 AM, Katsutoshi wrote:

package my.marshal;

import java.nio.ByteBuffer;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.db.marshal.MarshalException;
import org.apache.cassandra.utils.ByteBufferUtil;

public class DummyType extends AbstractType<Void> {

public static final DummyType instance = new DummyType();

private DummyType(){
}

public Void compose(ByteBuffer bytes){
return null;
}

public ByteBuffer decompose(Void value){
return ByteBufferUtil.EMPTY_BYTE_BUFFER;
}

public int compare(ByteBuffer o1, ByteBuffer o2){
return 0;
}

public String getString(ByteBuffer bytes){
return "";
}

public ByteBuffer fromString(String source) throws MarshalException{
if(!source.isEmpty()) throw new 
MarshalException(String.format("'%s' is not empty", source));

return ByteBufferUtil.EMPTY_BYTE_BUFFER;
}

public void validate(ByteBuffer bytes) throws MarshalException{
}
}




Re: AbstractCassandraDaemon.java (line 134) Exception in thread

2013-07-17 Thread Dave Brosius
What is your -Xss set to? If it's below 256k, set it there, and see if you 
still have the issues.

- Original Message -
From: "Julio Quierati" <julio.quier...@gmail.com>

Re: Custom 1.2 Authentication plugin will not work unless user is in system_auth.users column family

2013-06-17 Thread Dave Brosius
It seems to me that isExistingUser should be pushed down to the 
IAuthenticator implementation.


Perhaps you should add a ticket to 
https://issues.apache.org/jira/browse/CASSANDRA


On 06/17/2013 05:12 PM, Bao Le wrote:

Hi,

  We have a custom  authenticator that works well with Cassandra 1.1.5.
When upgrading to C* 1.2.5, authentication failed. It turns out that in 
ClientState.login, we make a call to Auth.isExistingUser(user.getName())
if the AuthenticatedUser is not the Anonymous user. This isExistingUser 
method does a query on system_auth.users and, if it cannot find the 
name there, throws an exception.


  If our authentication model involves exchanging data on the fly and 
not relying on pre-created users, how do we bypass this check? Should we
add a method on IAuthenticator to specify whether user look-up is 
needed or not?


Bao







Re: Unsubscribe?

2013-06-10 Thread Dave Brosius
You sent an email to user-unsubscr...@cassandra.apache.org from the 
email address used, and it didn't unsubscribe you? Did you get the 
'are you sure' email? Did you check your spam folder?


see

http://cassandra.apache.org/
http://hadonejob.com/img/70907344.jpg



On 06/10/2013 10:46 AM, Fatih P. wrote:

i tried the same and receiving mails.


On Mon, Jun 10, 2013 at 5:34 PM, Luke Hospadaruk 
luke.hospada...@ithaka.org wrote:


Hi,
I hate to be a clod, but I'd really like to unsubscribe from this
list.  I've tried every permutation I can think of to do it the
right way, and all of the styles in the help message.  If there's
a moderator reading this could you please take me off the list?

Thanks,
Luke






Re:

2013-05-16 Thread Dave Brosius

What version of netty is on your classpath?

On 05/16/2013 07:33 PM, aaron morton wrote:
Try the IRC room for the java driver or submit a ticket on the JIRA 
system, see the links here https://github.com/datastax/java-driver



Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/05/2013, at 5:50 PM, bjbylh bjb...@me.com wrote:




hello all:
I use the datastax java-driver to connect to C*. When the program calls 
cluster.shutdown(), it prints out: 
java.lang.NoSuchMethodError: org.jboss.netty.channelFactory.shutdown()V.

but I do not know why...
C* is 1.2.4, java-driver is 1.0.0.
thank you.

Sent from Samsung Mobile






Re: Look table structuring advice

2013-05-04 Thread Dave Brosius

if you want to store all the roles in one row, you can do

create table roles (synthetic_key int, name text, primary 
key(synthetic_key, name)) with compact storage


when inserting roles, just use the same key

insert into roles (synthetic_key, name) values (0, 'Programmer');
insert into roles (synthetic_key, name) values (0, 'Tester');

and use

select * from roles where synthetic_key = 0;


(or some arbitrary key value you decide to use)

the that data is stored on one node (and its replicas)

of course if the number of roles grows to be large, you lose most of the 
value in having a cluster.




On 05/04/2013 12:09 PM, Jabbar Azam wrote:

Hello,

I want to create a simple table holding user roles e.g.

create table roles (
   name text,
   primary key(name)
);

If I want to get a list of roles for some admin tool I can use the 
following CQL3


select * from roles;

When a new name is added it will be stored on a different host and 
doing a select * is going to be inefficient because the table will be 
stored across the cluster and each node will respond. The number of 
roles may be less than or just greater than a dozen. I'm not sure if 
I'm storing the roles correctly.



The other thing I'm thinking about is that when I've read the roles 
once then I can cache them.


Thanks

Jabbar Azam




Re: Look table structuring advice

2013-05-04 Thread Dave Brosius
I just used 'synthetic key' as it's a term used with standard rdbms to 
mean a key that means nothing in the model, and is often a sequence or such.


There's nothing particular to cassandra specific to that term. Just 
thought it would be something familiar to someone who understood rdbms.


On 05/04/2013 02:44 PM, Jabbar Azam wrote:
I never thought about using a synthetic key, but in this instance with 
about a dozen rows it's probably ok. Thanks for your great idea.


Where  did you read about the synthetic key idea? I've not come across 
it before.


Thanks

Jabbar Azam


On 4 May 2013 19:30, Dave Brosius dbros...@mebigfatguy.com wrote:


if you want to store all the roles in one row, you can do

create table roles (synthetic_key int, name text, primary
key(synthetic_key, name)) with compact storage

when inserting roles, just use the same key

insert into roles (synthetic_key, name) values (0, 'Programmer');
insert into roles (synthetic_key, name) values (0, 'Tester');

and use

select * from roles where synthetic_key = 0;


(or some arbitrary key value you decide to use)

that way the data is stored on one node (and its replicas)

of course if the number of roles grows to be large, you lose most
of the value in having a cluster.




On 05/04/2013 12:09 PM, Jabbar Azam wrote:

Hello,

I want to create a simple table holding user roles e.g.

create table roles (
   name text,
   primary key(name)
);

If I want to get a list of roles for some admin tool I can use
the following CQL3

select * from roles;

When a new name is added it will be stored on a different host
and doing a select * is going to be inefficient because the
table will be stored across the cluster and each node will
respond. The number of roles may be less than or just greater
than a dozen. I'm not sure if I'm storing the roles correctly.


The other thing I'm thinking about is that when I've read the
roles once then I can cache them.

Thanks

Jabbar Azam







Re: Retrieve data from Cassandra database using Datastax java driver

2013-04-20 Thread Dave Brosius
getColumnDefinitions only returns metadata; to get the data, use the 
iterator to navigate the rows:



Iterator<Row> it = result.iterator();

while (it.hasNext()) {
    Row r = it.next();
    //do stuff with row
}
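
A sketch of filling the map that way, assuming every requested column is of CQL type text:

// uses com.datastax.driver.core classes (Row, ColumnDefinitions)
Map<String, String> attributes = new HashMap<String, String>();
for (Row r : result) {
    for (ColumnDefinitions.Definition def : r.getColumnDefinitions()) {
        // getString is appropriate here only because the columns are text
        attributes.put(def.getName(), r.getString(def.getName()));
    }
}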

On 04/21/2013 12:02 AM, Techy Teck wrote:
I am working with the Datastax java-driver, and I am trying to retrieve 
a few columns from the database based on the input that is being passed 
to the below method:



public Map<String, String> getAttributes(final String userId, final 
Collection<String> attributeNames) {

    String query = "SELECT " + attributeNames.toString().substring(1, 
        attributeNames.toString().length() - 1) + " from profile where id = '" 
        + userId + "';";

    CassandraDatastaxConnection.getInstance();

    ResultSet result = 
        CassandraDatastaxConnection.getSession().execute(query);


    Map<String, String> attributes = new ConcurrentHashMap<String, String>();
    for (Definition def : result.getColumnDefinitions()) {
        //not sure how to put the columnName and columnValue that came back 
        //from the database
        attributes.put("column name", "column value");
    }
    return attributes;
}

Now I got the result back from the database in result.
Now how to put the colum name and column value that came back from the 
database in a map?


I am not able to understand how to retrieve colum value for a 
particular column in datastax java driver?


Any thoughts will be of great help.




Re: Quorum read after quorum write guarantee

2013-03-10 Thread Dave Brosius

is the read and write happening on the same thread?

On 03/10/2013 12:00 PM, André Cruz wrote:

Hello.

In my application it sometimes happens that I execute a multiget (I use 
pycassa) to fetch data that I have just inserted. I use quorum writes and 
reads, and my RF is 3.

I've noticed that sometimes (1 in 1000 perhaps) an insert followed (300ms 
after) by a multiget will not find the just inserted data. Is this normal? Or 
is something wrong? Can there be some delay to obtain the inserted data even 
with quorum?

Best regards,
André




Re: unsubscribe

2013-02-17 Thread Dave Brosius

On 02/17/2013 01:26 PM, puneet loya wrote:

unsubscribe me please.

Thank you


if only directions were followed:

http://hadonejob.com/images/full/102.jpg


send to

user-unsubscr...@cassandra.apache.org




Re: Cassandra 1.20 with Cloudera Hadoop (CDH4) Compatibility Issue

2013-02-15 Thread Dave Brosius

see https://issues.apache.org/jira/browse/CASSANDRA-5201


On 02/15/2013 10:05 PM, Yang Song wrote:

Hi,

Does anyone use CDH4's Hadoop with Cassandra? The goal is simply to 
read/write to Cassandra from Hadoop directly using 
ColumnFamilyInput(Output)Format, but there seems to be a compatibility 
issue. There are two java exceptions:

1. java.lang.IncompatibleClassChangeError: Found interface 
org.apache.hadoop.mapreduce.JobContext, but class was expected
This shows when I run the hadoop jar file to read directly from Cassandra. 
It seems that there was a change in Hadoop where JobContext was changed 
from a class to an interface. Has anyone had a similar issue?

Does it mean the Hadoop version in CDH4 is old?

2. Another error is java.lang.NoSuchMethodError: 
org.apache.cassandra.hadoop.ConfigHelper.setRpcPort(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;)V
This shows when the jar file specifies the rpc port for a remote Cassandra 
cluster.

Does anyone have similar experience? Any comments are welcome. thanks!




Re: Cassandra/cqlsh Error: TSocket read 0 bytes

2013-02-07 Thread Dave Brosius
An exception occurred on the server; check the logs for the details of 
what happened, and post back here.


On 02/07/2013 11:04 PM, Adam Venturella wrote:

Has anyone encountered this before?
What did I most likely break or how do I fix it?




RE: cassandra cqlsh error

2013-02-04 Thread Dave Brosius
xss = -ea -javaagent:./../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities 
-XX:ThreadPriorityPolicy=42 -Xms1005M -Xmx1005M -Xmn200M 
-XX:+HeapDumpOnOutOfMemoryError -Xss180k

That is not an error, that is just 'debugging' information output to the 
command line.

- Original Message -
From: "Kumar, Anjani" <anjani.ku...@infogroup.com>

RE: cassandra cqlsh error

2013-02-04 Thread Dave Brosius
This part,

ERROR 13:39:24,456 Cannot open 
/var/lib/cassandra/data/system/Schema/system-Schema-hd-5; partitioner 
org.apache.cassandra.dht.RandomPartitioner does not match system partitioner 
org.apache.cassandra.dht.Murmur3Partitioner. Note that the default partitioner 
starting with Cassandra 1.2 is Murmur3Partitioner, so you will need to edit 
that to match your old partitioner if upgrading.

is a problem. In 1.2 the default partitioner was changed, so if you are using 
1.2 against old files, you will need to edit the cassandra.yaml to have 
org.apache.cassandra.dht.RandomPartitioner as the specified partitioner.

- Original Message -
From: "Kumar, Anjani" <anjani.ku...@infogroup.com>

Re: CQL : Request did not complete within rpc_timeout

2013-02-03 Thread Dave Brosius
If querying by a date inequality is an important access pattern, you 
probably want a column that represents some time bucket (a month?) and 
have that column be part of the CQL primary key. Thus when a query is 
requested you can make C* happy by specifying a date bucket to pick the 
C* row and the date inequality to slice the CQL rows (columns). Of 
course this adds work for the client when dates span multiple buckets, 
but an open-ended date inequality is probably troublesome for massive 
datasets anyway.

On 02/03/2013 03:42 PM, Paul van Hoven wrote:

Thanks for the answer. Can anybody else answer my other two questions,
because my problem is not solved yet?

2013/2/3 Edward Capriolo edlinuxg...@gmail.com:

This was the issue that prompted the WITH FILTERING ALLOWED:

https://issues.apache.org/jira/browse/CASSANDRA-4915

Cassandra's storage system can only optimize certain queries.

On Sun, Feb 3, 2013 at 2:07 PM, Paul van Hoven
paul.van.ho...@googlemail.com wrote:

I'm not sure if I understood your answer.


When you have GB or TB of data, any query that needs ALLOW FILTERING
will not work at scale.

1. You mean any query that requires ALLOW FILTERING is slow?


Secondary indexes need at least one equality. If you want to do this
at scale you might need a different design.

2. And what design would be recommended then?

3. How should the query look like such that it would scale?



2013/2/3 Edward Capriolo edlinuxg...@gmail.com:

Secondary indexes need at least one equality. If you want to do this
at scale you might need a different design.

Using ALLOW FILTERING and LIMIT 10 is simply grabbing the first few
random rows that match your criteria.

When you have GB or TB of data, any query that needs ALLOW FILTERING
will not work at scale.

This is why it was added to the language: CQL lets you do some queries
that seem fast when you're developing with 10 rows. Without this
clause you would not know whether a query is fast because it hits a
Cassandra index, or just fast because the results were found in
the first 10 rows.

Edward

On Sun, Feb 3, 2013 at 10:56 AM, Paul van Hoven
paul.van.ho...@googlemail.com wrote:

Okay, here is the schema (actually it is in german, but I translated
the column names such that it is easier to read for an international
audience):

cqlsh:demodb describe table offerten_log_archiv;

CREATE TABLE offerten_log_archiv (
   offerte_id int PRIMARY KEY,
   aktionen int,
   angezeigt bigint,
   datum timestamp,
   gutschrift bigint,
   kampagne_id int,
   klicks int,
   klicks_ungueltig int,
   kosten bigint,
   statistik_id bigint,
   stunden int,
   werbeflaeche_id int,
   werbemittel_id int
) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   compaction={'class': 'SizeTieredCompactionStrategy'};

CREATE INDEX datum_key ON offerten_log_archiv (datum);

CREATE INDEX stunden_key ON offerten_log_archiv (stunden);

cqlsh:demodb

This is the query I'm trying to perform:
cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0
limit 10 allow filtering;
Request did not complete within rpc_timeout.

ola = offerten_log_archiv (table name)
hour = stunde (column name)
date = datum (column name)

I hope this information makes my problem more clear.



2013/2/3 Edward Capriolo edlinuxg...@gmail.com:

Without seeing your schema it is hard to say, but in some cases ALLOW
FILTERING might be considered "EXPECT THIS COULD BE SLOW". It could
mean the query is not hitting an index and is going to page through
large amounts of data.

On Sun, Feb 3, 2013 at 9:42 AM, Paul van Hoven
paul.van.ho...@googlemail.com wrote:

After figuring out how to use the > operator on a secondary index I
noticed that in a column family of about 5.5 million datasets I get a
rpc_timeout when trying to read data from this table. In the concrete
situation I want to request data younger than January 1 2013. The
number of rows that should be affected is about 1 million. When doing
the request I get a timeout error:

cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0
limit 10 allow filtering;
Request did not complete within rpc_timeout.

Actually I find this very confusing since I would expect an
exceptional performance gain in comparison to a similar SQL query.
Therefore, I think the query I'm performing is not appropriate for
Cassandra, although I would do a query like that in this manner on a
SQL database. So my question now is: how should I perform this query
on Cassandra?




Re: error when creating column family using cql3 and persisting data using thrift

2013-01-15 Thread Dave Brosius
The statements used to create and populate the data might be mildly useful for those trying to help.

- Original Message - From: "Kuldeep Mishra" <kuld.cs.mis...@gmail.com>

Re: Create Keyspace failing in 1.2rc2 with syntax error?

2012-12-29 Thread Dave Brosius

the format has changed, check the help in cqlsh

CREATE KEYSPACE Test WITH replication = {'class':'SimpleStrategy', 
'replication_factor':1};


On 12/29/2012 04:27 PM, Adam Venturella wrote:


When I create a keyspace with a SimpleStrategy as outlined here: 
https://cassandra.apache.org/doc/cql3/CQL.html#createKeyspaceStmt



CREATE KEYSPACE Test
WITH strategy_class = SimpleStrategy
 AND strategy_options:replication_factor = 1;
I receive the following error:
Bad Request: line 3:20 mismatched input ':' expecting '='

I'm running the following cqlsh:

Connected to Test Cluster at localhost:9160.

[cqlsh 2.3.0 | Cassandra 1.2.0~rc2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]







Re: Using Cassandra BulkOutputFormat With newer versions of Hadoop (.23+)

2012-09-21 Thread Dave Brosius
I swapped in hadoop-core-1.0.3.jar and rebuilt cassandra, without 
issues. What problems were you having?



On 09/21/2012 07:40 PM, Juan Valencia wrote:


I can't seem to get bulk loading to work in newer versions of Hadoop.
Since they switched JobContext from a class to an interface,
you lose binary backward compatibility:
Exception in thread "main" java.lang.IncompatibleClassChangeError: 
Found interface org.apache.hadoop.mapreduce.JobContext, but class was 
expected
at 
org.apache.cassandra.hadoop.BulkOutputFormat.checkOutputSpecs(BulkOutputFormat.java:42)


I tried recompiling against the newer Hadoop, but things got messy 
fast.  Has anyone done this? 




Re: anyone know how to lookup non-continguous columns BUT for prefixes?

2012-09-04 Thread Dave Brosius
You'd need to make n queries, or do a superset query from the min to the max prefix and filter out the extras client-side.

Re: Why Cassandra secondary indexes are so slow on just 350k rows?

2012-08-28 Thread Dave Brosius
If I understand you correctly, you are only ever querying for the rows 
where is_exported = false, and turning them into trues. What this means 
is that eventually you will have 1 row in the secondary index table with 
350K columns that you will never look at.

It seems to me that perhaps you should just hold your own manual 
index CF that points to non-exported rows, and just delete those 
columns when they are exported.
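
A sketch of that manual index, against a deliberately made-up client interface; the CF name, row key, and methods below are illustrative, not a real client API:

interface ColumnOps {
    void insert(String cf, String rowKey, String columnName, byte[] value);
    void delete(String cf, String rowKey, String columnName);
    java.util.List<String> columnNames(String cf, String rowKey, int limit);
}

final class PendingExportIndex {
    private static final String INDEX_CF  = "PendingExport"; // hypothetical CF
    private static final String INDEX_ROW = "pending";       // one well-known row

    private final ColumnOps client;

    PendingExportIndex(ColumnOps client) { this.client = client; }

    // On write: record the new row as pending, instead of is_exported='false'.
    void markPending(String dataRowKey) {
        client.insert(INDEX_CF, INDEX_ROW, dataRowKey, new byte[0]);
    }

    // The poller reads up to n pending keys, exports them...
    java.util.List<String> nextBatch(int n) {
        return client.columnNames(INDEX_CF, INDEX_ROW, n);
    }

    // ...and deletes each column once exported, so the row never accumulates
    // the dead entries a secondary index on is_exported would.
    void markExported(String dataRowKey) {
        client.delete(INDEX_CF, INDEX_ROW, dataRowKey);
    }
}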



On 08/28/2012 05:23 PM, Edward Kibardin wrote:
I have a column family with the secondary index. The secondary index 
is basically a binary field, but I'm using a string for it. The field 
called *is_exported* and can be *'true'* or *'false'*. After request 
all loaded rows are updated with *is_exported = 'false'*.


I'm polling this column family every ten minutes and exporting new rows 
as they appear.


But here's the problem: I'm seeing that the time for this query grows pretty 
linearly with the amount of data in the column family, and currently it takes 
*from 12 to 20 seconds (!!!) to find 5000 rows*. From my 
understanding, an indexed request should not depend on the number of rows in 
the CF but on the number of rows per index value (cardinality), as it's 
just another hidden CF like:


true : rowKey1 rowKey2 rowKey3 ...
false: rowKey1 rowKey2 rowKey3 ...

I'm using pycassa to query the data; here is the code I'm using:

from pycassa.index import create_index_expression, create_index_clause

column_family = pycassa.ColumnFamily(cassandra_pool, column_family_name, read_consistency_level=2)
is_exported_expr = create_index_expression('is_exported', 'false')
clause = create_index_clause([is_exported_expr], count=5000)
column_family.get_indexed_slices(clause)

Am I doing something wrong? I expect this operation to work MUCH 
faster.


Any ideas or suggestions?

Some config info:
 - Cassandra 1.1.0
 - RandomPartitioner
 - I have 2 nodes and replication_factor = 2 (each server has a full 
data copy)

 - Using AWS EC2, large instances
 - Software raid0 on ephemeral drives

Thanks in advance!





Re: Why so slow?

2012-08-19 Thread Dave Brosius

Are you using multiple client threads?

You might want to try the stress tool in the distribution.



On 08/19/2012 02:09 PM, Peter Morris wrote:

Hi all

I have a Windows 7 machine (64 bit) with DataStax community server 
installed.  Running a benchmark app on the server gives me 7000 
inserts per second.  Running the same app on a networked client gives 
me only 5 inserts per second.  The two computers are connected 
directly via a cross over cable, and the network properties tell me 
that it is a 1Gbps connection.


Is the Windows community edition crippled for network use perhaps, or 
could the problem be something else?


Pete

Pinging 10.0.0.2 with 32 bytes of data:
Reply from 10.0.0.2: bytes=32 time=1ms TTL=128
Reply from 10.0.0.2: bytes=32 time<1ms TTL=128
Reply from 10.0.0.2: bytes=32 time<1ms TTL=128
Reply from 10.0.0.2: bytes=32 time<1ms TTL=128




Re: Loading data on-demand in Cassandra

2012-08-12 Thread Dave Brosius
When data is first written it remains in memory until that memory is 
flushed. Once the data is only on disk, it remains there until a read 
for that row-key/column is requested, so in essence it's always load 
on demand.


Currently there is no support for async notifications of changes.



On 08/12/2012 03:24 PM, Oliver Plohmann wrote:


Hello,

I'm looking a bit into Cassandra to see whether it would be something 
to go with for my company. I searched the Internet, looked 
through the FAQs, etc., but there are still a few open questions. 
Hope I don't bother anybody with the usual beginner questions ...


Is there a way to do load-on-demand of data in Cassandra? For the time 
being, we cannot afford to build up a cluster that holds our 700 GB 
SQL database in RAM. So we need to be able to load data on demand from 
our relational database. Can this be done in Cassandra? Then there 
also needs to be a way to unload data in order to reclaim RAM space. 
Would be nice if it were possible to register for an asynchronous 
notification in case some value was changed. Can this be done?


Thanks for any answers.
Regards, Oliver





Re: Secondary index impact on write performance

2012-08-04 Thread Dave Brosius
There is a second (system-managed) column family for each secondary 
index, so any write to an indexed field causes two writes: one to 
the main column family, and another to the index column family, where 
the key is the value of the secondary column and the value is the key 
of the original row.
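
A purely illustrative layout of that hidden index CF (CF names and values are invented):

// data CF "users"                         implied index CF on 'state'
//   key "bob" -> { state: "NY", ... }       key "NY" -> { bob, sue }
//   key "sue" -> { state: "NY", ... }       key "CA" -> { jim }
//   key "jim" -> { state: "CA", ... }

So a write that changes bob's state to "CA" also removes the bob column from the "NY" index row and adds it to the "CA" row, which is how a brand new or updated record becomes visible to a subsequent indexed query.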




On 08/04/2012 11:40 AM, David McNelis wrote:

Morning,

Was reading up on secondary indexes and on the Datastax post about 
them, it mentions the additional management overhead, and also that if 
you alter an existing column family, that data will be updated in the 
background.  But how do secondary indexes affect write performance?


If the answer is it doesn't, then how do brand new records get 
located by a subsequent indexed query?


If someone has a link to a post with some of this info, that would be 
awesome.


David




Re: increased RF and repair, not working?

2012-07-27 Thread Dave Brosius

Quorum is defined as

(replication_factor / 2) + 1

Therefore quorum when RF = 2 is 2! So in your case, both nodes must be up.

Really, using Quorum only starts making sense as a 'quorum' when RF = 3.
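
The arithmetic as a small sketch (note the integer division):

class QuorumMath {
    static int quorum(int replicationFactor) { return replicationFactor / 2 + 1; }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 4; rf++) {
            System.out.println("RF=" + rf + " -> quorum=" + quorum(rf));
        }
        // RF=1 -> 1, RF=2 -> 2 (every replica!), RF=3 -> 2, RF=4 -> 3
    }
}

With RF = 3 a quorum is 2, so you can lose one node and still satisfy QUORUM reads and writes.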






On 07/26/2012 10:38 PM, Yan Chunlu wrote:
I am using Cassandra 1.0.2 and have a 3-node cluster. The consistency 
levels of read & write are both QUORUM.


At first the RF was 1, and I figured that one node down would make the 
cluster unusable. So I changed RF to 2, and ran nodetool repair on 
every node (actually I did it twice).


After the operation I think my data should be on at least two nodes, 
and it would be okay if one of them is down.
But when I tried to simulate the failure by disabling gossip on one 
node, the cluster knew this node was down. Then, accessing data from 
the cluster, it returned MaximumRetryException (pycassa). In my 
experience this is caused by UnavailableException, which means 
the data being requested is on a node which is down.


so I wonder whether my data might not be replicated right; what should I do? 
Thanks for the help!


here is the keyspace info:

Keyspace: comments:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:2]



the schema version is okay:

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
f67d0d50-b923-11e1--4f7cf9240aef: [192.168.1.129, 192.168.1.40, 192.168.1.50]




the loads are as below:

nodetool -h localhost ring
Address         DC          Rack    Status  State   Load       Owns    Token
                                                                       113427455640312821154458202477256070484
192.168.1.50    datacenter1 rack1   Up      Normal  28.77 GB   33.33%  0
192.168.1.40    datacenter1 rack1   Up      Normal  26.67 GB   33.33%  56713727820156410577229101238628035242
192.168.1.129   datacenter1 rack1   Up      Normal  33.25 GB   33.33%  113427455640312821154458202477256070484




Re: increased RF and repair, not working?

2012-07-27 Thread Dave Brosius
You have RF=2, CL=Quorum, but 3 nodes, so each row is represented on 2 of the 3 nodes. If you take a node down, one of two things can happen when you attempt to read a row:

The row lives on the two nodes that are still up. In this case you will successfully read the data.

The row lives on one node that is up and one node that is down. In this case the read will fail because you haven't fulfilled the quorum (2 nodes in agreement) requirement.

- Original Message - From: "Riyad Kalla" <rka...@gmail.com>

Re: Batch update efficiency with composite key

2012-07-18 Thread Dave Brosius
Cassandra doesn't do reads before writes; it just places the updates in 
memtables. In effect, updates are the same as inserts. Batches certainly help 
with network latency, and save some minor amount of code repetition on the 
server side.

- Original Message - From: "Leonid Ilyevsky" <lilyev...@mooncapital.com>
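
A toy model of the first point, not Cassandra code: the memtable behaves like a last-write-wins map, so an update and an insert take the same path and nothing is read first.

public class LastWriteWins {
    public static void main(String[] args) {
        java.util.Map<String, String> memtable = new java.util.HashMap<String, String>();
        memtable.put("k:col", "v1"); // the "insert"
        memtable.put("k:col", "v2"); // the "update" - same operation, no prior read
        System.out.println(memtable.get("k:col")); // prints v2; newest wins
    }
}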

Re: SSTable format

2012-07-13 Thread Dave Brosius

On 07/13/2012 08:00 PM, Michael Theroux wrote:

Hello,

I've been trying to understand in greater detail how SStables are stored, and 
how information is transferred between Cassandra nodes, especially when a new 
node is joining a cluster.

Specifically, Is information stored to SStables ordered by rowkeys?  Some of 
the articles I've read suggests this is the case (although it's a little vague 
if they actually mean that the columns are stored in order, not the rowkeys).  
However, if data is stored in rowkey order, how is this achieved, as sstables 
are immutable?

Thanks for any insights,
-Mike


It depends on what partitioner you use. You should be using the 
RandomPartitioner, and if so, the rows are sorted by the hash of the row 
key. there are partitioners that sort based on the raw key value but 
these partitioners shouldn't be used as they have problems due to uneven 
partitioning of data.


As for how this is done, remember an sstable doesn't hold all the data 
for a column family. Not only does the data for a column family exist on 
multiple servers, there are usually multiple sstable files on disk that 
represent data from one column family on one machine. So at the time the 
sstable is written, the rows that are to be put in the sstable are 
sorted, and written in sorted order. In fact the same rowkey may be 
written in multiple sstables, one sstable having one set of columns for 
the key, the other sstable having other columns for the same key.


On query for some row based on a key, cassandra is responsible for 
finding where the columns are found in which sstables (potentially 
several) and merging the results.


Re: SSTable format

2012-07-13 Thread Dave Brosius
While in memory Cassandra calls it a memtable, but yes, SSTables are 
write-once, and later combined with others into new ones through compaction.




On 07/13/2012 09:54 PM, Michael Theroux wrote:

Thanks for the information,

So is the SStable essentially kept in memory, then sorted and written to disk 
on flush?  After that point, an SStable is not modified, but can be written to 
another SStable through compaction?

-Mike

On Jul 13, 2012, at 8:22 PM, Rob Coli wrote:


On Fri, Jul 13, 2012 at 5:18 PM, Dave Brosius <dbros...@baybroadband.net> wrote:

It depends on what partitioner you use. You should be using the
RandomPartitioner, and if so, the rows are sorted by the hash of the row
key. there are partitioners that sort based on the raw key value but these
partitioners shouldn't be used as they have problems due to uneven
partitioning of data.

The formal way this works in the code is that SSTables are ordered by
decorated row key, where decoration is only a transformation when
you are not using OrderedPartitioner. FWIW, in case you see that
DecoratedKey syntax while reading code..

=Rob

--
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb






Re: Composite column/key creation via Hector

2012-07-12 Thread Dave Brosius
BTW, an issue was just fixed with dynamic columns in hector, you might 
want to try trunk.


https://github.com/hector-client/hector/commit/2910b484629add683f61f392553e824c291fb6eb



On 07/12/2012 06:25 PM, aaron morton wrote:
You may have better luck on the Hector Mailing list… 
https://groups.google.com/forum/?fromgroups#!forum/hector-users



Here is something I found in the docs though 
http://hector-client.github.com/hector/build/html/content/composite_with_templates.html


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/07/2012, at 9:04 AM, Michael Cherkasov wrote:


Hi all,

What is the right way to create CF with dynamic composite column and 
composite key?


Now I use code like this:

private static final String DEFAULT_DYNAMIC_COMPOSITE_ALIAES =
    "(a=>AsciiType,b=>BytesType,i=>IntegerType,x=>LexicalUUIDType,l=>LongType,t=>TimeUUIDType,s=>UTF8Type,u=>UUIDType,A=>AsciiType(reversed=true),B=>BytesType(reversed=true),I=>IntegerType(reversed=true),X=>LexicalUUIDType(reversed=true),L=>LongType(reversed=true),T=>TimeUUIDType(reversed=true),S=>UTF8Type(reversed=true),U=>UUIDType(reversed=true))";


for composite columns:
 BasicColumnFamilyDefinition columnFamilyDefinition = new 
BasicColumnFamilyDefinition();
columnFamilyDefinition.setComparatorType( 
ComparatorType.DYNAMICCOMPOSITETYPE );
columnFamilyDefinition.setComparatorTypeAlias( 
DEFAULT_DYNAMIC_COMPOSITE_ALIAES );

columnFamilyDefinition.setKeyspaceName( keyspaceName );
columnFamilyDefinition.setName( TestCase );
columnFamilyDefinition.setColumnType( ColumnType.STANDARD );
ColumnFamilyDefinition cfDefStandard = new ThriftCfDef( 
columnFamilyDefinition );
cfDefStandard.setKeyValidationClass( 
ComparatorType.UTF8TYPE.getClassName() );
cfDefStandard.setDefaultValidationClass( 
ComparatorType.UTF8TYPE.getClassName() );


for keys:
columnFamilyDefinition = new BasicColumnFamilyDefinition();
columnFamilyDefinition.setComparatorType( 
ComparatorType.UTF8TYPE );

columnFamilyDefinition.setKeyspaceName( keyspaceName );
columnFamilyDefinition.setName( Parameter );
columnFamilyDefinition.setColumnType( ColumnType.STANDARD );
cfDefStandard = new ThriftCfDef( columnFamilyDefinition );
cfDefStandard.setKeyValidationClass( 
ComparatorType.DYNAMICCOMPOSITETYPE.getClassName() + 
DEFAULT_DYNAMIC_COMPOSITE_ALIAES );
cfDefStandard.setDefaultValidationClass( 
ComparatorType.UTF8TYPE.getClassName() );


Is this code correct? Do I really need 
such a terrible DEFAULT_DYNAMIC_COMPOSITE_ALIAES?






Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Dave Brosius
If I read what you are saying, you are _not_ using composite keys? 
That's one thing that could do it, if the first part of the composite 
key had a very, very low cardinality.


On 06/24/2012 11:00 AM, Safdar Kureishy wrote:

Hi,

I've searched online but was unable to find any leads for the problem 
below. This mailing list seemed the most appropriate place. Apologies 
in advance if that isn't the case.


I'm running a 5-node Solandra cluster (Solr + Cassandra). I've setup 
the nodes with tokens evenly distributed across the token space, for 
a 5-node cluster (as evidenced below under the effective-ownership 
column of the nodetool ring output). My data is a set of a few 
million crawled web pages, crawled using Nutch, and also indexed using 
the solrindex command available through Nutch. AFAIK, the key for 
each document generated from the crawled data is the URL.


Based on the load values for the nodes below, despite adding about 3 
million web pages to this index via the HTTP Rest API (e.g.: 
http://9.9.9.x:8983/solandra/index/update), some nodes are still 
empty. Specifically, nodes 9.9.9.1 and 9.9.9.3 have just a few 
kilobytes (shown in *bold* below) of the index, while the remaining 3 
nodes are consistently getting hammered by all the data. If the 
RandomPartioner (which is what I'm using for this cluster) is supposed 
to achieve an even distribution of keys across the token space, why is 
it that the data below is skewed in this fashion? Literally, no key 
has yet been hashed to the nodes 9.9.9.1 and 9.9.9.3 below. Could 
someone possibly shed some light on this absurdity?


[me@hm1 solandra-app]$ bin/nodetool -h hm1 ring
Address   DC          Rack    Status  State   Load        Effective-Owership  Token
                                                                              136112946768375385385349842972707284580
9.9.9.0   datacenter1 rack1   Up      Normal  7.57 GB     20.00%              0
9.9.9.1   datacenter1 rack1   Up      Normal  *21.44 KB*  20.00%              34028236692093846346337460743176821145
9.9.9.2   datacenter1 rack1   Up      Normal  14.99 GB    20.00%              68056473384187692692674921486353642290
9.9.9.3   datacenter1 rack1   Up      Normal  *50.79 KB*  20.00%              102084710076281539039012382229530463435
9.9.9.4   datacenter1 rack1   Up      Normal  15.22 GB    20.00%              136112946768375385385349842972707284580


Thanks in advance.

Regards,
Safdar




Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Dave Brosius

Well, it sounds like this doesn't apply to you.

If you had set up your column family in CQL as PRIMARY KEY 
(domain_name, path) or something like that, and were looking at lots 
and lots of URL pages (domain_name + path) but from a very small number 
of domain_names, then the partition key being just the domain_name could 
account for an uneven distribution.

But it sounds like your key is just a URL, so that should (in theory) be 
fine.
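
A comment-only sketch of that failure mode, with invented keys:

// With a composite key like (domain_name, path), only domain_name feeds the
// partitioner's hash; path merely sorts columns within that row. So:
//   ("example.com", "/a"), ("example.com", "/b")  -> same token, same node
//   ("other.org", "/")                            -> one other token
// Two domains means at most two distinct tokens, however many paths exist.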




On 06/24/2012 01:53 PM, Safdar Kureishy wrote:

Hi Dave,

Would you mind elaborating a bit more on that, preferably with an 
example? AFAIK, Solandra uses the unique id of the Solr document as 
the input for calculating the md5 hash for shard/node assignment. In 
this case the ids are just millions of varied web URLs that do not 
adhere to any regular expression. I'm not sure if that answers your 
question below?


Thanks,
Safdar








Re: Find rows without a column

2012-06-22 Thread Dave Brosius

On 06/22/2012 03:57 AM, Jeff Williams wrote:

Hi,

It doesn't look like this is possible, but can I select all rows missing a certain 
column? The equivalent of "select * where col is null" in SQL.

Regards,
Jeff


Remember that there really is no such thing as a row, just arbitrary 
columns associated with a key. So no, you can't find 'rows' where a column 
is missing.


Re: store large String as col value

2012-06-20 Thread Dave Brosius
 Column values are limited at 2G.Why store them as Base64? that just adds 
overhead. Storing the raw bytes will save you a bunch. - Original Message 
-From: quot;Cyril Auburtinquot; ;cyril.aubur...@gmail.com 

Re: Urgent - IllegalArgumentException during compaction and memtable flush

2012-06-14 Thread Dave Brosius
One of the column names on the row with key 353339332d3134363533393931 
failed to validate with the validator for the column.


If you really are after what column is problematic, and are able to 
build and run cassandra, you can add debugging info to Column.java


protected void validateName(CFMetaData metadata) throws MarshalException
{
    try {
        AbstractType<?> nameValidator = metadata.cfType == ColumnFamilyType.Super ? metadata.subcolumnComparator : metadata.comparator;
        nameValidator.validate(name());
    } catch (MarshalException me) {
        throw new MarshalException("Failed validating name: " + ByteBufferUtil.bytesToHex(name()), me);
    }
}

btw, the 92668395684826132216160944211592988451 is just the key's token.



On 06/14/2012 01:56 PM, Piavlo wrote:


I was able to figure out that 353339332d3134363533393931 is the row key, 
while I have no idea what the 92668395684826132216160944211592988451 part is.

sstable2json also fails with a validation error on this row key.

Now, since I have lost data for this row - how do I find out what the 
root cause was?


thanks

protected void validateName(CFMetaData metadata) throws MarshalException
{
    AbstractType<?> nameValidator = metadata.cfType == ColumnFamilyType.Super ? metadata.subcolumnComparator : metadata.comparator;
    nameValidator.validate(name());
}

On 06/14/2012 06:17 PM, Piavlo wrote:

Ok i've run scrub on the 3 nodes and the problematic row
Error validating row 
DecoratedKey(92668395684826132216160944211592988451, 
353339332d3134363533393931)


The full message is

 WARN [CompactionExecutor:2700] 2012-06-14 14:26:42,041 
CompactionManager.java (line 582) Non-fatal error reading row 
(stacktrace follows)
java.io.IOError: java.io.IOException: Error validating row 
DecoratedKey(92668395684826132216160944211592988451, 
353339332d3134363533393931)
at 
org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:114)
at 
org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:97)
at 
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:137)
at 
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:143)
at 
org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:566)
at 
org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:473)
at 
org.apache.cassandra.db.compaction.CompactionManager.access$200(CompactionManager.java:64)
at 
org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:213)
at 
org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:183)
at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Error validating row 
DecoratedKey(92668395684826132216160944211592988451, 
353339332d3134363533393931)
at 
org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:241)
at 
org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:110)

... 13 more
Caused by: org.apache.cassandra.db.marshal.MarshalException: Not 
enough bytes to read value of component 1
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.validate(AbstractCompositeType.java:240)

at org.apache.cassandra.db.Column.validateName(Column.java:273)
at 
org.apache.cassandra.db.Column.validateFields(Column.java:278)
at 
org.apache.cassandra.db.ColumnFamily.validateColumnFields(ColumnFamily.java:372)
at 
org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:237)

... 14 more
 WARN [CompactionExecutor:2700] 2012-06-14 14:26:42,085 
CompactionManager.java (line 624) Row at 4047368880 is unreadable; 
skipping to next



This happened in several sstables on each of the nodes - meaning 
it was mutated several times


dsc2b:
   
/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-450-Data.db  at 
4244390041
   
/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-452-Data.db  at 
9366462649


dsc2c:
   
/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-413-Data.db  at 
4047368880
   
/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-481-Data.db  at 
3598063925


dsc1a:
  
/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-883-Data.db  at 
271195463
  
/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-733-Data.db  at 

Re: Supercolumn behavior on writes

2012-06-13 Thread Dave Brosius

You can create composite columns on the fly.



On 06/13/2012 09:58 PM, Greg Fausak wrote:

That's a good question.  I just went to a class, Ben was saying that
any action on a super column requires de-re-serialization.  But, it
would be nice if a write had this sort of efficiency.

I have been playing with the 1.1.1 version, in that one there are
'composite' columns, which I think are like super columns, but
they don't require serialization and deserialization.  However, there
seems to be a catch.   You can't 'invent' columns on the fly, everything has
to be declared when you declare the column family.

---greg


On Wed, Jun 13, 2012 at 6:52 PM, Oleg Dulinoleg.du...@gmail.com  wrote:

Does a write to a sub column involve deserialization of the entire super
column ?

Thanks,
Oleg





Re: Supercolumn behavior on writes

2012-06-13 Thread Dave Brosius

Via thrift, or a high level client on thrift, see as an example

http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1
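
A conceptual sketch of what a composite column name is; the component values here are invented, and real clients (e.g. Hector's composite types, per the link above) encode the list into the on-disk byte format for you:

import java.util.Arrays;
import java.util.List;

public class CompositeNameSketch {
    public static void main(String[] args) {
        // A composite column name is an ordered list of typed components,
        // compared component by component.
        List<Object> columnName = Arrays.<Object>asList("SERV.CPE.CONN", 7690254, "ev_sev");
        System.out.println(columnName);
        // Writing a column whose component combination has never been seen
        // before requires no schema change - that is the "on the fly" part.
    }
}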

On 06/13/2012 11:08 PM, Greg Fausak wrote:

Interesting.

How do you do it?

I have a version 2 CF, that works fine.
A version 3 table won't let me invent columns that
don't exist yet. (for composite tables).  What's the trick?

cqlsh -3 cas1
use onplus;
cqlsh:onplus> select * from at_event where ac_event_id = 7690254;
 ac_event_id | ac_creation              | ac_event_type | ac_id | ev_sev
-------------+--------------------------+---------------+-------+--------
     7690254 | 2011-07-23 00:11:47+ | SERV.CPE.CONN |    \N |      5
cqlsh:onplus> update at_event set wingy = 'toto' where ac_event_id = 7690254;
Bad Request: Unknown identifier wingy

This is what I used to create it:
//
// create the event column family, this contains the static
// part of the definition.  many additional columns can be specified
// in the port from relational, these would be mainly the at_event table
//

use onplus;

create columnfamily
 at_event
(
 ac_event_id int PRIMARY KEY,
 ac_event_type text,
 ev_sev int,
 ac_id text,
 ac_creation timestamp
) with compression_parameters:sstable_compression = ''
;

-g




On Wed, Jun 13, 2012 at 9:36 PM, samal <samalgo...@gmail.com> wrote:

  You can't 'invent' columns on the fly, everything has

to be declared when you declare the column family.


That's incorrect. You can define names on the fly. Validation must be defined
when declaring the CF.





Re: Out of memory error

2012-06-10 Thread Dave Brosius

What version of Cassandra?

might be related to https://issues.apache.org/jira/browse/CASSANDRA-4098



On 06/11/2012 12:07 AM, Prakrati Agrawal wrote:


Sorry

I ran list columnFamilyName; and it threw this error.

Thanks and Regards

Prakrati

*From:*aaron morton [mailto:aa...@thelastpickle.com]
*Sent:* Saturday, June 09, 2012 12:18 AM
*To:* user@cassandra.apache.org
*Subject:* Re: Out of memory error

When you ask a question please include the query or function call you 
have made. An any other information that would help someone understand 
what you are trying to do.


Also, please list things you have already tried to work around the 
problem.


Cheers

-

Aaron Morton

Freelance Developer

@aaronmorton

http://www.thelastpickle.com

On 8/06/2012, at 9:04 PM, Prakrati Agrawal wrote:



Dear all,

When I try to list the entire data in my column family I get the 
following error:


Using default limit of 100

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

at 
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:140)


at 
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)


at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)


at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)


at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)


at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)


at 
org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)


at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:683)


at 
org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:667)


at 
org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1373)


at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:264)


at 
org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:219)


at org.apache.cassandra.cli.CliMain.main(CliMain.java:346)

Please help me

Thanks and Regards

Prakrati







Re: Schema changes not getting picked up from different process

2012-05-25 Thread Dave Brosius

What version are you using?

It might be related to https://issues.apache.org/jira/browse/CASSANDRA-4052

On 05/25/2012 07:32 AM, Victor Blaga wrote:

Hi all,

This is my first message on this posting list so I'm sorry if I am 
breaking any rules. I just wanted to report some sort of a problem 
that I'm having with Cassandra.
Short version of my problem: if I make changes to the schema from 
within a process, they do not get picked up by the other processes 
that are connected to the Cassandra cluster unless I trigger a reconnect.


Long version:

Process 1: cassandra-cli connected to cluster and keyspace
Process 2: cassandra-cli connected to cluster and keyspace

From within process 1 - create column family test;
From within process 2 - describe test; - fails with an error (other 
query/insert methods fail as well).


I'm not sure if this is indeed a bug or just a misunderstanding from 
my part.


Regards,
Victor




Re: unsubscribe

2012-05-21 Thread Dave Brosius

On 05/21/2012 02:44 AM, Qingyan(Evan) Liu wrote:



send to user-unsubscr...@cassandra.apache.org


Re: unsubscribe

2012-05-17 Thread Dave Brosius

On 05/17/2012 09:49 PM, casablinca126.com wrote:

unsubscribe




send that message to


user-unsubscr...@cassandra.apache.org


Re: Startup fails after upgrading from 1.0.8 to 1.1.0

2012-05-16 Thread Dave Brosius

Might be related to

https://issues.apache.org/jira/browse/CASSANDRA-3794



On 05/16/2012 08:12 AM, Christoph Eberhardt wrote:

Hi there,

I upgraded Cassandra from 1.0.8 to 1.1.0. It seemed to work in the first 
place; all seemed to work fine. So I started upgrading the rest of the cluster 
(at the time only one other node, which is a replica). After several 
errors, I restarted the cluster and now Cassandra won't even start up. Startup 
fails with the following error message:


INFO 13:59:05,175 Logging initialized
INFO 13:59:05,178 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_26
INFO 13:59:05,178 Heap size: 8420720640/8420720640
INFO 13:59:05,178 Classpath: bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.1.0.jar:bin/../lib/apache-cassandra-clientutil-1.1.0.jar:bin/../lib/apache-cassandra-thrift-1.1.0.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.7.0.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/metrics-core-2.0.3.jar:bin/../lib/mx4j-tools.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/snaptree-0.1.jar:bin/../lib/jamm-0.2.5.jar
INFO 13:59:07,312 JNA mlockall successful
INFO 13:59:07,325 Loading settings from file:/opt/cassandra/apache-cassandra-1.1.0/conf/cassandra.yaml
INFO 13:59:07,419 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 13:59:07,567 Global memtable threshold is enabled at 2676MB
  INFO 13:59:07,654 Initializing key cache with capacity of 100 MBs.
  INFO 13:59:07,661 Scheduling key cache save to each 14400 seconds (going to 
save all keys).
  INFO 13:59:07,662 Initializing row cache with capacity of 0 MBs and provider 
org.apache.cassandra.cache.SerializingCacheProvider
  INFO 13:59:07,664 Scheduling row cache save to each 0 seconds (going to save 
all keys).
  INFO 13:59:07,717 Opening 
/opt/cassandra/database/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-9
 (28520 bytes)
  INFO 13:59:07,717 Opening 
/opt/cassandra/database/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-10
 (28520 bytes)
  INFO 13:59:07,746 Opening 
/opt/cassandra/database/data/system/NodeIdInfo/system-NodeIdInfo-hc-1 (187 
bytes)
  INFO 13:59:07,777 Opening 
/opt/cassandra/database/data/system/schema_columns/system-schema_columns-hc-3 
(1892 bytes)
  INFO 13:59:07,777 Opening 
/opt/cassandra/database/data/system/schema_columns/system-schema_columns-hc-1 
(1892 bytes)
  INFO 13:59:07,777 Opening 
/opt/cassandra/database/data/system/schema_columns/system-schema_columns-hc-2 
(1892 bytes)
  INFO 13:59:07,790 Opening 
/opt/cassandra/database/data/system/Versions/system-Versions-hc-34 (247 bytes)
  INFO 13:59:07,790 Opening 
/opt/cassandra/database/data/system/Versions/system-Versions-hc-35 (247 bytes)
  INFO 13:59:07,815 Opening 
/opt/cassandra/database/data/system/IndexInfo/system-IndexInfo-hc-32 (490 bytes)
  INFO 13:59:07,816 Opening 
/opt/cassandra/database/data/system/IndexInfo/system-IndexInfo-hc-33 (115 bytes)
  INFO 13:59:07,850 Opening 
/opt/cassandra/database/data/system/schema_keyspaces/system-schema_keyspaces-hc-9
 (506 bytes)
  INFO 13:59:07,850 Opening 
/opt/cassandra/database/data/system/schema_keyspaces/system-schema_keyspaces-hc-10
 (506 bytes)
  INFO 13:59:07,867 Opening /opt/cassandra/database/data/system/LocationInfo/system-LocationInfo-hc-74 (148 bytes)
  INFO 13:59:07,867 Opening /opt/cassandra/database/data/system/LocationInfo/system-LocationInfo-hc-72 (406 bytes)
  INFO 13:59:07,867 Opening /opt/cassandra/database/data/system/LocationInfo/system-LocationInfo-hc-73 (80 bytes)
ERROR 13:59:08,244 Exception encountered during startup
java.lang.IllegalArgumentException: value already present: 1034
 at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
 at 
com.google.common.collect.AbstractBiMap.putInBothMaps(AbstractBiMap.java:111)
 at com.google.common.collect.AbstractBiMap.put(AbstractBiMap.java:96)
 at com.google.common.collect.HashBiMap.put(HashBiMap.java:84)
 at org.apache.cassandra.config.Schema.load(Schema.java:385)
 at org.apache.cassandra.config.Schema.load(Schema.java:106)
 at org.apache.cassandra.config.Schema.load(Schema.java:91)
 at 
org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:533)
 at 

Re: Retrieving old data version for a given row

2012-05-16 Thread Dave Brosius
You're in for a world of hurt going down that rabbit hole. If you truly 
want versioned data then you should think about changing your keying to 
perhaps be a composite key, where the key is of the form

NaturalKey/VersionId

Or, if you want the versioning at the column level, use composite columns 
with a ColumnName/VersionId format.
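
A tiny sketch of the row-level variant with invented names; a real implementation would use a proper composite type rather than string concatenation:

public class VersionedKey {
    // "user42" plus a version yields keys like user42/000000000001; each
    // update writes a brand new row, and old versions remain readable until
    // you explicitly delete them.
    static String versionedRowKey(String naturalKey, long versionId) {
        return String.format("%s/%012d", naturalKey, versionId);
    }

    public static void main(String[] args) {
        System.out.println(versionedRowKey("user42", 1)); // user42/000000000001
        System.out.println(versionedRowKey("user42", 2)); // user42/000000000002
    }
}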




On 05/16/2012 10:16 AM, Felipe Schmidt wrote:

That was very helpful, thank you very much!

I still have some questions:
- Is it possible to make Cassandra keep old data values after flushing?
The same question for the memtable, before flushing. It seems to me that
when I update some tuple, the old data will be overwritten in the memtable,
even before flushing.
- Is it possible to scan values from the memtable, maybe using the
so-called Thrift API? Using the client API I can just see the newest
data version; I can't see what's really happening with the memtable.

I ask that because what I'll try to do is Change Data Capture for
Cassandra, and the answers will define what kind of approaches I'm able
to use.

Thanks in advance.

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)


2012/5/14 aaron morton <aa...@thelastpickle.com>:

Cassandra does not provide access to multiple versions of the same column;
that is essentially an implementation detail.

All mutations are written to the commit log in a binary format, see the
o.a.c.db.RowMutation.getSerializedBuffer() (If you want to tail it for
analysis you may want to change commitlog_sync in cassandra.yaml)

Here is a post about looking at multiple versions of columns in an
sstable: http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/

Remember that not all versions of a column are written to disk
  (see http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/). Also
compaction will compress multiple versions of the same column from multiple
files into a single version in a single file .

Hope that helps.


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/05/2012, at 9:50 PM, Felipe Schmidt wrote:

Yes, I need this information just for academic purposes.

So, to read old data values, I tried to open the commitlog using tail
-f and also the log file viewer of Ubuntu, but I cannot see much
information inside the log!
Is there any other way to open this log? I didn't find any Cassandra
API for this purpose.

Thanks everybody in advance.

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)




2012/5/14 zhangcheng2 <zhangche...@software.ict.ac.cn>:

After compaction, the old version data will be gone!




zhangcheng2


From: Felipe Schmidt

Date: 2012-05-14 05:33

To: user

Subject: Retrieving old data version for a given row

I'm trying to retrieve old data versions for some row, but it seems not to

be possible. I'm a beginner with Cassandra, and the only approach I

know is looking at the SSTable in the storage folder; but if I insert

some column and right after insert another value to the same row,

after flushing, I only get the last value.

Is there any way to get the old data version? Obviously, before compaction.


Regards,

Felipe Mathias Schmidt

(Computer Science UFRGS, RS, Brazil)







Re: Startup fails after upgrading from 1.0.8 to 1.1.0

2012-05-16 Thread Dave Brosius

tracking issue here: https://issues.apache.org/jira/browse/CASSANDRA-4251
might be related to: https://issues.apache.org/jira/browse/CASSANDRA-3794


Re: understanding of native indexes: limitations, potential side effects,...

2012-05-16 Thread Dave Brosius
Each index you define on the source CF is created using an internal CF 
that has as its key the value of the column it's indexing, and as its 
columns, all the keys of all the rows in the source CF that have that 
value. So if all your rows in your source CF have the same value, then 
your index cf will have one row with N columns for each N rows in the 
original CF.




On 05/16/2012 02:58 PM, David Vanderfeesten wrote:

Txs Jeremiah,
But I am not sure I am following "number of columns could be equal to 
number of rows". Is the native index implemented as one CF shared over 
all the indexes (one row in the idx CF corresponding to one index), or 
is there an internal index CF per index? My (potentially wrong) mindset 
was the latter. In that case, if you would index a column with a very 
high cardinality, like for example serialNbr, this corresponding 
internal idx CF will just end up with almost the same number of rows as 
the original CF containing the serialNbr. I can't match that with what 
you are explaining...


- David

On Wed, May 16, 2012 at 6:23 PM, Jeremiah Jordan 
jeremiah.jor...@morningstar.com 
mailto:jeremiah.jor...@morningstar.com wrote:


The limitation is because number of columns could be equal to
number of rows.  If number of rows is large this can become an issue.

-Jeremiah


*From:* David Vanderfeesten [feest...@gmail.com
mailto:feest...@gmail.com]
*Sent:* Wednesday, May 16, 2012 6:58 AM
*To:* user@cassandra.apache.org mailto:user@cassandra.apache.org
*Subject:* understanding of native indexes: limitations, potential
side effects,...

Hi

I like to better understand the limitations of native indexes,
potential side effects and scenarios where they are required.

My understanding so far :
- Is that indexes on each node are storing indexes for data
locally on the node itself.
- Indexes do not return values in a sorted way (hashes of the
indexed row keys are defining the order)
- Given by the design referred in the first bullet, a coordinator
node receiving a read of a native index, needs to spawn a read to
multiple nodes(set of nodes together covering at least the
complete key space + potentially more to assure read consistency
level).
- Each write to an indexed column leads to an additional local
read of the index to update the index (kind of obvious but easily
forgotten when tuning your system for write-only workload)
- When using a where clause in CQL you need at least to specify an
equal condition on a native indexed column. Additional conditions
in the where clause are filtered out by the coordinator node
receiving the CQL query.
- native indexes do not support very well columns with high number
of discrete values throughout the entire CF.

Is upper understanding correct and complete?
Some doubts:
- about the limitation of indexing columns with high number of
discrete values:
I assume native indexes  are implemented with an internally
managed CF per index. With high cardinality values, in worst case,
the number of rows in the index are identical to the number of
rows of the indexed CF. Or are there other reasons for the
limitation, and if that's the case, is there a guideline on the
max. nbr of cardinality that is still reasonable?
-Are column updates and the update of the indexes (read + write
action) atomic and isolated from concurrent updates?

Txs!

David









Re: cassandra upgrade to 1.1 - migration problem

2012-05-15 Thread Dave Brosius
The replication factor for a keyspace is stored in the 
system.schema_keyspaces column family.


Since you can't view this with cli as the server won't start, the only 
way to look at it, that i know of is to use the


sstable2json tool on the *.db file for that column family...

So for instance on my machine i do

./sstable2json 
/var/lib/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-ia-1-Data.db


and get


{
  "7374726573735f6b73": [["durable_writes","true",1968197311980145],
  ["name","stress_ks",1968197311980145],
  ["strategy_class","org.apache.cassandra.locator.SimpleStrategy",1968197311980145],
  ["strategy_options","{\"replication_factor\":\"3\"}",1968197311980145]]
}


It's likely you don't have an entry for replication_factor.

Theoretically I suppose you could embellish the output and use 
json2sstable to fix it, but I have no experience here, and would get the 
blessing of the DataStax fellas before proceeding.






On 05/15/2012 07:02 PM, Casey Deccio wrote:
Sorry to reply to my own message (again). I took a closer look at the 
logs and realized that the partitioner errors aren't what caused the 
daemon to stop; those errors were in the logs even before I upgraded. 
This one seems to be the culprit.


java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
Caused by: java.lang.RuntimeException: 
org.apache.cassandra.config.ConfigurationException: SimpleStrategy 
requires a replication_factor strategy option.

at org.apache.cassandra.db.Table.<init>(Table.java:275)
at org.apache.cassandra.db.Table.open(Table.java:114)
at org.apache.cassandra.db.Table.open(Table.java:97)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204)
at org.apache.cassandra.service.AbstractCassandraDaemon.<init>(AbstractCassandraDaemon.java:254)

... 5 more
Caused by: org.apache.cassandra.config.ConfigurationException: 
SimpleStrategy requires a replication_factor strategy option.
at 
org.apache.cassandra.locator.SimpleStrategy.validateOptions(SimpleStrategy.java:71)
at 
org.apache.cassandra.locator.AbstractReplicationStrategy.createReplicationStrategy(AbstractReplicationStrategy.java:218)
at 
org.apache.cassandra.db.Table.createReplicationStrategy(Table.java:295)

at org.apache.cassandra.db.Table.<init>(Table.java:271)
... 9 more
Cannot load daemon

I'm not sure how to check the replication_factor and/or update it 
without using cassandra-cli, which requires the daemon to be running.


Casey




Re: How to make the search by columns in range case insensitive ?

2012-05-14 Thread Dave Brosius
This could be accomplished with a custom 'CaseInsensitiveUTF8Type' 
comparator used as the comparator for that column family. This would 
require adding a class of your own writing to the server.
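A minimal sketch of the comparison at the heart of such a class
(hypothetical, shown standalone; a real server-side comparator must
extend org.apache.cassandra.db.marshal.AbstractType, following the
pattern of UTF8Type in the Cassandra source):

import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Comparator;

// orders UTF-8 encoded column names case-insensitively
public class CaseInsensitiveUTF8Comparator implements Comparator<ByteBuffer> {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    public int compare(ByteBuffer a, ByteBuffer b) {
        // duplicate() so the callers' buffer positions are left untouched
        String left = UTF8.decode(a.duplicate()).toString();
        String right = UTF8.decode(b.duplicate()).toString();
        return left.compareToIgnoreCase(right);
    }
}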




On 05/14/2012 07:26 AM, Ertio Lew wrote:
I need to build a search-by-names index using entity names as column 
names in a row. This data is split across several rows, using the 
first 3 characters of the entity name as the row key and the remaining 
part as the column name; the column value contains the entity id.

But there is a problem: I'm storing this data in a CF using a byte-type 
comparator, and I need to make case-insensitive queries to retrieve 'n' 
column names starting from a given point.

Any ideas about how I should do that?




Re: How do I add a custom comparator class to a cassandra cluster ?

2012-05-14 Thread Dave Brosius

it can be in a separate jar with just one class.
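Once the jar is in the server's lib directory, the class is referenced
by its fully qualified name when creating the column family (a
hypothetical example, names made up):

[default@Keyspace1] create column family Entities with comparator = 'com.example.CaseInsensitiveUTF8Type';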

On 05/15/2012 12:29 AM, Ertio Lew wrote:
Can I put this comparator class in a separate new jar (with just this 
single class) or does it have to be appended to the original jar along 
with the other comparator classes?


On Tue, May 15, 2012 at 12:22 AM, Tom Duffield (Mailing Lists) 
<tom.duffield.li...@gmail.com> wrote:


Kirk is correct.

-- 
Tom Duffield (Mailing Lists)

Sent with Sparrow http://www.sparrowmailapp.com/?sig

On Monday, May 14, 2012 at 1:41 PM, Kirk True wrote:


Disclaimer: I've never tried, but I'd imagine you can drop a JAR
containing the class(es) into the lib directory and perform a
rolling
restart of the nodes.

On 5/14/12 11:11 AM, Ertio Lew wrote:

I need to add a custom comparator to a cluster, to sort columns
in a
certain customized fashion. How do I add the class to the cluster ?







Re: Retrieving old data version for a given row

2012-05-13 Thread Dave Brosius
The only way you could get the old value for a column would be to insert 
the column value, then flush, then insert the new column, then before 
compaction look at the old sstable.
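Concretely, the sequence would look something like this (keyspace,
column family and sstable names are illustrative):

[default@MyKeyspace] set Users['bob']['name'] = 'v1';
$ nodetool flush MyKeyspace Users      # old value is now frozen in an sstable
[default@MyKeyspace] set Users['bob']['name'] = 'v2';
$ ./sstable2json /var/lib/cassandra/data/MyKeyspace/Users-hc-1-Data.db
  # still shows 'v1' until compaction merges it away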


If you insert the value twice in a row without a flush, the old value is 
gone, as it only exists in memtables (and in the commit log - of course).


Hopefully you want this information for learning purposes only, and 
aren't actually using this for real purposes.




On 05/13/2012 05:33 PM, Felipe Schmidt wrote:

I'm trying to retrieve an old data version for some row, but it seems
not to be possible. I'm a beginner with Cassandra and the only approach
I know is looking at the SSTables in the storage folder, but if I
insert some column and right after insert another value to the same
row, after flushing I only get the last value.
Is there any way to get the old data version? Obviously, before compaction.

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)





Re: primary keys query

2012-05-11 Thread Dave Brosius
Inequalities on secondary indices are always done in memory, so without 
at least one EQ on another secondary index you will be loading every row 
in the database, which with a massive database isn't a good idea. So by 
requiring at least one EQ on an index, you hopefully limit the set of 
rows that need to be read into memory to a manageable size. Although 
obviously you can still get into trouble with that as well.




On 05/11/2012 09:39 AM, cyril auburtin wrote:

Sorry for asking this, but why is it necessary to always have at least
one EQ comparison?

[default@Keyspace1] get test where birth_year > 1985;
No indexed columns present in index clause with operator EQ

It obliges one to have a dummy indexed column to do this query:

[default@Keyspace1] get test where tag=sea and birth_year > 1985;
---
RowKey: sam
=> (column=birth_year, value=1988, timestamp=1336742346059000)






Re: Behavior on inconsistent reads

2012-05-10 Thread Dave Brosius
If you read at a consistency level of at least QUORUM, and writes are 
also done at QUORUM (so that read and write replica sets must overlap: 
with RF=3, any 2 writers and any 2 readers share at least one node), 
you are guaranteed that at least one of the nodes that responds has the 
latest data, and so you get the right data. If you read with less than 
quorum it is possible for all the nodes that respond to have stale data.




On 05/10/2012 09:46 PM, Carpenter, Curt wrote:


Hi all, newbie here. Be gentle.

From 
http://www.datastax.com/docs/1.0/cluster_architecture/about_client_requests:


Thus, the coordinator first contacts the replicas specified by the 
consistency level. The coordinator will send these requests to the 
replicas that are currently responding most promptly. The nodes 
contacted will respond with the requested data; if multiple nodes are 
contacted, the rows from each replica are compared in memory to see if 
they are consistent. If they are not, then the replica that has the 
most recent data (based on the timestamp) is used by the coordinator 
to forward the result back to the client.


To ensure that all replicas have the most recent version of 
frequently-read data, the coordinator also contacts and compares the 
data from all the remaining replicas that own the row in the 
background, and if they are inconsistent, issues writes to the 
out-of-date replicas to update the row to reflect the most recently 
written values. This process is known as read repair. Read repair can 
be configured per column family (using read_repair_chance, 
http://www.datastax.com/docs/1.0/configuration/storage_configuration#read-repair-chance) 
and is enabled by default.


For example, in a cluster with a replication factor of 3, and a read 
consistency level of QUORUM, 2 of the 3 replicas for the given row are 
contacted to fulfill the read request. Supposing the contacted 
replicas had different versions of the row, the replica with the most 
recent version would return the requested data. In the background, the 
third replica is checked for consistency with the first two, and if 
needed, the most recent replica issues a write to the out-of-date 
replicas.


Always returns the most recent? What if the most recent write is 
corrupt? I thought the whole point of a quorum was that consistency is 
verified before the data is returned to the client. No?


Thanks,

Curt





Re: EC2 Best Practices

2012-04-25 Thread Dave Brosius
0 is a perfectly valid id. "node - 1" is modulo the maximum token
value; that token range is 0 - 2**127, so "node - 1" in this case is
2**127.

- Original Message -
From: "Deno Vichas" <d...@syncopated.net>

Re: Bad Request: No indexed columns present in by-columns clause with equals operator

2012-04-23 Thread Dave Brosius

Works for me on trunk... what version are you using?

On 04/23/2012 08:39 AM, mdione@orange.com wrote:

  I understand the error message, but I don't understand why I get it.
Here's the CF:

cqlsh:avatars> describe columnfamily HBX_FILE;

CREATE COLUMNFAMILY HBX_FILE (
  KEY blob PRIMARY KEY,
  HBX_FIL_DATE text,
  HBX_FIL_LARGE ascii,
  HBX_FIL_MEDIUM ascii,
  HBX_FIL_SMALL ascii,
  HBX_FIL_STATUS text,
  HBX_FIL_TINY ascii
) WITH
  comment='' AND
  comparator=text AND
  read_repair_chance=1.00 AND
  gc_grace_seconds=864000 AND
  default_validation=blob AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write=True;

CREATE INDEX HBX_FILE_HBX_FIL_STATUS_idx ON HBX_FILE (HBX_FIL_STATUS);

   The query and the error:

cqlsh:avatars> SELECT HBX_FIL_SMALL FROM HBX_FILE WHERE KEY=1 AND HBX_FIL_STATUS='actif';
Bad Request: No indexed columns present in by-columns clause with equals operator

   A query that works:

cqlsh:avatars> SELECT HBX_FIL_STATUS FROM HBX_FILE WHERE KEY=1;
 HBX_FIL_STATUS
----------------
          Actif

Just in case, here's cli's output for the same CF:

[default@avatars] describe HBX_FILE;
 ColumnFamily: HBX_FILE
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: org.apache.cassandra.db.marshal.BytesType
   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
   Row cache size / save period in seconds / keys to save : 0.0/0/all
   Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
   Key cache size / save period in seconds: 20.0/14400
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
   Read repair chance: 1.0
   Replicate on write: true
   Bloom Filter FP chance: default
   Built indexes: []
   Column Metadata:
     Column Name: HBX_FIL_DATE
       Validation Class: org.apache.cassandra.db.marshal.UTF8Type
     Column Name: HBX_FIL_LARGE
       Validation Class: org.apache.cassandra.db.marshal.AsciiType
     Column Name: HBX_FIL_MEDIUM
       Validation Class: org.apache.cassandra.db.marshal.AsciiType
     Column Name: HBX_FIL_SMALL
       Validation Class: org.apache.cassandra.db.marshal.AsciiType
     Column Name: HBX_FIL_STATUS
       Validation Class: org.apache.cassandra.db.marshal.UTF8Type
       Index Name: HBX_FILE_HBX_FIL_STATUS_idx
       Index Type: KEYS
     Column Name: HBX_FIL_TINY
       Validation Class: org.apache.cassandra.db.marshal.AsciiType
   Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

   And the same error, with other words, in the CLI:

[default@avatars] get HBX_FILE where HBX_FIL_STATUS = 'actif';
No indexed columns present in index clause with operator EQ

Am I missing something? It might well be that I'm too tired...

--
Marcos Dione
SysAdmin
Astek Sud-Est
pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo
04 97 12 62 45 - mdione@orange.com



_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete 
altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, France Telecom - Orange is not liable for messages 
that have been modified, changed or falsified.
Thank you.






Re: 200TB in Cassandra ?

2012-04-19 Thread Dave Brosius
I think your math is 'relatively' correct. It would seem to me you 
should focus on how you can reduce the amount of storage you are using 
per item, if at all possible, if that node count is prohibitive.


On 04/19/2012 07:12 AM, Franc Carter wrote:


Hi,

One of the projects I am working on is going to need to store about 
200TB of data - generally in manageable binary chunks. However, after 
doing some rough calculations based on rules of thumb I have seen for 
how much storage should be on each node, I'm worried.


  200TB with RF=3 is 600TB = 600,000GB
  Which is 1000 nodes at 600GB per node

I'm hoping I've missed something as 1000 nodes is not viable for us.

cheers

--

*Franc Carter* | Systems architect | Sirca Ltd

franc.car...@sirca.org.au | www.sirca.org.au


Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215






Re: Column Family per User

2012-04-18 Thread Dave Brosius
Your design should be driven by how you want to query. If you are only
querying by user, then having the user as part of the row key makes
sense. To manage row size, you should think of a row as being a bucket
of time. Cassandra supports a large (but not unbounded) row size. To
manage row size you might say that this row is for user fred for the
month of April, or if that's too much, perhaps the row is for user fred
for the day 4/18/12. To do this you can use composite keys to hold both
pieces of information in the key: (user, bucketpos). The nice thing is
that once the time period has come and gone, that row is complete, and
you can perform background jobs against that row and store summary
information for that time period.

- Original Message -
From: "Trevor Francis" <trevor.fran...@tgrahamcapital.com>

Re: Column Family per User

2012-04-18 Thread Dave Brosius
Yes, in this cassandra model time wouldn't be a column value, it would
be part of the column name. Depending on how you want to access your
data (give me all data points for time X) and how many separate
datapoints you have for time X, you might consider packing all the data
for a time into one column thru composite columns:

column name: 2012-04-12T12:22:23.293/55/45/10 (where / is a human
readable representation of the composite separator)

In this case there wouldn't actually be a value; the data is just
encoded in the column name. Obviously if you are storing dozens of
separate datapoints per timestamp this gets out of hand quickly, and
perhaps you need to go back to column names in time/fieldname format
with a real value. The advantage tho of the composite column is that
you eliminate all that constant blather about 'Wind' 'Rain' 'Sunshine'
in your data and only hold real data (granted, compression will
probably help here, but not having it at all is even better).

As for row size, obviously that takes some experimentation on your
part. You can bucket a row to be any time frame you want. If you feel
that 15 minutes is the correct length of time given the amount of data
you will write, then use 15 minutes. If it's 1 hour, use 1 hour. The
only thing you have to figure out is a 'bucket time' definition that
you understand; likely it's the timestamp of when that time period
starts.

As for 'rotating the row', perhaps it's just semantics, but there
really is no such concept. You are at some point in time, and you want
to write some data to the database. The steps are:
1) get the user
2) get the timestamp of the current bucket based on 'now'
3) build a composite key
4) insert the data with that key
Whether that row existed before or is a new row has no bearing on your
client code.

- Original Message -
From: "Trevor Francis" <trevor.fran...@tgrahamcapital.com>
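A minimal sketch of steps 1-4 (hypothetical names, 15-minute buckets; a
real schema would use a CompositeType row key rather than string
concatenation):

public class BucketKeys {
    private static final long BUCKET_MILLIS = 15L * 60L * 1000L;

    // steps 2 and 3: find the bucket containing 'now' and build (user, bucketpos)
    public static String rowKey(String user, long nowMillis) {
        long bucketStart = (nowMillis / BUCKET_MILLIS) * BUCKET_MILLIS;
        return user + ":" + bucketStart;
    }

    public static void main(String[] args) {
        // step 4 would be a normal insert using this as the row key
        System.out.println(rowKey("fred", System.currentTimeMillis()));
    }
}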


Re: Column Family per User

2012-04-18 Thread Dave Brosius
It seems to me you are on the right track. Finding the right balance of
number of rows vs row width is the part that will take the most
experimentation.

- Original Message -
From: "Trevor Francis" <trevor.fran...@tgrahamcapital.com>

Re: Trying to avoid super columns

2012-04-12 Thread Dave Brosius
If you want to reduce the number of columns, you could pack all the data 
for a product into one column, as in:

composite column name - product_id_1:12.44:1.00:3.00

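A tiny sketch of that packing and unpacking (plain delimited strings
here, purely illustrative; a real schema would use CompositeType
components):

public class PackedColumn {
    public static String pack(String productId, String price, String tax, String fees) {
        return productId + ":" + price + ":" + tax + ":" + fees;
    }

    public static String[] unpack(String columnName) {
        return columnName.split(":");
    }

    public static void main(String[] args) {
        String col = pack("product_id_1", "12.44", "1.00", "3.00");
        System.out.println(col);            // product_id_1:12.44:1.00:3.00
        System.out.println(unpack(col)[1]); // 12.44
    }
}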


On 04/12/2012 03:03 PM, Philip Shon wrote:
I am currently working on a data model where the purpose is to look up 
multiple products for given days of the year.  Right now, that model 
involves the usage of a super column family. e.g.


2012-04-12: {
  product_id_1: {
price: 12.44,
tax: 1.00,
fees: 3.00,
  },
  product_id_2: {
price: 50.00,
tax: 4.00,
fees: 10.00
  }
}

I should note that for a given day/key, we are expecting in the range 
of 2 million to 4 million products (subcolumns).


With this model, I am able to retrieve any of the products for a given 
day using hector's MultigetSuperSliceQuery.



I am looking into changing this model to use Composite column names. 
How would I go about modeling this? My initial thought is to migrate 
the above model into something more like the following.


2012-04-12: {
  product_id_1:price: 12.44,
  product_id_1:tax: 1.00,
  product_id_1:fees: 3.00,
  product_id_2:price: 50.00,
  product_id_2:tax: 4.00,
  product_id_2:fees: 10.00,
}

The one thing that stands out to me with this approach is the number 
of additional columns that will be created for a single key. Will the 
increase in columns create new issues I will need to deal with?


Are there any other thoughts about if I should actually move forward 
(or not) with migration this super column family to the model with the 
component column names?


Thanks,

Phil




Re: Why so many SSTables?

2012-04-11 Thread Dave Brosius
It's easy to spend other people's money, but handling 1TB of data with 
1.5 g heap?  Memory is cheap, and just a little more will solve many 
problems.



On 04/11/2012 08:43 AM, Romain HARDOUIN wrote:


Thank you for your answers.

I originally posted this question because we encountered an OOM 
exception on 2 nodes during a repair session.
Memory analysis shows a hotspot: an ArrayList of 
SSTableBoundedScanner which contains as many objects as there are 
SSTables on disk (7747 objects at the time).

This ArrayList consumes 47% of the heap space (786 MB).

We want each node to handle 1 TB, so we must dramatically reduce the 
number of SSTables.

Thus, is there any drawback if we set sstable_size_in_mb to 200MB?
Otherwise, should we go back to tiered compaction?

Regards,

Romain


Maki Watanabe <watanabe.m...@gmail.com> wrote on 11/04/2012 04:21:47:

 You can configure sstable size by sstable_size_in_mb parameter for LCS.
 The default value is 5MB.
 You should also check that you don't have many pending compaction
 tasks with nodetool tpstats and compactionstats.
 If you have enough IO throughput, you can increase
 compaction_throughput_mb_per_sec
 in cassandra.yaml to reduce pending compactions.

 maki

 2012/4/10 Romain HARDOUIN romain.hardo...@urssaf.fr:
 
  Hi,
 
  We are surprised by the number of files generated by Cassandra.
  Our cluster consists of 9 nodes and each node handles about 35 GB.
  We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
  We have 30 CF.
 
  We've got roughly 45,000 files under the keyspace directory on 
each node:

  ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
  44372
 
  The biggest CF is spread over 38,000 files:
  ls -l Documents* | wc -l
  37870
 
  ls -l Documents*-Data.db | wc -l
  7586
 
  Many SSTable are about 4 MB:
 
  19 MB - 1 SSTable
  12 MB - 2 SSTables
  11 MB - 2 SSTables
  9.2 MB - 1 SSTable
  7.0 MB to 7.9 MB - 6 SSTables
  6.0 MB to 6.4 MB - 6 SSTables
  5.0 MB to 5.4 MB - 4 SSTables
  4.0 MB to 4.7 MB - 7139 SSTables
  3.0 MB to 3.9 MB - 258 SSTables
  2.0 MB to 2.9 MB - 35 SSTables
  1.0 MB to 1.9 MB - 13 SSTables
  87 KB to  994 KB - 87 SSTables
  0 KB - 32 SSTables
 
  FYI here is CF information:
 
  ColumnFamily: Documents
Key Validation Class: org.apache.cassandra.db.marshal.BytesType
Default column value validator: 
org.apache.cassandra.db.marshal.BytesType

Columns sorted by: org.apache.cassandra.db.marshal.BytesType
Row cache size / save period in seconds / keys to save : 0.0/0/all
Row Cache Provider: 
org.apache.cassandra.cache.SerializingCacheProvider

Key cache size / save period in seconds: 20.0/14400
GC grace seconds: 1728000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
Replicate on write: true
Column Metadata:
  Column Name: refUUID (7265664944)
Validation Class: org.apache.cassandra.db.marshal.BytesType
Index Name: refUUID_idx
Index Type: KEYS
Compaction Strategy:
  org.apache.cassandra.db.compaction.LeveledCompactionStrategy
Compression Options:
  sstable_compression: 
org.apache.cassandra.io.compress.SnappyCompressor

 
  Is it a bug? If not, how can we tune Cassandra to avoid this?
 
  Regards,
 
  Romain




Re: Using Thrift

2012-04-02 Thread Dave Brosius

For a thrift client, you need the following jars at a minimum

apache-cassandra-clientutil-*.jar
apache-cassandra-thrift-*.jar
libthrift-*.jar
slf4j-api-*.jar
slf4j-log4j12-*.jar

all of these jars can be found in the cassandra distribution.
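A minimal smoke test for that classpath (a hedged sketch; it assumes a
default local node listening for thrift on port 9160):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class ThriftSmokeTest {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        // if the jars above are on the classpath, this prints the thrift api version
        System.out.println(client.describe_version());
        transport.close();
    }
}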



On 04/02/2012 07:40 AM, Rishabh Agrawal wrote:


Any suggestions

*From:*Rishabh Agrawal
*Sent:* Monday, April 02, 2012 4:42 PM
*To:* user@cassandra.apache.org
*Subject:* Using Thrift

Hello,

I have just started exploring Cassandra from the java side and wish to 
use thrift as my api. The problem is that whenever I try to compile my 
java code I get the following error:


package org.slf4j does not exist

Can anyone help me with this.

Thanks and Regards

Rishabh Agrawal













Re: Using Thrift

2012-04-02 Thread Dave Brosius
slf4j is just a logging facade; if you actually want log files, you 
need a logger, say log4j-*.jar, in your classpath. Then just configure 
that with a log4j.properties file. That properties file also needs to 
be on the classpath.
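A minimal log4j.properties to get console output (a sketch, not the
file shipped with Cassandra):

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %c - %m%n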




On 04/02/2012 09:05 AM, Rishabh Agrawal wrote:


I didn't find the slf4j files in the distribution, so I downloaded 
them. Can you help me with how to configure them?


*From:*Dave Brosius [mailto:dbros...@mebigfatguy.com]
*Sent:* Monday, April 02, 2012 6:28 PM
*To:* user@cassandra.apache.org
*Subject:* Re: Using Thrift

For a thrift client, you need the following jars at a minimum

apache-cassandra-clientutil-*.jar
apache-cassandra-thrift-*.jar
libthrift-*.jar
slf4j-api-*.jar
slf4j-log4j12-*.jar

all of these jars can be found in the cassandra distribution.



On 04/02/2012 07:40 AM, Rishabh Agrawal wrote:

Any suggestions

*From:*Rishabh Agrawal
*Sent:* Monday, April 02, 2012 4:42 PM
*To:* user@cassandra.apache.org
*Subject:* Using Thrift

Hello,

I have just started exploring Cassandra from the java side and wish to 
use thrift as my api. The problem is that whenever I try to compile my 
java code I get the following error:


package org.slf4j does not exist

Can anyone help me with this.

Thanks and Regards

Rishabh Agrawal


















Re: counter column family

2012-03-27 Thread Dave Brosius

Counter columns are special, they must be in a column family to themselves.

On 03/27/2012 09:32 AM, puneet loya wrote:
When I'm using a counter column I'm not able to add columns of other 
types to the column family. Is it so, or is it just a syntactical error?


[default@CMDCv99] create column family status
... with comparator = AsciiType
... and column_metadata =
... [{
... column_name : Test,
... validation_class : IntegerType,
... index_type : 0,
... index_name : IdxName},
... {
... column_name : 'other name',
... validation_class : CounterColumnType
... }];
Cannot add a counter column (other name) in a non counter column family

On Tue, Mar 27, 2012 at 6:55 PM, R. Verlangen <ro...@us2.nl> wrote:


You should use a connection pool without retries to prevent a
single increment of +1 have a result of e.g. +3.


2012/3/27 Rishabh Agrawal <rishabh.agra...@impetus.co.in>

You can even define how much you want to increment by. But let me
just warn you: as far as I know, it has consistency issues.

*From:* puneet loya [mailto:puneetl...@gmail.com]
*Sent:* Tuesday, March 27, 2012 5:59 PM


*To:* user@cassandra.apache.org
*Subject:* Re: counter column family

thanxx a ton :) :)

the counter column family works like 'auto increment' in other
databases, right?

I mean we have a column of type integer which increments with
every insert.

Am I going the right way?

please reply :)

On Tue, Mar 27, 2012 at 5:50 PM, R. Verlangen <ro...@us2.nl> wrote:

*create column family MyCounterColumnFamily with
default_validation_class=CounterColumnType and
key_validation_class=UTF8Type and comparator=UTF8Type;*

There you go! Keys must be utf8, as well as the column names.
Of course you can change those validators.

Cheers!

2012/3/27 puneet loya <puneetl...@gmail.com>

Can u give an example of create column family with counter
column in it.

Please reply

Regards,

Puneet Loya



-- 
With kind regards,


Robin Verlangen

www.robinverlangen.nl








-- 
With kind regards,


Robin Verlangen
www.robinverlangen.nl






Re: Issue with cassandra-cli assume

2012-03-23 Thread Dave Brosius

I think you want

assume UserDetails validator as bytes;



On 03/23/2012 08:09 PM, Drew Kutcharian wrote:

Hi Everyone,

I'm having an issue with cassandra-cli's assume command with a custom type. I 
tried it with the built-in BytesType and got the same error:

[default@test] assume UserDetails validator as 
org.apache.cassandra.db.marshal.BytesType;
Syntax error at position 35: missing EOF at '.'

I also tried it with single and double quotes with no success:
[default@test] assume UserDetails validator as 
'org.apache.cassandra.db.marshal.BytesType';
Syntax error at position 32: mismatched input 
''org.apache.cassandra.db.marshal.BytesType'' expecting Identifier

Is this a bug?

I'm using Cassanda 1.0.7 on Mac OSX Lion.

Thanks,

Drew






Re: Order rows numerically

2012-03-16 Thread Dave Brosius
if your keys are 1-n and you are using BOP, then almost certainly your 
ring will be massively unbalanced with the first node getting clobbered. 
You'll have bigger issues than getting lexical ordering.


I'd try to rethink your design so that you don't need BOP.

On 03/16/2012 06:49 PM, Watanabe Maki wrote:

How about padding the smaller numbers with leading zeros?
E.g. 0001, 0002, etc.
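(In Java terms, for instance, String.format("%08d", 42) yields
"00000042"; with fixed-width, zero-padded keys, lexical order and
numeric order coincide for non-negative numbers.)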

maki


On 2012/03/17, at 6:29, A J <s5a...@gmail.com> wrote:

If I define my rowkeys to be Integer
(key_validation_class=IntegerType), how can I order the rows
numerically? ByteOrderedPartitioner orders lexically, and retrieval
using get_range does not seem to return rows in numeric order.

If I were to change the rowkey to be UTF8
(key_validation_class=UTF8Type), BOP still does not order numerically:
for a range of rowkeys from 1 to 2, I get 1, 10, 11, ..., 2 (lexical
ordering).

Any workaround for this?

Thanks.




Re: Why is row lookup much faster than column lookup

2012-03-13 Thread Dave Brosius
Given the hashtable nature of cassandra, finding a row is probably
'relatively' constant no matter how many columns you have. The smaller
the number of columns, i suppose, the more likely that all the columns
will be in one sstable. If you've got a ton of columns per row, it is
much more likely that these columns will be spread out in multiple
sstables. Plus, columns are read in chunks, depending on yaml settings.

- Original Message -
From: "A J" <s5a...@gmail.com>

Re: Why is row lookup much faster than column lookup

2012-03-13 Thread Dave Brosius
Sorry, that should have been: given the hashtable nature of cassandra,
finding a row is probably 'relatively' constant no matter how many
*rows* you have.

- Original Message -
From: "Dave Brosius" <dbros...@mebigfatguy.com>

Re: key sorting question

2012-03-06 Thread Dave Brosius

  
  
With random partitioner, the rows are sorted by the hashes of the
keys, so for all intents and purposes, not sorted.

The comment below really is talking about how columns are sorted,
and yes, when time uuids are used, they are sorted by the time
component, as time uuids start with the time component and then
add various randomness bits.

On 03/07/2012 01:51 AM, Tamar Fraenkel wrote:

Hi!
I am currently experimenting with Cassandra 1.0.7, but while reading
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 something
caught my eye:
"Cassandra orders version 1 UUIDs by their time component"
Is this true?
If I have for example USER_CF, where the key is a randomly generated
java.util.UUID (UUID.randomUUID()), will the rows be sorted by the
generation time?
I use random partitioner, if that makes any difference.
Thanks,

Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956


Re: TimeUUID

2012-02-28 Thread Dave Brosius

  
  
Given that these rows are wanted to be time buckets, you would want
collisions; in fact that would be the standard way of working. So IMO
the uuid just removes the ability to bucket data and would not be
wanted.

On 02/28/2012 10:30 AM, Paul Loy wrote:

In a multi server env, to avoid key collisions timeuuid may be the
better choice.

On Monday, February 27, 2012, Tamar Fraenkel wrote:

Hi!
I have a column family where I use rows as "time buckets".
What I do is take epoc time in seconds, and round it to 1 hour (taking
the result of time_since_epoc_second divided by 3600).
My key validation type is LongType.
I wonder whether it is better to use TimeUUID or even a readable
string representation for time?
Thanks,

Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956

--
Sent from my iPhone, sorry for my brevity.



Re: Using cassandra at minimal expenditures

2012-02-27 Thread Dave Brosius
I guess the issue with 2 machines and RF=2 is that a consistency level
of QUORUM is the same as ALL (with RF=2, quorum is 2, i.e. every
replica), so you have pretty much no flexibility with this setup; of
course this might be fine depending on what you want to do. In
addition, RF=2 also means that you get no data-storage improvements
from being distributed. Having said that, i know there are folks who
run 2 machine clusters.

dave

- Original Message -
From: "Ertio Lew" <ertio...@gmail.com>

Re: Issue regarding 'describe' keyword in 1.0.7 version.

2012-02-21 Thread Dave Brosius
What it's saying is if you define a KeySpace Foo and under it a 
ColumnFamily called Foo, you won't be able to use describe to describe 
the ColumnFamily named Foo.




On 02/21/2012 07:26 AM, Rishabh Agrawal wrote:


Hello,

I am newbie to Cassandra. Please bear with my lame doubts.

I am running Cassandra version 1.0.7 on Ubuntu. I found the following 
behavior with describe:

If there is a keyspace named 'x', then the describe x command will give 
the desired results. But if there is also a column family named 'x', 
describe will not be able to catch it. If there is only a column family 
'x' and no keyspace with the same name, then describe x will give the 
desired results, i.e. it will capture and display info about the 'x' 
column family.


Kindly help me with that.

Thanks and Regards

Rishabh Agrawal








Re: problem with sliceQuery with composite column

2012-02-13 Thread Dave Brosius
if the composite column was rearranged as ticks:111 wouldn't the result
be as desired?

- Original Message -
From: "aaron morton" <aa...@thelastpickle.com>

Re: How to find a commit for specific release on git?

2012-02-12 Thread Dave Brosius
Based on the tags listed here: 
http://git-wip-us.apache.org/repos/asf?p=cassandra.git


I would look here

http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=9d4c0d9a37c7d77a05607b85611c3abdaf75be94
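If you just want that release checked out locally, a hedged sketch
(assuming the tag naming follows the cassandra-version pattern visible
in that listing):

git fetch --tags
git checkout cassandra-0.8.9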


On 02/12/2012 10:39 PM, Maki Watanabe wrote:

Hello,

How to find the right commit SHA for specific cassandra release?
For example, how to checkout 0.8.9 release on git repository?
With git log --grep=0.8.9, I found the latest commit mentioned about 0.8.9 was
---
commit 1f92277c4bf9f5f71303ecc5592e27603bc9dec1
Author: Sylvain Lebresne <slebre...@apache.org>
Date:   Sun Dec 11 00:02:14 2011 +

 prepare for release 0.8.9

 git-svn-id:
https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8@1212938
13f79535-47bb-0310-9956-ffa450edef68
---

However I don't think it's a reliable way. I've also checked
CHANGES.txt and NEW.txt but those say nothing on commit SHA.

regards,




Re: Unsubscribe

2012-02-12 Thread Dave Brosius

On 02/12/2012 10:53 PM, Shubham Srivastava wrote:

--
Sent using BlackBerry




send an email to user-unsubscr...@cassandra.apache.org


Re: Unsubscribe

2012-02-04 Thread Dave Brosius

On 02/04/2012 12:05 PM, Andrea Loggia wrote:
Unsubscribe 



If you wish to unsubscribe from the cassandra user list send a blank 
email here




user-unsubscr...@cassandra.apache.org


Re: unsubscribe

2012-01-24 Thread Dave Brosius
Folks who wish to unsubscribe should sent a blank email to the following 
address


user-unsubscr...@cassandra.apache.org







Re: Problems Starting Cassandra Server -

2012-01-17 Thread Dave Brosius

Change your yaml entry for data_file_directories from

data_file_directories: F:\cassandra\data

to


data_file_directories:
- F:\cassandra\data


On 01/17/2012 11:54 PM, Asha Subramanian wrote:


Here is the yaml file..

Thanks

*From:*Dave Brosius [mailto:dbros...@mebigfatguy.com]
*Sent:* Wednesday, January 18, 2012 9:07 AM
*To:* user@cassandra.apache.org; Asha Subramanian
*Subject:* Re: Problems Starting Cassandra Server -

It probably would be useful to know what your yaml file looks like.

On 01/17/2012 08:58 PM, Asha Subramanian wrote:

I am a new user of Cassandra and want to understand the basics of 
Cassandra before moving to cluster installations etc..


I picked up the latest version of Cassandra from the home page 1.0.7 
released on 2012/01/16. I am installing on Windows 7


I have followed all the instructions for changing the cassandra.yaml 
and also the environment variables. However, when I start the server I 
get the following error:


What could be the problem ???

F:\cassandra\bin>cassandra.bat

Starting Cassandra Server

INFO 07:22:37,766 Logging initialized

INFO 07:22:37,828 JVM vendor/version: Java HotSpot(TM) Client VM/1.6.0_30

INFO 07:22:37,844 Heap size: 1070399488/1070399488

INFO 07:22:37,844 Classpath: 
F:\cassandra\conf;F:\cassandra\lib\antlr-3.2.jar;F


:\cassandra\lib\apache-cassandra-1.0.6.jar;F:\cassandra\lib\apache-cassandra-cli

entutil-1.0.6.jar;F:\cassandra\lib\apache-cassandra-thrift-1.0.6.jar;F:\cassandr

a\lib\avro-1.4.0-fixes.jar;F:\cassandra\lib\avro-1.4.0-sources-fixes.jar;F:\cass

andra\lib\commons-cli-1.1.jar;F:\cassandra\lib\commons-codec-1.2.jar;F:\cassandr

a\lib\commons-lang-2.4.jar;F:\cassandra\lib\compress-lzf-0.8.4.jar;F:\cassandra\

lib\concurrentlinkedhashmap-lru-1.2.jar;F:\cassandra\lib\guava-r08.jar;F:\cassan

dra\lib\high-scale-lib-1.1.2.jar;F:\cassandra\lib\jackson-core-asl-1.4.0.jar;F:\

cassandra\lib\jackson-mapper-asl-1.4.0.jar;F:\cassandra\lib\jamm-0.2.5.jar;F:\ca

ssandra\lib\jline-0.9.94.jar;F:\cassandra\lib\json-simple-1.1.jar;F:\cassandra\l

ib\libthrift-0.6.jar;F:\cassandra\lib\log4j-1.2.16.jar;F:\cassandra\lib\servlet-

api-2.5-20081211.jar;F:\cassandra\lib\slf4j-api-1.6.1.jar;F:\cassandra\lib\slf4j

-log4j12-1.6.1.jar;F:\cassandra\lib\snakeyaml-1.6.jar;F:\cassandra\lib\snappy-ja

va-1.0.4.1.jar;F:\cassandra\build\classes\main;F:\cassandra\build\classes\thrift

;F:\cassandra\lib\jamm-0.2.5.jar

INFO 07:22:37,859 JNA not found. Native methods will be disabled.

INFO 07:22:37,891 Loading settings from file:/F:/cassandra/conf/cassandra.yaml


ERROR 07:22:38,234 Fatal configuration error error

Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=data_file_directories for JavaBean=org.apache.cassandra.config.Config@1329642; No single argument constructor found for class [Ljava.lang.String;
 in reader, line 10, column 1:
    cluster_name: 'Test Cluster'
    ^

        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:372)
        at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:177)
        at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:136)
        at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:122)
        at org.yaml.snakeyaml.Loader.load(Loader.java:52)
        at org.yaml.snakeyaml.Yaml.load(Yaml.java:166)
        at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:133)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:125)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:337)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Caused by: org.yaml.snakeyaml.error.YAMLException: Cannot create property=data_file_directories for JavaBean=org.apache.cassandra.config.Config@1329642; No single argument constructor found for class [Ljava.lang.String;
        at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:305)
        at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:184)
        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:370)
        ... 9 more
Caused by: org.yaml.snakeyaml.error.YAMLException: No single argument constructor found for class [Ljava.lang.String;
        at org.yaml.snakeyaml.constructor.Constructor$ConstructScalar.construct(Constructor.java:419)
        at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:177)
        at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep

Re: Integration Error between Cassandra and Eclipse

2012-01-05 Thread Dave Brosius

This works for me

http://wiki.apache.org/cassandra/HowToDebug



On 01/06/2012 01:18 AM, Kuldeep Sengar wrote:

Hi,
Can you post the error (you say there is only 1 error)? That'll make
things clearer.
Thanks

Kuldeep Singh Sengar

Opera Solutions
Tech Boulevard,8th floor, Tower C,
Sector 127, Plot No 6,Noida 201 301
+91 (120) 4642424 facsimile, Ext : 2418
+91 8800595878 (M)

-Original Message-
From: Maki Watanabe [mailto:watanabe.m...@gmail.com]
Sent: Friday, January 06, 2012 7:30 AM
To: user@cassandra.apache.org
Subject: Re: Integration Error between Cassandra and Eclipse

Sorry, ignore my reply.
I had same result with import. ( 1 error in unit test code  many warnings )

2012/1/6 Maki Watanabe <watanabe.m...@gmail.com>:

How about to use File-Import... rather than File-New Java Project?

After extracting the source, ant build, and ant generate-eclipse-files:
1. File-Import...
2. Choose Existing Project into workspace...
3. Choose your source directory as root directory and then push Finish


2012/1/6 bobby saputra <zaibat...@gmail.com>:

Hi There,

I am a beginner user of Cassandra. I have heard from many people that
Cassandra is powerful database software, used by Facebook, Twitter,
Digg, etc. So I am interested in studying Cassandra further.

When I performed the integration between Cassandra and the Eclipse IDE
(in this case using Java), I ran into trouble and many problems.
I have already followed all the instructions from
http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the
tutorial was not working properly: I got a lot of errors and warnings
while creating the Java project in eclipse.

These are the errors and warnings:

Error(X) (1 item):
Description Resource Location
The method rangeSet(Range<T>...) in the type Range is not applicable for
the arguments (Range[]) RangeTest.java line 178

Warnings(!) (100 of 2916 items):
Description Resource Location
AbstractType is a raw type. References to generic type AbstractType<T>
should be parameterized AbstractColumnContainer.java line 72
(and many same warnings)

This is what I've done:
1. I checked out cassandra-trunk from the given link using SlikSvn as
the svn client.
2. I moved to the cassandra-trunk folder and built with ant using the
ant build command.
3. I generated the eclipse files with ant using the ant
generate-eclipse-files command.
4. I created a new java project in eclipse, set the project name to
cassandra-trunk, and browsed to the cassandra-trunk folder as the
location.

Did I make any mistakes? Or is there something wrong with the tutorial
at http://wiki.apache.org/cassandra/RunningCassandraInEclipse??

I have already been googling to find a solution to this problem, but
unfortunately found no results. Would you help me by giving me a guide
on how to solve this problem? Please.

Thank you very much for your help.

Best Regards,
Wira Saputra



--
w3m







Re: java thrift error

2011-12-20 Thread Dave Brosius
A ByteBuffer is not a byte[]. To convert a String to a ByteBuffer, do
something like:

public static ByteBuffer toByteBuffer(String value)
        throws UnsupportedEncodingException
{
    return ByteBuffer.wrap(value.getBytes("UTF-8"));
}

see http://wiki.apache.org/cassandra/ThriftExamples

- Original Message -
From: "A J" <s5a...@gmail.com>
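And a hypothetical helper for the reverse direction (reading a
ByteBuffer back into a String; a sketch in the same style, not from the
wiki page above):

public static String toString(ByteBuffer buffer)
        throws UnsupportedEncodingException
{
    byte[] bytes = new byte[buffer.remaining()];
    buffer.duplicate().get(bytes); // duplicate() leaves the caller's position intact
    return new String(bytes, "UTF-8");
}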

Re: setStrategy_options syntax in thrift

2011-12-20 Thread Dave Brosius
KsDef ksDef = new KsDef();
Map<String, String> strategyOptions = new HashMap<String, String>();
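Continuing that fragment into a hedged sketch (keyspace name
illustrative; the setters are the thrift-generated ones on KsDef, and
client is an open Cassandra.Client):

strategyOptions.put("replication_factor", "3");

ksDef.setName("MyKeyspace");
ksDef.setStrategy_class("org.apache.cassandra.locator.SimpleStrategy");
ksDef.setStrategy_options(strategyOptions);
ksDef.setCf_defs(new ArrayList<CfDef>()); // required field; may be empty
client.system_add_keyspace(ksDef);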

Re: memory estimate for each key in the key cache

2011-12-16 Thread Dave Brosius

On 12/16/2011 10:13 PM, Brandon Williams wrote:

On Fri, Dec 16, 2011 at 8:52 PM, Kent Tong <freemant2...@yahoo.com> wrote:

Hi,

From the source code I can see that for each key, the hash (token), the
key itself (ByteBuffer) and the position (a long: the offset in the
sstable) are stored in the key cache. The hash is an MD5 hash, so it is
16 bytes. So the total size required is at least 16 + sizeof(key) + 4,
which is > 20 bytes. If we consider the overhead of the object
references, it will be even larger. Then why does the wiki recommend
multiplying the number of keys cached by 10-12 to get the memory
requirement?

In a word: java.

-Brandon



Wow, Java is a lot better than I thought if it can perform that kind of 
magic.  I'm guessing the wiki information is just old and out of date. 
It's probably more like 60 + sizeof(key)

