Re: about write performance

2017-12-07 Thread Oleksandr Shulgin
On Fri, Dec 8, 2017 at 3:05 AM, Eunsu Kim  wrote:

> There is a table with a timestamp as a cluster key and sorted by ASC for
> the column.
>
> Is it better to insert by the time order when inserting data into this
> table for insertion performance? Or does it matter?
>

The writes hit memory tables first, so from this perspective it shouldn't
matter.

Later the memory tables are sorted according to the partition and
clustering key and are flushed to disk in this order, forming the SSTable
files.  You might see a difference in performance when reading the data,
though, depending on the compaction strategy you choose.  For time-series
data with TTL there is a good chance that TimeWindowCompactionStrategy is
appropriate, given that you mostly write with approximately monotonically
increasing timestamps.  This helps organize the data files for faster reads
and really cheap removal of expired data: the whole file can simply be
dropped by the compaction process once all records in it have expired.
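
To illustrate why insertion order should not matter on the write path, here
is a minimal sketch in Python.  It is a toy model, not Cassandra's actual
implementation: the ToyMemtable class and its methods are hypothetical names,
a real memtable keeps its entries ordered as they arrive rather than sorting
at flush time, and real partitions are ordered by token rather than by raw
key.  The only point is that the flushed output comes out ordered by
(partition key, clustering key) whatever order the writes arrived in.

```python
import random


class ToyMemtable:
    """Toy stand-in for a memtable (hypothetical, for illustration only)."""

    def __init__(self):
        self._rows = {}  # (partition_key, clustering_ts) -> value

    def write(self, partition_key, clustering_ts, value):
        # A write is an in-memory upsert; arrival order is not recorded.
        self._rows[(partition_key, clustering_ts)] = value

    def flush(self):
        # Emit rows ordered by partition key, then clustering key (ASC),
        # roughly the way an SSTable would be laid out on disk.
        return [(pk, ts, v) for (pk, ts), v in sorted(self._rows.items())]


# Made-up sample data: five readings for one partition.
events = [("sensor-1", ts, f"reading-{ts}") for ts in range(5)]

in_order = ToyMemtable()
for pk, ts, value in events:
    in_order.write(pk, ts, value)

shuffled = ToyMemtable()
for pk, ts, value in random.sample(events, len(events)):
    shuffled.write(pk, ts, value)

# Both memtables flush to the same on-disk layout.
same_layout = in_order.flush() == shuffled.flush()
print(same_layout)
```

In this model the write cost is the same for both loops; any difference
would show up later, in how the resulting files are compacted and read.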

Regards,
-- 
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
127-59-707


Re: run Cassandra on physical machine

2017-12-07 Thread Peng Xiao
Thanks All.




-- Original message --
From: "Jeff Jirsa"
Date: Friday, December 8, 2017 3:19
To: "cassandra"
Subject: Re: run Cassandra on physical machine



Which is to say, right now you can't run them on different ports, but you can 
run them on different IPs on the same machine (and different IPs don't need 
different physical NICs; you can bind multiple IPs to a given physical NIC).


On Thu, Dec 7, 2017 at 10:54 AM, Dikang Gu  wrote:
@Peng, how many network interfaces do you have on your machine? If you just 
have one NIC, you probably need to wait for this storage port patch: 
https://issues.apache.org/jira/browse/CASSANDRA-7544 .

On Thu, Dec 7, 2017 at 7:01 AM, Oliver Ruebenacker  wrote:

 Hello,

  Yes, you can.

 Best, Oliver

On Thu, Dec 7, 2017 at 7:12 AM, Peng Xiao <2535...@qq.com> wrote:
Dear All,

Can we run Cassandra on a physical machine directly?
We all know that a VM can reduce performance. For instance, we have a machine 
with 56 cores and 8 SSD disks.
Can we run 8 Cassandra instances on the same machine, within one rack, on 
different ports?

Could anyone please advise?

Thanks,
Peng Xiao

-- 
Oliver Ruebenacker
Senior Software Engineer, Diabetes Portal, Broad Institute

-- 
Dikang

Re: Running repair while Cassandra upgrade 2.0.X to 2.1.X

2017-12-07 Thread shini gupta
Hi
Can someone please answer this query?

Thanks

On Wed, Dec 6, 2017 at 9:58 AM, shini gupta  wrote:

> If we have upgraded Cassandra binaries from 2.0 to 2.1 on ALL the nodes
> but upgradesstables is still pending, what is the impact of the following
> scenarios?
>
>
>
> 1. Running nodetool repair on one of the nodes while upgradesstables is
> still executing on one or more nodes in the cluster.
>
> 2. Running nodetool repair when upgradesstables failed abruptly on some of
> the nodes such that some sstable files are in new format while other
> sstable files are still in old format.
>
>
>
> Even though it may not be recommended to run I/O intensive operations like
> repair and upgradesstables simultaneously, can we assume that both of the
> above scenarios are now supported and will not break anything, especially
> now that https://issues.apache.org/jira/browse/CASSANDRA-5772 has been fixed
> in 2.0?
>
>
> Regards
> Shini
>
>


-- 
-Shini Gupta

""Trusting in God won't make the mountain smaller,
But will make climbing easier.
Do not ask God for a lighter load
But ask Him for a stronger back... ""


Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread Jon Haddad
You’re going to have duplicate data no matter what you do.  Creating indexes is 
another representation of the data, it’s not free.  

Yes, storing it in two places is more work, but I’ve typically had to do that 
anyways.  My search queries are almost never an exact match to my Cassandra 
data model.
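
A minimal sketch of the "store it in two places" idea, in Python.  Both 
stores are plain in-memory structures standing in for Cassandra (the source 
of truth) and a search index; the names products_by_id, search_index, 
save_product and search are hypothetical, the product data is made up, and no 
real Cassandra or Elasticsearch client API is used.

```python
from collections import defaultdict

products_by_id = {}              # stand-in for the Cassandra table (source of truth)
search_index = defaultdict(set)  # stand-in for the search engine: token -> product ids


def save_product(product_id, name):
    """Write the record, then (re)index it; the index is a second copy of the data."""
    old = products_by_id.get(product_id)
    if old is not None:
        # An update has to remove the stale index entries first.
        for token in old.lower().split():
            search_index[token].discard(product_id)
    products_by_id[product_id] = name
    for token in name.lower().split():
        search_index[token].add(product_id)


def search(text):
    """Return the names of products whose indexed tokens include every query token."""
    tokens = text.lower().split()
    if not tokens:
        return []
    ids = set.intersection(*(search_index.get(t, set()) for t in tokens))
    return sorted(products_by_id[i] for i in ids)


save_product(1, "Red running shoes")
save_product(2, "Blue running jacket")
save_product(2, "Blue rain jacket")  # an update touches both copies

print(search("running"))      # ['Red running shoes']
print(search("blue jacket"))  # ['Blue rain jacket']
```

The extra work on every insert and update is visible in save_product; that 
is the cost of having a second representation of the data.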

If you want to try to do everything in Cassandra, try the two links I listed.  
I can’t endorse either, as I’ve never used them, but I’ve heard nice things 
about both in passing.

Jon

> On Dec 7, 2017, at 7:05 PM, @Nandan@  wrote:
> 
> Thanks. But again my  questions come back at the same place that how to do 
> data modeling because If we will do denormalized then we have to allow a lot 
> of data duplication, as well as Insert and Update, will also need to think 
> because based on this we have to insert data into multiple tables at same 
> time.
> 
> 
> On Fri, Dec 8, 2017 at 10:54 AM, Jon Haddad wrote:
> I mean ES is great as a search engine.  I would use Cassandra as my source of 
> truth, and also index my data in ES.
> 
> I typed my original message before I walked my dog, I should have also 
> pointed out https://github.com/strapdata/elassandra and 
> https://github.com/Stratio/cassandra-lucene-index, but I haven’t used 
> either one.
> 
> Jon
> 
> 
>> On Dec 7, 2017, at 5:59 PM, @Nandan@ wrote:
>> 
>> Hi Jon,
>> Do you mean Elastic search for storing data or Data should be store into 
>> Cassandra and use Elastic Search for Select records from tables. ?
>> 
>> 
>> On Fri, Dec 8, 2017 at 9:50 AM, Jon Haddad wrote:
>> 1. No, Apache Cassandra is pretty terrible for search on its own.  Even 
>> with SASI.
>> 2. Maybe, but it’s complicated, and doing it right takes a lot of 
>> experience.  I’d use Elastic Search instead.
>> 
>> 
>> 
>> > On Dec 7, 2017, at 5:39 PM, @Nandan@ wrote:
>> >
>> > Hi Peoples,
>> >
>> > As currently around the world 60-70% websites are excelling with 
>> > E-commerce in which we have to store huge amount of data and select 
>> > pattern based on Partial Search, Text match, Full-Text Search and all.
>> >
>> > So below questions comes to mind :
>> > 1) Is Cassandra a correct choice for data modeling which gives complex 
>> > Search patterned as  Amazon or eBay is using?
>> > 2) If we will use denormalized data modeling then is it will be effective?
>> >
>> > Please clarify this.
>> >
>> > Thanks and Best regards,
>> > Nandan Priyadarshi
>> 
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
>> 
>> For additional commands, e-mail: user-h...@cassandra.apache.org 
>> 
>> 
>> 
> 
> 



Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread @Nandan@
Thanks. But my question comes back to the same place: how do we do the data
modeling? If we denormalize, we have to allow a lot of data duplication, and
we also need to think about inserts and updates, because we then have to
write the data into multiple tables at the same time.


On Fri, Dec 8, 2017 at 10:54 AM, Jon Haddad  wrote:

> I mean ES is great as a search engine.  I would use Cassandra as my source
> of truth, and also index my data in ES.
>
> I typed my original message before I walked my dog, I should have also
> pointed out https://github.com/strapdata/elassandra and
> https://github.com/Stratio/cassandra-lucene-index, but I haven’t used either
> one.
>
> Jon
>
>
> On Dec 7, 2017, at 5:59 PM, @Nandan@ 
> wrote:
>
> Hi Jon,
> Do you mean Elastic search for storing data or Data should be store into
> Cassandra and use Elastic Search for Select records from tables. ?
>
>
> On Fri, Dec 8, 2017 at 9:50 AM, Jon Haddad  wrote:
>
>> 1. No, Apache Cassandra is pretty terrible for search on it’s own.  Even
>> with SASI.
>> 2. Maybe, but it’s complicated, and doing it right takes a lot of
>> experience.  I’d use Elastic Search instead.
>>
>>
>>
>> > On Dec 7, 2017, at 5:39 PM, @Nandan@ 
>> wrote:
>> >
>> > Hi Peoples,
>> >
>> > As currently around the world 60-70% websites are excelling with
>> E-commerce in which we have to store huge amount of data and select pattern
>> based on Partial Search, Text match, Full-Text Search and all.
>> >
>> > So below questions comes to mind :
>> > 1) Is Cassandra a correct choice for data modeling which gives complex
>> Search patterned as  Amazon or eBay is using?
>> > 2) If we will use denormalized data modeling then is it will be
>> effective?
>> >
>> > Please clarify this.
>> >
>> > Thanks and Best regards,
>> > Nandan Priyadarshi
>>
>>
>>
>>
>
>


Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread Jon Haddad
I mean ES is great as a search engine.  I would use Cassandra as my source of 
truth, and also index my data in ES.

I typed my original message before I walked my dog. I should have also pointed 
out https://github.com/strapdata/elassandra and 
https://github.com/Stratio/cassandra-lucene-index, but I haven’t used either 
one.

Jon

> On Dec 7, 2017, at 5:59 PM, @Nandan@  wrote:
> 
> Hi Jon,
> Do you mean Elastic search for storing data or Data should be store into 
> Cassandra and use Elastic Search for Select records from tables. ?
> 
> 
> On Fri, Dec 8, 2017 at 9:50 AM, Jon Haddad wrote:
> 1. No, Apache Cassandra is pretty terrible for search on its own.  Even with 
> SASI.
> 2. Maybe, but it’s complicated, and doing it right takes a lot of experience. 
>  I’d use Elastic Search instead.
> 
> 
> 
> > On Dec 7, 2017, at 5:39 PM, @Nandan@  > > wrote:
> >
> > Hi Peoples,
> >
> > As currently around the world 60-70% websites are excelling with E-commerce 
> > in which we have to store huge amount of data and select pattern based on 
> > Partial Search, Text match, Full-Text Search and all.
> >
> > So below questions comes to mind :
> > 1) Is Cassandra a correct choice for data modeling which gives complex 
> > Search patterned as  Amazon or eBay is using?
> > 2) If we will use denormalized data modeling then is it will be effective?
> >
> > Please clarify this.
> >
> > Thanks and Best regards,
> > Nandan Priyadarshi
> 
> 
> 
> 
> 



Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread @Nandan@
Hi Jon,
Do you mean using Elasticsearch for storing the data, or should the data be
stored in Cassandra, with Elasticsearch used to select records from the tables?


On Fri, Dec 8, 2017 at 9:50 AM, Jon Haddad  wrote:

> 1. No, Apache Cassandra is pretty terrible for search on it’s own.  Even
> with SASI.
> 2. Maybe, but it’s complicated, and doing it right takes a lot of
> experience.  I’d use Elastic Search instead.
>
>
>
> > On Dec 7, 2017, at 5:39 PM, @Nandan@ 
> wrote:
> >
> > Hi Peoples,
> >
> > As currently around the world 60-70% websites are excelling with
> E-commerce in which we have to store huge amount of data and select pattern
> based on Partial Search, Text match, Full-Text Search and all.
> >
> > So below questions comes to mind :
> > 1) Is Cassandra a correct choice for data modeling which gives complex
> Search patterned as  Amazon or eBay is using?
> > 2) If we will use denormalized data modeling then is it will be
> effective?
> >
> > Please clarify this.
> >
> > Thanks and Best regards,
> > Nandan Priyadarshi
>
>
>
>


about write performance

2017-12-07 Thread Eunsu Kim
There is a table with a timestamp as a clustering key, sorted in ASC order on 
that column.

Is it better to insert in time order when inserting data into this table, 
for insertion performance? Or does it not matter?

Thank you.

Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread Jon Haddad
1. No, Apache Cassandra is pretty terrible for search on its own.  Even with 
SASI.
2. Maybe, but it’s complicated, and doing it right takes a lot of experience.  
I’d use Elasticsearch instead. 



> On Dec 7, 2017, at 5:39 PM, @Nandan@  wrote:
> 
> Hi Peoples,
> 
> As currently around the world 60-70% websites are excelling with E-commerce 
> in which we have to store huge amount of data and select pattern based on 
> Partial Search, Text match, Full-Text Search and all. 
> 
> So below questions comes to mind :
> 1) Is Cassandra a correct choice for data modeling which gives complex Search 
> patterned as  Amazon or eBay is using?
> 2) If we will use denormalized data modeling then is it will be effective? 
> 
> Please clarify this. 
> 
> Thanks and Best regards,
> Nandan Priyadarshi





Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread @Nandan@
Hi Peoples,

Currently, around the world, 60-70% of websites are excelling with e-commerce,
in which we have to store a huge amount of data and select it based on
patterns such as partial search, text match, full-text search, and so on.

So the questions below come to mind:
1) Is Cassandra the right choice for data modeling that supports complex
search patterns like the ones Amazon or eBay use?
2) If we use denormalized data modeling, will it be effective?

Please clarify this.

Thanks and Best regards,
Nandan Priyadarshi


Re: Tombstone warnings in log file

2017-12-07 Thread Alain RODRIGUEZ
Hello Simon.

Tombstones are a tricky topic in Cassandra that has raised a lot of questions
over time. I laid out my understanding in a blog post last year and thought
it might be of interest to you. Things have probably evolved a bit since, but
I believe the principles and tuning have not changed that much.

Here is the post:
thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html.

2017-12-05 2:01 GMT+00:00 wxn...@zjqunshuo.com :

> Got it. Thank you.
>
>
> *From:* Meg Mara 
> *Date:* 2017-12-05 01:54
> *To:* user@cassandra.apache.org
> *Subject:* RE: Tombstone warnings in log file
>
> Simon,
>
>
>
> It means that in processing your queries, Cassandra is going through that
> many tombstone cells in order to return your results. It is because some of
> the partitions that you are querying for have already expired. The warning
> is just cassandra’s way of letting you know that your reads are less
> efficient because you are reading a lot of expired data.
>
>
>
> You could tune this by altering your tombstone parameters in Cassandra
> yaml file. But a better solution would be to reduce your GC grace seconds
> for that table to a smaller value (as opposed to default of 10 days) so
> that the TTLed data will be purged sooner.
>
>
>
> You could also consider drafting more efficient queries which won’t hit
> TTLed partitions.
>
>
>
> Thanks,
>
> *Meg*
>
>
>
>
>
> *From:* wxn...@zjqunshuo.com [mailto:wxn...@zjqunshuo.com]
> *Sent:* Sunday, December 03, 2017 7:49 PM
> *To:* user 
> *Subject:* Tombstone warnings in log file
>
>
>
> Hi,
>
> My cluster is running 2.2.8, with no updates or deletions, only insertions
> with TTL.  I saw the warnings below recently. What do they mean and what is
> the impact?
>
>
>
> WARN  [SharedPool-Worker-2] 2017-12-04 09:32:48,833
> SliceQueryFilter.java:308 - Read 2461 live and 1978
> tombstone cells in cargts.eventdata for key: 129762:20171202 (see
> tombstone_warn_threshold). 5000 columns were requested, slices=[-]
>
>
>
> Best regards,
>
> -Simon
>
>
>
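
The warning quoted in this thread can be picked apart programmatically to see
which tables and keys are affected and how large the tombstone share is.  The
following is a minimal Python sketch, assuming the exact message layout shown
above (from 2.2.8); other Cassandra versions word this warning differently, so
the regular expression would need adjusting.  The function name
parse_tombstone_warning and the "tombstone_ratio" field are hypothetical.

```python
import re

# The warning text as quoted in this thread (wrapped here for readability).
LOG_LINE = (
    "WARN  [SharedPool-Worker-2] 2017-12-04 09:32:48,833 "
    "SliceQueryFilter.java:308 - Read 2461 live and 1978 tombstone cells "
    "in cargts.eventdata for key: 129762:20171202 "
    "(see tombstone_warn_threshold). 5000 columns were requested, slices=[-]"
)

# Assumed layout: "Read <n> live and <m> tombstone cells in <table> for key: <key>"
PATTERN = re.compile(
    r"Read (?P<live>\d+) live and (?P<tombstones>\d+) tombstone cells "
    r"in (?P<table>\S+) for key: (?P<key>\S+)"
)


def parse_tombstone_warning(line):
    """Return table, key and cell counts from a tombstone warning, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    live = int(match.group("live"))
    tombstones = int(match.group("tombstones"))
    total = live + tombstones
    return {
        "table": match.group("table"),
        "key": match.group("key"),
        "live": live,
        "tombstones": tombstones,
        # Share of the cells read that were tombstones.
        "tombstone_ratio": tombstones / total if total else 0.0,
    }


info = parse_tombstone_warning(LOG_LINE)
print(info["table"], info["key"], round(info["tombstone_ratio"], 2))
```

For the line above, roughly 45% of the cells read for that key were
tombstones, which is the inefficiency the warning is pointing at.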


RE: ClassNotFoundException when trigger is fired

2017-12-07 Thread tsubasa.nar...@us.fujitsu.com
Hi,

Thanks for your reply.
I put the jar file under conf/triggers before starting Cassandra, and then I 
found the following log:
INFO  [OptionalTasks:1] 2017-11-30 03:55:43,541 CustomClassLoader.java:87 - 
Loading new jar /home/tnarita/cassandra/conf/triggers/TestTrigger.jar

It looks like my jar file is added to Cassandra's classloader, but when 
Cassandra calls my class, a ClassNotFoundException sometimes happens.
If I restart Cassandra, I don't get the ClassNotFoundException and Cassandra 
can call my class.
So I suspect the code that loads the jar file has a bug.

Thanks,
Tsubasa Narita

From: Jacques-Henri Berthemet [mailto:jacques-henri.berthe...@genesys.com]
Sent: Wednesday, December 6, 2017 11:56 PM
To: user@cassandra.apache.org
Subject: RE: ClassNotFoundException when trigger is fired

Hi,

I have a custom secondary index that works well with Cassandra, I put the jar 
file in Cassandra's lib folder before starting Cassandra, maybe you can try to 
do the same thing?

I don't think that Cassandra's class loader is dynamic, you need to have your 
jars in the classpath before starting Cassandra.

Regards,
--
Jacques-Henri Berthemet

From: tsubasa.nar...@us.fujitsu.com 
[mailto:tsubasa.nar...@us.fujitsu.com]
Sent: mercredi 6 décembre 2017 19:49
To: user@cassandra.apache.org
Subject: ClassNotFoundException when trigger is fired

Dear All

I use a Cassandra trigger to detect data changes in the DB, and usually it works.
But sometimes I get a ClassNotFoundException when the trigger is fired.

Here is what I did:
1. Create a class which implements the ITrigger interface, e.g. 
TestTrigger.java.
2. Create a jar file and put it under conf/triggers, e.g. TestTrigger.jar.
3. Start Cassandra.
4. I can find the following log. It looks like the jar file is loaded 
successfully:
INFO  [OptionalTasks:1] 2017-11-30 03:55:43,541 CustomClassLoader.java:87 - 
Loading new jar /home/tnarita/cassandra/conf/triggers/TestTrigger.jar
5. Log in to CQL and create the trigger for the test table.
6. Insert a value into the test table.
7. The trigger is fired.
8. I get a ClassNotFoundException. The log follows:

java.lang.RuntimeException: Exception while executing trigger on table with ID: 1cb6a5a0-cb00-11e7-a737-49047aea57a8
    at org.apache.cassandra.triggers.TriggerExecutor.executeInternal(TriggerExecutor.java:241) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.triggers.TriggerExecutor.execute(TriggerExecutor.java:119) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:823) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:431) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:417) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:188) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115) ~[apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513) [apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407) [apache-cassandra-3.9.jar:3.9]
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.39.Final.jar:4.0.39.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366) [netty-all-4.0.39.Final.jar:4.0.39.Final]
    at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35) [netty-all-4.0.39.Final.jar:4.0.39.Final]
    at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:357) [netty-all-4.0.39.Final.jar:4.0.39.Final]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) [apache-cassandra-3.9.jar:3.9]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.9.jar:3.9]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
Caused by: java.lang.ClassNotFoundException: com.test.TestTrigger
at 

Re: run Cassandra on physical machine

2017-12-07 Thread Jeff Jirsa
Which is to say, right now you can't run them on different ports, but you
can run them on different IPs on the same machine (and different IPs don't
need different physical NICs; you can bind multiple IPs to a given physical
NIC).
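
As a rough illustration of the "one IP per instance" layout, here is a
minimal Python sketch that builds per-instance overrides for cassandra.yaml.
The setting names (listen_address, rpc_address, data_file_directories,
commitlog_directory) are standard cassandra.yaml options, but the address
range, the /mnt/ssdN directory layout and the helper name instance_overrides
are assumptions made up for this example.  It does not cover everything a
multi-instance host needs (for instance JMX ports or heap sizing), and it
only produces the values; binding the extra IPs to the NIC is done separately
at the OS level.

```python
import ipaddress


def instance_overrides(first_ip, count, data_root="/mnt"):
    """Build per-instance cassandra.yaml overrides, one IP and one disk each.

    The address range and directory layout are illustrative assumptions.
    """
    base = ipaddress.ip_address(first_ip)
    overrides = []
    for i in range(count):
        ip = str(base + i)          # consecutive addresses on the same NIC
        disk = f"{data_root}/ssd{i}"
        overrides.append(
            {
                "listen_address": ip,  # inter-node traffic binds here
                "rpc_address": ip,     # client traffic binds here
                "data_file_directories": [f"{disk}/data"],
                "commitlog_directory": f"{disk}/commitlog",
            }
        )
    return overrides


# 8 instances, matching the 8 SSDs mentioned in the question.
configs = instance_overrides("10.0.0.11", 8)
for cfg in configs:
    print(cfg["listen_address"], cfg["data_file_directories"][0])
```

Every instance keeps the same ports and differs only in the address it binds
to and the disk it writes to.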


On Thu, Dec 7, 2017 at 10:54 AM, Dikang Gu  wrote:

> @Peng, how many network interfaces do you have on your machine? If you
> just have one NIC, you probably need to wait this storage port patch.
> https://issues.apache.org/jira/browse/CASSANDRA-7544 .
>
> On Thu, Dec 7, 2017 at 7:01 AM, Oliver Ruebenacker 
> wrote:
>
>>
>>  Hello,
>>
>>   Yes, you can.
>>
>>  Best, Oliver
>>
>> On Thu, Dec 7, 2017 at 7:12 AM, Peng Xiao <2535...@qq.com> wrote:
>>
>>> Dear All,
>>>
>>> Can we run Cassandra on physical machine directly?
>>> we all know that vm can reduce the performance.For instance,we have a
>>> machine with 56 core,8 ssd disks.
>>> Can we run 8 cassandra instance in the same machine within one rack with
>>> different port?
>>>
>>> Could anyone please advise?
>>>
>>> Thanks,
>>> Peng Xiao
>>>
>>
>>
>>
>> --
>> Oliver Ruebenacker
>> Senior Software Engineer, Diabetes Portal, Broad Institute
>>
>>
>
>
> --
> Dikang
>
>


Re: Some problems with Apache Cassandra

2017-12-07 Thread Jeff Jirsa
Reposting to user@ from dev@


On Thu, Dec 7, 2017 at 4:27 AM, v.elis...@rubic.pro 
wrote:

> Hello, my name is Vladimir and I have questions on Cassandra DBMS.
>
>
> *1 PROBLEM. How to increase the speed?*
> I installed this system from the repository:
> "deb http://www.apache.org/dist/cassandra/debian 311x main"
> With standard settings, the system runs slowly.
> _Comparison with "MySQL"._
> select count(*) from login_wifly.radacct;
> +----------+
> | count(*) |
> +----------+
> |  9810806 |
> +----------+
>
> real 0m3.709s - speed in MySQL (without caching)
> user 0m0.000s
> sys 0m0.000s
>
> _In Cassandra:_
> SELECT count(*) FROM test.radacct;
>
>  count
> ---------
>  9810806
>
> (1 rows)
>
> Warnings :
> Aggregation query used without partition key
>
>
> real 3m7.661s - speed in Cassandra
> user 0m0.444s
> sys 0m0.056s
>
>
This is an antipattern in cassandra. It works, but it will never be
efficient. We make no effort to try to optimize for full table scans.

Depending on your accuracy requirements, you could use the metrics we
expose to approximate the count.



>
> *2 PROBLEM. How to design a table correctly?*
> My test table configure:
> create table radacct2 (
> radacctid bigint,
> acctsessionid text,
> acctuniqueid text,
> username text,
> groupname text,
> realm text,
> nasid text,
> nasipaddress text,
> nasportid text,
> nasporttype text,
> acctstarttime text,
> acctstoptime text,
> acctsessiontime bigint,
> acctauthentic text,
> connectinfo_start text,
> connectinfo_stop text,
> acctinputoctets bigint,
> "acctoutputoctets" bigint,
> "calledstationid" text,
> callingstationid text,
> acctterminatecause text,
> servicetype text,
> framedprotocol text,
> framedipaddress text,
> acctstartdelay bigint,
> acctstopdelay bigint,
> xascendsessionsvrkey text,
> client bigint,
> method text,
> zone bigint,
> localDateStart text,
> localDateStop text,
> localDateTimeStart text,
> localDateTimeStop text,
> msisdn text,
> PRIMARY KEY (radacctid, username)
> ) WITH CLUSTERING ORDER BY (username DESC);
>
> I need to select on the "username" field.
> In MySQL it looks like this:
> SELECT
> a.username,
> COUNT(DISTINCT a.username)
> FROM
> radacct as a
> WHERE
> (LENGTH(a.username) = 17)
> GROUP BY
> a.username;
>
> When I execute the query "SELECT username, count(*) FROM radacct GROUP BY
> username;" I get:
> InvalidRequest: Error from server: code = 2200 [Invalid query] message =
> "PRIMARY KEY, got username"
>
>
Cassandra isn't an RDBMS. You have to model the data differently. You
typically build tables based on the SELECT query/queries you'll be using.
You can't just swap in your MySQL table and queries and expect this to work.

It's typical to have multiple denormalized tables, so if you want to do a
lookup by username, you'll build a table with username as the partition key.
If you need to also do a lookup by radacctid, you'll build another table
with radacctid as the partition key.
Since cassandra doesn't have JOINs, you'll probably need to do one query by
username to get the radacctid, and then a second to actually query the main
radacct2 table.

In short: before trying to just make this work, spend some time reading or
watching videos about how to properly data model in cassandra. It's not
MySQL. It's different.
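
To make the "one table per query" idea concrete, here is a minimal sketch in
Python that uses plain dictionaries as stand-ins for two Cassandra tables.
It is only a model of the access pattern, not driver code: the names
radacct_by_id, sessions_by_username, insert_session and sessions_for are
hypothetical, and the sample rows are made up.

```python
from collections import defaultdict

# Two "tables", each keyed by the column its query filters on.
radacct_by_id = {}                        # partition key: radacctid
sessions_by_username = defaultdict(list)  # partition key: username


def insert_session(radacctid, username, acctsessiontime):
    """Write the same fact to both tables (denormalization)."""
    radacct_by_id[radacctid] = {
        "radacctid": radacctid,
        "username": username,
        "acctsessiontime": acctsessiontime,
    }
    sessions_by_username[username].append(radacctid)


def sessions_for(username):
    """Look up by username first, then fetch full rows by radacctid (no JOIN)."""
    return [radacct_by_id[i] for i in sessions_by_username.get(username, [])]


# Made-up sample data with 17-character usernames.
insert_session(1, "aa:bb:cc:dd:ee:01", 120)
insert_session(2, "aa:bb:cc:dd:ee:01", 300)
insert_session(3, "aa:bb:cc:dd:ee:02", 45)

rows = sessions_for("aa:bb:cc:dd:ee:01")
print(len(rows), sum(r["acctsessiontime"] for r in rows))  # 2 420
```

Each lookup touches exactly one key per step, which is the property the
partition key gives you in Cassandra; the price is that every insert has to
maintain both structures.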


> *3 PROBLEM. How to improve performance using a competent configuration?*
>
> cat /etc/cassandra/cassandra.yaml
> What parameters can be adjusted to achieve maximum effect?
>

All of them?


Re: run Cassandra on physical machine

2017-12-07 Thread Dikang Gu
@Peng, how many network interfaces do you have on your machine? If you just
have one NIC, you probably need to wait for this storage port patch:
https://issues.apache.org/jira/browse/CASSANDRA-7544 .

On Thu, Dec 7, 2017 at 7:01 AM, Oliver Ruebenacker  wrote:

>
>  Hello,
>
>   Yes, you can.
>
>  Best, Oliver
>
> On Thu, Dec 7, 2017 at 7:12 AM, Peng Xiao <2535...@qq.com> wrote:
>
>> Dear All,
>>
>> Can we run Cassandra on physical machine directly?
>> we all know that vm can reduce the performance.For instance,we have a
>> machine with 56 core,8 ssd disks.
>> Can we run 8 cassandra instance in the same machine within one rack with
>> different port?
>>
>> Could anyone please advise?
>>
>> Thanks,
>> Peng Xiao
>>
>
>
>
> --
> Oliver Ruebenacker
> Senior Software Engineer, Diabetes Portal, Broad Institute
>
>


-- 
Dikang


RE: When Replacing a Node, How to Force a Consistent Bootstrap

2017-12-07 Thread Fd Habash
Thank you.

How do I identify which other two nodes the downed node shared its replicas 
with? Take a replica set of 3 nodes A, B, C. Now C has been terminated by AWS 
and is gone. Using getendpoints assumes knowing a partition key value, but how 
do you even know what key to use?

If there is a way to identify A and B, I can then simply run ‘nodetool 
repair’ to repair ALL the ranges on either.

Thank you

From: kurt greaves
Sent: Wednesday, December 6, 2017 6:45 PM
To: User
Subject: Re: When Replacing a Node, How to Force a Consistent Bootstrap

That's also an option, but it's better to repair both before and after if 
possible. If you don't repair beforehand, you could end up missing some 
replicas until you repair after the replacement, which could cause queries to 
return old or no data. Alternatively, you could query at ALL after replacing, 
until the repair completes.

For example: only A and C hold row a. A dies, and on replacement the new A 
streams the range owning a from B, which never had it, so A is still 
inconsistent. A QUORUM query that hits A and B then returns no results for a.

On 5 December 2017 at 23:04, Fred Habash  wrote:
Or, do a full repair after bootstrapping completes?



On Dec 5, 2017 4:43 PM, "Jeff Jirsa"  wrote:
You can't ask Cassandra to stream from the node with the "most recent data", 
because for some rows B may be most recent, and for others C may be most recent 
- you'd have to stream from both (which we don't support).

You'll need to repair (and you can repair before you do the replace to avoid 
the window of time where you violate consistency - use the -hosts option to 
allow repair with a down host, you'll repair A+C, so when B starts it'll 
definitely have all of the data).


On Tue, Dec 5, 2017 at 1:38 PM, Fd Habash  wrote:
Assume I have cluster of 3 nodes (A,B,C). Row x was written with CL=LQ to node 
A and B. Before it was written to C, node B crashes. I replaced B and it 
bootstrapped data from node C.
 
Now, row x is missing from C and B.  If node A crashes, it will be replaced and 
it will bootstrap from either C or B. As such, row x is now completely gone 
from the entire ring. 
 
Is this scenario possible at all (at least in C* < 3.0)? 
 
How can a newly replaced node be forced to bootstrap from the node in the 
replica set that has the most recent data? 
 
Otherwise, we have to repair a node immediately after bootstrapping it for a 
node replacement.
 
Thank you
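
The scenario discussed in this thread can be replayed with a small Python 
sketch.  It is a deliberately simplified model: each node is just a set of 
rows for one token range, a replacement node copies everything from a single 
source replica, and a repair makes the participating nodes identical.  The 
helper names replace_node, repair and run are hypothetical, and real 
bootstrap and repair are considerably more involved.

```python
def replace_node(replicas, dead, source):
    """Replace `dead` with an empty node that bootstraps by copying `source`."""
    replicas[dead] = set(replicas[source])


def repair(replicas, nodes):
    """Make the given live nodes hold the union of their rows."""
    merged = set().union(*(replicas[n] for n in nodes))
    for n in nodes:
        replicas[n] = set(merged)


def run(repair_first):
    # Row "x" was written at LOCAL_QUORUM and reached A and B, but not C.
    replicas = {"A": {"x"}, "B": {"x"}, "C": set()}
    replicas["B"] = set()                         # B crashes and loses its data
    if repair_first:
        repair(replicas, ["A", "C"])              # repair the surviving nodes first
    replace_node(replicas, dead="B", source="C")  # new B streams from C
    replace_node(replicas, dead="A", source="C")  # later A crashes, streams from C
    return any("x" in rows for rows in replicas.values())


lost_without_repair = not run(repair_first=False)
kept_with_repair = run(repair_first=True)
print(lost_without_repair, kept_with_repair)
```

In this model the row disappears from the whole ring when no repair is run, 
and survives when A and C are repaired before B is replaced.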
 





Re: Huge system.batches table after joining a node (Cassandra 3.11.1)

2017-12-07 Thread Alexander Dejanovski
Just a heads up that (in case you missed it) MVs were retroactively marked
as experimental and that a large part of the community considers they
should not be used in production.

On Thu, Dec 7, 2017 at 4:53 PM Alexander Dejanovski 
wrote:

> Yes, MVs use batches during bootstraps and decommissions.
>
> You can read more about it here :
> https://issues.apache.org/jira/browse/CASSANDRA-13065
> and here : https://issues.apache.org/jira/browse/CASSANDRA-13614
>
> Things will improve in 4.0 only it seems.
>
> On Thu, Dec 7, 2017 at 4:31 PM Christian Lorenz <
> christian.lor...@webtrekk.com> wrote:
>
>> Hi Alexander,
>>
>>
>>
>> yes we use MV’s. The size of the batch table is around 10GB on the
>> existing nodes. Also seems pretty high.
>>
>> So is this table (also) used to process MV building?
>>
>>
>>
>> Regards,
>>
>> Christian
>>
>> *Von: *Alexander Dejanovski 
>> *Antworten an: *"user@cassandra.apache.org" 
>> *Datum: *Donnerstag, 7. Dezember 2017 um 16:24
>> *An: *"user@cassandra.apache.org" 
>> *Betreff: *Re: Huge system.batches table after joining a node (Cassandra
>> 3.11.1)
>>
>>
>>
>> Hi Christian,
>>
>>
>>
>> it is probably not safe to drop it because it contains all logged batches
>> that are supposed to be played on the cluster.
>>
>> The size of the batches table should go down as they get processed
>> (although 100GB is a pretty huge batch log...)
>>
>>
>>
>> Do you use Materialized Views in your data model ?
>>
>> You just bootstrapped a new node and the table grew on all other nodes ?
>>
>>
>>
>> On Thu, Dec 7, 2017 at 12:25 PM Christian Lorenz <
>> christian.lor...@webtrekk.com> wrote:
>>
>> Hi,
>>
>>
>>
>> after joining a node into an existing cluster, the table system.batches
>> became quite large (100GB) which is about 1/3 of the nodes size.
>>
>> Is it safe to truncate the table?
>>
>>
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> --
>>
>> -
>>
>> Alexander Dejanovski
>>
>> France
>>
>> @alexanderdeja
>>
>>
>>
>> Consultant
>>
>> Apache Cassandra Consulting
>>
>> http://www.thelastpickle.com
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Huge system.batches table after joining a node (Cassandra 3.11.1)

2017-12-07 Thread Alexander Dejanovski
Yes, MVs use batches during bootstraps and decommissions.

You can read more about it here :
https://issues.apache.org/jira/browse/CASSANDRA-13065
and here : https://issues.apache.org/jira/browse/CASSANDRA-13614

It seems things will only improve in 4.0.

On Thu, Dec 7, 2017 at 4:31 PM Christian Lorenz <
christian.lor...@webtrekk.com> wrote:

> Hi Alexander,
>
>
>
> yes we use MV’s. The size of the batch table is around 10GB on the
> existing nodes. Also seems pretty high.
>
> So is this table (also) used to process MV building?
>
>
>
> Regards,
>
> Christian
>
> *Von: *Alexander Dejanovski 
> *Antworten an: *"user@cassandra.apache.org" 
> *Datum: *Donnerstag, 7. Dezember 2017 um 16:24
> *An: *"user@cassandra.apache.org" 
> *Betreff: *Re: Huge system.batches table after joining a node (Cassandra
> 3.11.1)
>
>
>
> Hi Christian,
>
>
>
> it is probably not safe to drop it because it contains all logged batches
> that are supposed to be played on the cluster.
>
> The size of the batches table should go down as they get processed
> (although 100GB is a pretty huge batch log...)
>
>
>
> Do you use Materialized Views in your data model ?
>
> You just bootstrapped a new node and the table grew on all other nodes ?
>
>
>
> On Thu, Dec 7, 2017 at 12:25 PM Christian Lorenz <
> christian.lor...@webtrekk.com> wrote:
>
> Hi,
>
>
>
> after joining a node into an existing cluster, the table system.batches
> became quite large (100GB), which is about 1/3 of the node's size.
>
> Is it safe to truncate the table?
>
>
>
> Regards,
>
> Christian
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Huge system.batches table after joining a node (Cassandra 3.11.1)

2017-12-07 Thread Christian Lorenz
Hi Alexander,

Yes, we use MVs. The size of the batch table is around 10GB on the existing 
nodes, which also seems pretty high.
So is this table (also) used to process MV building?

Regards,
Christian
From: Alexander Dejanovski 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, December 7, 2017 at 16:24
To: "user@cassandra.apache.org" 
Subject: Re: Huge system.batches table after joining a node (Cassandra 3.11.1)

Hi Christian,

it is probably not safe to drop it because it contains all logged batches that 
are supposed to be played on the cluster.
The size of the batches table should go down as they get processed (although 
100GB is a pretty huge batch log...)

Do you use Materialized Views in your data model ?
You just bootstrapped a new node and the table grew on all other nodes ?

On Thu, Dec 7, 2017 at 12:25 PM Christian Lorenz wrote:
Hi,

after joining a node into an existing cluster, the table system.batches became 
quite large (100GB), which is about 1/3 of the node's size.
Is it safe to truncate the table?

Regards,
Christian



Re: Huge system.batches table after joining a node (Cassandra 3.11.1)

2017-12-07 Thread Alexander Dejanovski
Hi Christian,

it is probably not safe to drop it because it contains all logged batches
that are supposed to be played on the cluster.
The size of the batches table should go down as they get processed
(although 100GB is a pretty huge batch log...)

Do you use Materialized Views in your data model ?
You just bootstrapped a new node and the table grew on all other nodes ?

On Thu, Dec 7, 2017 at 12:25 PM Christian Lorenz <
christian.lor...@webtrekk.com> wrote:

> Hi,
>
>
>
> after joining a node into an existing cluster, the table system.batches
> became quite large (100GB), which is about 1/3 of the node's size.
>
> Is it safe to truncate the table?
>
>
>
> Regards,
>
> Christian
>
>
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: run Cassandra on physical machine

2017-12-07 Thread Oliver Ruebenacker
 Hello,

  Yes, you can.

 Best, Oliver

On Thu, Dec 7, 2017 at 7:12 AM, Peng Xiao <2535...@qq.com> wrote:

> Dear All,
>
> Can we run Cassandra directly on a physical machine?
> We all know that a VM can reduce performance. For instance, we have a
> machine with 56 cores and 8 SSD disks.
> Can we run 8 Cassandra instances on the same machine, within one rack, with
> different ports?
>
> Could anyone please advise?
>
> Thanks,
> Peng Xiao
>



-- 
Oliver Ruebenacker
Senior Software Engineer, Diabetes Portal, Broad Institute



Re: run Cassandra on physical machine

2017-12-07 Thread Eric Evans
On Thu, Dec 7, 2017 at 6:12 AM, Peng Xiao <2535...@qq.com> wrote:
> Dear All,
>
> Can we run Cassandra directly on a physical machine?
> We all know that a VM can reduce performance. For instance, we have a
> machine with 56 cores and 8 SSD disks.
> Can we run 8 Cassandra instances on the same machine, within one rack, with
> different ports?
>
> Could anyone please advise?

We do this.  It works, but it is not pretty.

Honestly, I would rather accept the (relatively small) overhead of a
virtual machine (or better yet, use containers) than do this.

-- 
Eric Evans
eev...@wikimedia.org




Re: Node crashes on repair (Cassandra 3.11.1)

2017-12-07 Thread Christian Lorenz
I think we’ve hit the bug described here:

https://issues.apache.org/jira/browse/CASSANDRA-14096

Regards,
Christian

From: Christian Lorenz 
Reply-To: "user@cassandra.apache.org" 
Date: Friday, December 1, 2017 at 10:04
To: "user@cassandra.apache.org" 
Subject: Re: Node crashes on repair (Cassandra 3.11.1)

Hi Jeff,

the repairs worked fine before on version 3.9. I noticed that the validation 
tasks run during a repair are no longer bound by the concurrent_compactors 
value.
Could this be putting more pressure on the node than it can handle?

Greetings,
Christian

From: Jeff Jirsa 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, November 30, 2017 at 19:46
To: cassandra 
Subject: Re: Node crashes on repair (Cassandra 3.11.1)

That was worded poorly. The tree has a max depth of 20, so it is the same 
size for any range > 2**20.
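
To make the depth cap concrete, here is a rough sketch of the arithmetic only, not of Cassandra's implementation. It assumes partitions are spread evenly across the leaves of the tree; `MAX_DEPTH` and `partitions_per_leaf` are illustrative names and not Cassandra APIs.

```python
import math

MAX_DEPTH = 20  # depth cap mentioned in this thread


def partitions_per_leaf(partitions_in_range: int, max_depth: int = MAX_DEPTH) -> int:
    """Estimate how many partitions share one leaf, assuming an even spread."""
    leaves = 2 ** max_depth  # a tree of depth d has at most 2**d leaves
    return max(1, math.ceil(partitions_in_range / leaves))


# Repairing one big range vs. the same range split into 256 subranges.
whole_range = partitions_per_leaf(2 ** 30)
sub_range = partitions_per_leaf(2 ** 30 // 256)
print(whole_range, sub_range)
```

Under these assumptions the tree itself does not grow past the cap; what changes with a smaller range is how many partitions each leaf has to cover.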


On Thu, Nov 30, 2017 at 10:43 AM, Jeff Jirsa wrote:
Merkle trees have a fixed size/depth (2**20), so it’s not that, but it could be 
timing out elsewhere (or still running validation or something)
--
Jeff Jirsa


On Nov 30, 2017, at 10:12 AM, Javier Canillas wrote:
Christian,

I'm not an expert, but maybe the merkle tree is too big to transfer between 
nodes and that's why it times out. How many nodes do you have and what's the 
size of the keyspace? Have you ever done a successful repair before?

Cassandra Reaper repairs by token range (or even part of one), which is 
why it only needs a small merkle tree.

Regards,

Javier.

2017-11-30 6:48 GMT-03:00 Christian Lorenz:
Hello,

after updating our cluster to Cassandra 3.11.1 (previously 3.9), running a 
‘nodetool repair -full’ leads to the node crashing.
The logfile showed the following exception:
ERROR [ReadRepairStage:36] 2017-11-30 07:42:06,439 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:36,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:199) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:76) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_151]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]

The node datasize is ~270GB.  A repair with Cassandra reaper works fine though.

Any idea why this could be happening?

Regards,
Christian




run Cassandra on physical machine

2017-12-07 Thread Peng Xiao
Dear All,


Can we run Cassandra directly on a physical machine?
We all know that a VM can reduce performance. For instance, we have a machine 
with 56 cores and 8 SSD disks.
Can we run 8 Cassandra instances on the same machine, within one rack, with 
different ports?


Could anyone please advise?


Thanks,
Peng Xiao

Huge system.batches table after joining a node (Cassandra 3.11.1)

2017-12-07 Thread Christian Lorenz
Hi,

after joining a node into an existing cluster, the table system.batches became 
quite large (100GB), which is about 1/3 of the node's size.
Is it safe to truncate the table?

Regards,
Christian



Re: Connection refused - 127.0.0.1-Gossip

2017-12-07 Thread Marek Kadek -T (mkadek - CONSOL PARTNERS LTD at Cisco)
I tried setting it to the pod IP, but it did not help.

From: "ZAIDI, ASAD A" 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, December 6, 2017 at 8:45 PM
To: "user@cassandra.apache.org" 
Subject: RE: Connection refused - 127.0.0.1-Gossip

Throwing in my $0.02

rpc_address defaults to localhost; if it is left unset, the value is picked up 
from the DNS/hostname file. Maybe you can try setting rpc_address:  
- see if this helps!


From: Marek Kadek -T (mkadek - CONSOL PARTNERS LTD at Cisco) 
[mailto:mka...@cisco.com]
Sent: Wednesday, December 06, 2017 2:19 AM
To: user@cassandra.apache.org
Subject: Re: Connection refused - 127.0.0.1-Gossip

Thanks for any ideas/hints, any straw is worth checking at this point ☺

Well, the clusters “work”: data is correctly stored and queried. I’m interested 
in why it tries to open a gossip connection to localhost, and what kind of 
(performance) impact this could have on the clusters.
The env vars are correctly passed, and cassandra.yaml seems to be correctly 
set. We are using the Cassandra docker image.

listen_address: 100.110.253.6 (correct pod ip)
# listen_interface: eth0
# listen_interface_prefer_ipv6: false
broadcast_rpc_address: 100.110.253.6

It’s also observable with minikube and single C* node on local machine.
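
One quick sanity check for settings like the ones above is to confirm that none of the address values point at loopback. The following is a minimal sketch, assuming the values have already been read from cassandra.yaml into a plain dict; `loopback_settings` is a hypothetical helper, and the check does not by itself explain the gossip messages in this thread.

```python
import ipaddress

# Address-related keys from cassandra.yaml that are worth inspecting.
ADDRESS_KEYS = (
    "listen_address",
    "broadcast_address",
    "rpc_address",
    "broadcast_rpc_address",
)


def loopback_settings(settings: dict) -> list:
    """Return the names of address settings that point at loopback."""
    flagged = []
    for key in ADDRESS_KEYS:
        value = settings.get(key)
        if value is None:
            continue  # not set in this dict; nothing to check
        if value == "localhost":
            flagged.append(key)
            continue
        try:
            if ipaddress.ip_address(value).is_loopback:
                flagged.append(key)
        except ValueError:
            pass  # a hostname; resolving it is out of scope for this sketch
    return flagged


# Values quoted in the message above.
print(loopback_settings({
    "listen_address": "100.110.253.6",
    "broadcast_rpc_address": "100.110.253.6",
}))
```

Hostnames other than `localhost` are skipped rather than resolved, so a name that resolves to 127.0.0.1 inside the pod would not be caught by this check.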




From: Lerh Chuan Low
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, December 5, 2017 at 11:14 PM
To: "user@cassandra.apache.org"
Subject: Re: Connection refused - 127.0.0.1-Gossip

As Jeff mentioned, I think it sounds like a configuration issue. Are you sure 
you are using the same configmap (or however it's being passed in)? And, just 
throwing out ideas, maybe the pods are behind an HTTP proxy and you have 
forgotten to pass in the env vars?

On 6 December 2017 at 08:45, Jeff Jirsa wrote:
I don't have any k8 clusters to test with, but do you know how your yaml 
translates to cassandra.yaml ? What are the listen/broadcast addresses being 
set?


On Tue, Dec 5, 2017 at 6:09 AM, Marek Kadek -T (mkadek - CONSOL PARTNERS LTD at 
Cisco) wrote:

We are experiencing the following issue with Cassandra on our kubernetes clusters:

```
@ kubectl exec -it cassandra-cassandra-0 -- tail /var/log/cassandra/debug.log
DEBUG [MessagingService-Outgoing-localhost/127.0.0.1-Gossip] 2017-12-05 09:02:06,560 OutboundTcpConnection.java:545 - Unable to connect to localhost/127.0.0.1
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_131]
    at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_131]
    at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_131]
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_131]
    at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:433) [apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:262) [apache-cassandra-3.11.0.jar:3.11.0]
```



Basically, it’s tons and tons of the same message over and over (on all 
clusters, all C* nodes). It tries roughly 4-5 times a second to open a tcp 
connection to localhost (?) for gossiping.



What we know:

- does not happen on Cassandra 3.0.15, but does happen on 3.11.1 (same 
configuration).

- does happen even on minikube-single-Cassandra “cluster”.

- does not happen on