Re: oracle goldengate to cassandra

2017-10-10 Thread sam sriramadhesikan
Nesli,

What version of GoldenGate?

This might be achievable with GoldenGate for Big Data (source) to Kafka to
Cassandra (sink). This link provides a good walkthrough, except that it uses
Elasticsearch as the sink.

Sam

> On Oct 10, 2017, at 4:24 AM, neslişah demirci  
> wrote:
> 
> Hi all , 
> 
> Does anyone know how I can configure Oracle GoldenGate for data
> streaming to a Cassandra database?
> 
> Any comments, links or other help would be appreciated.
> 
> Kind regards,
> Nesli. 
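As a rough illustration of the Kafka-to-Cassandra leg of such a pipeline, the Python sketch below maps one change record to a parameterised CQL statement. The record layout (table, op_type of I/U/D, before/after column maps) is an assumption loosely modelled on GoldenGate's JSON output, and to_cql and its key_columns argument are hypothetical; check the message format your GoldenGate for Big Data handler actually emits before relying on it.

```python
import json
from typing import Any, Dict, List, Tuple


def to_cql(message: str, key_columns: List[str]) -> Tuple[str, List[Any]]:
    """Turn one change record (assumed GoldenGate-style JSON) into a
    parameterised CQL statement plus its bind values.

    Inserts and updates both become INSERTs, because a CQL INSERT is an
    upsert; deletes only need the primary-key columns from the before image.
    """
    record: Dict[str, Any] = json.loads(message)
    # Assumes "SCHEMA.TABLE" maps to an existing Cassandra keyspace.table.
    table = record["table"].lower()
    op = record["op_type"]

    if op in ("I", "U"):
        row = record["after"]
        columns = sorted(row)  # deterministic column order
        placeholders = ", ".join("?" for _ in columns)
        cql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        return cql, [row[c] for c in columns]

    if op == "D":
        row = record["before"]
        where = " AND ".join(f"{c} = ?" for c in key_columns)
        return f"DELETE FROM {table} WHERE {where}", [row[c] for c in key_columns]

    raise ValueError(f"unsupported op_type: {op!r}")


if __name__ == "__main__":
    msg = json.dumps({
        "table": "SALES.CUSTOMERS",
        "op_type": "U",
        "after": {"cust_id": 42, "name": "Nesli", "city": "Istanbul"},
    })
    print(to_cql(msg, key_columns=["cust_id"]))
    # ('INSERT INTO sales.customers (city, cust_id, name) VALUES (?, ?, ?)', ['Istanbul', 42, 'Nesli'])
```

Because a CQL INSERT is an upsert, an update whose after image carries only the changed columns still works as a partial write, as long as the primary-key columns are present. The statement and values would then be executed as a prepared statement by whichever Cassandra driver the consumer uses.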



Re: DataStax Spark driver performance for analytics workload

2017-10-10 Thread Javier García-Valdecasas Bernal
Hi,

The spark-cassandra-connector pushes filters down to Cassandra when the
clauses are valid for pushdown. Pushed-down filters are executed directly by
Cassandra, so if your data model fits your queries, you won't end up reading
or scanning the full table, only the partitions that match your query.

You can check which clauses are being pushed down when filtering a
dataframe with df.filter("filter expression").explain().
Check this URL for more information:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md

I hope this helps.

Javier García-Valdecasas Bernal

2017-10-10 15:11 GMT+02:00 Stone Fang :

> @kurt greaves
>
> I doubt that we need to read all the data. It is common to have a huge
> number of records in a Cassandra cluster.
> If we load all the data, how can we analyse it?
>
> On Mon, Oct 9, 2017 at 9:49 AM, kurt greaves  wrote:
>
>> spark-cassandra-connector will provide the best way to achieve what you
>> want, however under the hood it's still going to result in reading all the
>> data, and because of the way Cassandra works it will essentially read the
>> same SSTables multiple times from random points. You might be able to tune
>> to make this not super bad, but pretty much reading all the data is going
>> to have horrible implications for the cache if all your data doesn't fit in
>> memory regardless of what you do.
>>
>
>
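To make the effect of pushdown concrete, here is a toy in-memory model (not the connector's code) contrasting a filter answered from one partition with the same filter applied after scanning everything. Table, query and the sample data are hypothetical; which predicates the real connector pushes down depends on the data model and connector version, and explain() is the way to check.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


class Table:
    """Toy table: rows grouped by partition key, like Cassandra partitions."""

    def __init__(self, partition_key: str) -> None:
        self.partition_key = partition_key
        self.partitions: Dict[object, List[Row]] = defaultdict(list)

    def insert(self, row: Row) -> None:
        self.partitions[row[self.partition_key]].append(row)

    def query(self, predicate: Callable[[Row], bool],
              key: Optional[object] = None) -> Tuple[List[Row], int]:
        """Return (matching rows, number of rows read).

        If key is given (a pushed-down partition-key equality), only that
        partition is read; otherwise every partition is scanned and the
        predicate is applied afterwards, as Spark would without pushdown.
        """
        if key is not None:
            candidates = self.partitions.get(key, [])
        else:
            candidates = [r for rows in self.partitions.values() for r in rows]
        return [r for r in candidates if predicate(r)], len(candidates)


if __name__ == "__main__":
    t = Table(partition_key="sensor_id")
    for sensor in range(100):
        for reading in range(10):
            t.insert({"sensor_id": sensor, "reading": reading})

    # Predicate on the partition key, pushed down: one partition read.
    hits, read = t.query(lambda r: r["sensor_id"] == 7, key=7)
    print(len(hits), read)  # 10 10
    # Same predicate applied only after loading everything: full scan.
    hits, read = t.query(lambda r: r["sensor_id"] == 7)
    print(len(hits), read)  # 10 1000
```

Both calls return the same rows; only the amount of data read differs, which is the point Javier makes about a model that fits the queries.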


[RELEASE] Apache Cassandra 3.11.1 released

2017-10-10 Thread Michael Shuler
The Cassandra team is pleased to announce the release of Apache
Cassandra version 3.11.1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.11 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/QFBuPn
[2]: (NEWS.txt) https://goo.gl/vHd41x
[3]: https://issues.apache.org/jira/browse/CASSANDRA

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



[RELEASE] Apache Cassandra 3.0.15 released

2017-10-10 Thread Michael Shuler
The Cassandra team is pleased to announce the release of Apache
Cassandra version 3.0.15.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/knZzCC
[2]: (NEWS.txt) https://goo.gl/HgTN9S
[3]: https://issues.apache.org/jira/browse/CASSANDRA




Re: Could not connect to localhost:9160 when installing Cassandra on AWS

2017-10-10 Thread Jon Haddad
How did you install Cassandra?  

Try passing the machine's IP address to cqlsh, like "cqlsh 192.168.1.1".
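A quick way to see which client port is actually listening, and on which address, is a plain TCP connect test. The sketch below uses only the Python standard library; port_open is a hypothetical helper, and the addresses are placeholders. As far as I know, 9042 is the default native-protocol port (used by cqlsh from Cassandra 2.1 on) and 9160 the default Thrift port, so an error mentioning 9160 also hints at an older cqlsh. If the port answers on the instance's private IP but not on localhost, the node is probably bound to the address set by rpc_address in cassandra.yaml, which is why passing the IP to cqlsh helps.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Connection refused, unreachable host and timeouts all raise OSError
    subclasses, so any of them is reported as "not open".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_open("10.0.0.12", 9042) and port_open("127.0.0.1", 9042), with 10.0.0.12 standing in for the instance's private IP, show quickly whether the problem is the bind address or the node not running at all.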

> On Oct 10, 2017, at 10:43 AM, Lutaya Shafiq Holmes  
> wrote:
> 
> Hello Cassandra Gurus,
> 
> After I installed Cassandra on AWS, this error comes up when I try to
> start cqlsh:
> 
> Could not connect to localhost:9160
> 
> 
> What should I do?
> 
> 
> -- 
> Lutaaya Shafiq
> Web: www.ronzag.com | i...@ronzag.com
> Mobile: +256702772721 | +256783564130
> Twitter: @lutayashafiq
> Skype: lutaya5
> Blog: lutayashafiq.com
> http://www.fourcornersalliancegroup.com/?a=shafiqholmes
> 
> "The most beautiful people we have known are those who have known defeat,
> known suffering, known struggle, known loss and have found their way out of
> the depths. These persons have an appreciation, a sensitivity and an
> understanding of life that fills them with compassion, gentleness and a
> deep loving concern. Beautiful people do not just happen." - *Elisabeth
> Kubler-Ross*
> 





Could not connect to localhost:9160 when installing Cassandra on AWS

2017-10-10 Thread Lutaya Shafiq Holmes
Hello Cassandra Gurus,

After I installed Cassandra on AWS, this error comes up when I try to
start cqlsh:

Could not connect to localhost:9160


What should I do?





Re: LWT and non-LWT mixed

2017-10-10 Thread Anuj Wadehra
Hi Daniel,
What are the RF and CL for the DELETE? Are you using asynchronous writes?
Are you firing both statements from the same node sequentially? Are you
firing these queries in a loop, such that more than one DELETE and LWT is
fired for the same partition?

I think that if the same client executes both statements sequentially in the
same thread, i.e. one after another, and the DELETE is synchronous, it should
work fine. The LWT will be executed after Cassandra has written the DELETE to
a quorum of nodes, so it will see the data. The Paxos round of the LWT will
only be initiated when the DELETE completes. I think LWT should not be mixed
with normal writes when such writes are fired from multiple nodes/threads on
the same partition.

Thanks
Anuj
 
On Tue, 10 Oct 2017 at 14:10, Daniel Woo wrote:
> The documentation explains that you cannot mix them:
> http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/dml/dmlLtwtTransactions.html
>
> But what happens under the hood if I do? For example:
>
> DELETE
> INSERT ... IF NOT EXISTS
>
> The coordinator performs 4 steps for the second statement (INSERT):
> 1. prepare/promise a ballot
> 2. read current row from replicas
> 3. propose new value along with the ballot to replicas
> 4. commit and wait for ack from replicas
>
> My question is: once the row is deleted, the next INSERT LWT should be able
> to see that row's tombstone in step 2 and then successfully insert the new
> value. But my tests show that this often fails. Does anybody know why?
> --
> Thanks & Regards,
> Daniel
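Anuj's argument rests on quorum overlap: if the DELETE is acknowledged by a quorum of replicas and the LWT's read phase also contacts a quorum, the two replica sets must share at least one node, so the read reaches the tombstone. The sketch below checks that property by brute force for a single data center; quorum and always_overlap are illustrative helpers, and it assumes the LWT read uses a quorum. It covers replica overlap only, not timestamp ordering or concurrent writers.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """Quorum size for one data center: a strict majority of the replicas."""
    return rf // 2 + 1


def always_overlap(rf: int, write_acks: int, read_replicas: int) -> bool:
    """True if every set of write_acks replicas intersects every set of
    read_replicas replicas, i.e. a read is guaranteed to reach at least one
    replica that acknowledged the write (equivalent to W + R > RF)."""
    nodes = range(rf)
    return all(
        set(w) & set(r)
        for w in combinations(nodes, write_acks)
        for r in combinations(nodes, read_replicas)
    )


if __name__ == "__main__":
    rf = 3
    # DELETE at QUORUM, LWT read at quorum: always overlaps (2 + 2 > 3).
    print(always_overlap(rf, quorum(rf), quorum(rf)))  # True
    # DELETE at ONE, LWT read at quorum: may miss the tombstone (1 + 2 = 3).
    print(always_overlap(rf, 1, quorum(rf)))  # False
```

This is why the CL used for the DELETE is the first thing Anuj asks about: at ONE, the LWT's read can land entirely on replicas that have not yet seen the tombstone.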


Re: Using materialized view or AllowFiltering which one is better ?

2017-10-10 Thread Avi Levi
Thank you Crisan.
Using SASI does seem like a better solution. Although it is officially
not supported in production, I think it is the optimal solution in
this case.

On Mon, Oct 9, 2017 at 11:01 PM, Valentina Crisan <
valentina.cri...@gmail.com> wrote:

> Not really, my suggested primary key is similar to the one you have in
> your proposed MV. The only difference is that with an MV it is Cassandra that
> takes care of data synchronization; with manual denormalization you would
> need to do it yourself. Example with an MV: if you had username 'andreas1988'
> and last_seen '2017-09-11 23:58:23' in your base table, and then this user
> accesses the service and last_seen is updated to '2017-10-09 23:58:23' in
> your base table, then in the background Cassandra will, in a batch operation,
> delete the username 'andreas1988' from the MV partition '2017-09-11 23:58:23'
> and add it to the partition '2017-10-09 23:58:23'.
> Only when this batch finishes will Cassandra update the base table.
> If you denormalize manually, you will need to create the batch operations
> and make these changes yourself, making sure that you only save the last
> value for last_seen in your table. You will obtain the same result in the
> end; only the operational effort will be bigger.
> I understand why an MV would be good for your requirements, but I have seen
> from the discussions that MVs are not recommended for production, mainly
> because it is not possible to check whether a view is out of sync with the
> base table. Check older discussions (one or two weeks ago) for details on
> MV usage in production.
>
> One other solution could be to use a secondary index on the last_seen field
> of your users table (from Cassandra 3.4 onwards, SASI allows operators like
> <, > and indexing multiple columns). This is clearly better than ALLOW
> FILTERING, but the whole cluster would still be contacted most of the time
> for your queries. Maybe combining Cassandra SASI with Spark data locality
> could solve this better. But first you could try SASI and check the query
> performance.
>
> Valentina
>
>
> On Mon, Oct 9, 2017 at 7:56 PM, Avi Levi  wrote:
>
>> Thanks Crisan.
>> I understand what you're saying. But with your suggestion I will have a
>> record for every entry, while I am interested only in the last entry.
>> So the proposed solution actually keeps much more data than needed.
>>
>> On Oct 9, 2017 8:40 PM, "Valentina Crisan" 
>> wrote:
>>
>> ALLOW FILTERING is almost never the answer, especially when you want to
>> do a full table scan (there might be some cases where the query is limited
>> to a partition and ALLOW FILTERING could be used). And you would like to
>> run this query every minute, so extremely good performance is required.
>> ALLOW FILTERING basically brings the whole table content to your
>> coordinator and filters the data locally before answering your query.
>> Performance-wise, such an implementation is not recommended.
>>
>> For a query running every minute you need to answer it with a single
>> partition read (according to Cassandra data modeling rules), and that can be
>> done with denormalization (manual or materialized views). As far as I know,
>> and also from the discussions on this list, MVs should still be used with
>> caution in production environments. Thus, the best option in my opinion is
>> manual denormalization of data: building a table with partition key last_seen
>> and clustering key username, and adding/updating data accordingly.
>> Furthermore, as I understand it, last_seen can be any time/hour of the day,
>> so you could consider building partitions per day: partition key =
>> (last_seen, day), primary key = ((last_seen, day), username).
>>
>> Valentina
>>
>> On Mon, Oct 9, 2017 at 1:13 PM, Avi Levi  wrote:
>>
>>> Hi
>>>
>>> I have the following table:
>>>
>>> CREATE TABLE users (
>>> username text,
>>> last_seen bigint,
>>> PRIMARY KEY (username)
>>> );
>>>
>>> where *last_seen* is basically the write time. The number of records in
>>> the table is approx. 10 million. Inserts are straightforward:
>>> insert into users (username, last_seen) VALUES ([username], now)
>>>
>>> I want to do some processing on users that have not been seen for the
>>> past XXX (where XXX can be hours/days ...) by querying the last_seen
>>> column (this query runs every minute), e.g.:
>>>
>>> select username from users where last_seen < (now - 1 day).
>>>
>>> I have two options as I see it:
>>>
>>>1. Use a materialized view:
>>>
>>> CREATE MATERIALIZED VIEW users_last_seen AS
>>> SELECT last_seen, username
>>> FROM users
>>> WHERE last_seen IS NOT NULL
>>> PRIMARY KEY (last_seen, username);
>>>
>>>
>>> and simply query:
>>>
>>> select username from users_last_seen where last_seen < (now - 1 day)
>>>
>>>1.
>>>
>>>
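Valentina's manual-denormalization suggestion, keeping only the latest last_seen per user as Avi wants, can be sketched in memory as below. The per-day bucket, the millisecond unit for last_seen, and the names LastSeenStore, users_by_day, touch and not_seen_since are assumptions for illustration; in Cassandra the delete-old/insert-new pair in touch would be a logged batch against a separate query table.

```python
from collections import defaultdict
from typing import Dict, List

MS_PER_DAY = 86_400_000


def day_of(ts_ms: int) -> int:
    """Day bucket (days since the Unix epoch) for a millisecond timestamp."""
    return ts_ms // MS_PER_DAY


class LastSeenStore:
    """In-memory stand-in for two tables kept in sync by hand: users
    (username -> last_seen) and a hypothetical users_by_day query table,
    partitioned by day and clustered by username."""

    def __init__(self) -> None:
        self.users: Dict[str, int] = {}
        self.users_by_day: Dict[int, Dict[str, int]] = defaultdict(dict)

    def touch(self, username: str, ts_ms: int) -> None:
        """Record a visit, keeping only the latest last_seen per user.
        In Cassandra these mutations would go into one logged batch."""
        old = self.users.get(username)
        if old is not None:
            old_day = day_of(old)
            self.users_by_day[old_day].pop(username, None)  # delete old entry
            if not self.users_by_day[old_day]:
                del self.users_by_day[old_day]
        self.users[username] = ts_ms
        self.users_by_day[day_of(ts_ms)][username] = ts_ms

    def not_seen_since(self, cutoff_ms: int) -> List[str]:
        """Users whose last_seen is older than cutoff_ms. Only day buckets up
        to the cutoff's day are visited; newer partitions are never read."""
        cutoff_day = day_of(cutoff_ms)
        stale: List[str] = []
        for day in sorted(d for d in self.users_by_day if d <= cutoff_day):
            stale.extend(u for u, ts in self.users_by_day[day].items() if ts < cutoff_ms)
        return sorted(stale)


if __name__ == "__main__":
    store = LastSeenStore()
    store.touch("andreas1988", 0)
    store.touch("bob", 5 * MS_PER_DAY)
    store.touch("andreas1988", 10 * MS_PER_DAY)  # moves andreas1988 to day 10
    print(store.not_seen_since(7 * MS_PER_DAY))  # ['bob']
```

Because stale users stay in their old day buckets, each run still visits every older bucket; bounding the lookback window, or removing users once they have been processed, would keep that read small.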

Re: DataStax Spark driver performance for analytics workload

2017-10-10 Thread Stone Fang
@kurt greaves

I doubt that we need to read all the data. It is common to have a huge
number of records in a Cassandra cluster.
If we load all the data, how can we analyse it?

On Mon, Oct 9, 2017 at 9:49 AM, kurt greaves  wrote:

> spark-cassandra-connector will provide the best way to achieve what you
> want, however under the hood it's still going to result in reading all the
> data, and because of the way Cassandra works it will essentially read the
> same SSTables multiple times from random points. You might be able to tune
> to make this not super bad, but pretty much reading all the data is going
> to have horrible implications for the cache if all your data doesn't fit in
> memory regardless of what you do.
>


Re: LWT and non-LWT mixed

2017-10-10 Thread Javier Canillas
Daniel,

Cassandra is "eventually consistent". This means that the DELETE can go to
a different coordinator than the INSERT ... IF NOT EXISTS. As a result, the
coordinators race each other, and the INSERT ... IF NOT EXISTS can fail
because it reads data that the DELETE is about to remove. Even on the same
coordinator, each statement is handled on a different thread.

You can play around with the consistency level; applying both statements at
ALL may reduce the chance of failure, but it won't make it go away.

What you want to do is "lock" the row on delete; that's something you can
do in SQL engines but not in C*.

Regards,

2017-10-10 5:22 GMT-03:00 Daniel Woo :

> The documentation explains that you cannot mix them:
> http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/dml/dmlLtwtTransactions.html
>
> But what happens under the hood if I do? For example:
>
> DELETE 
> INSERT ... IF NOT EXISTS
>
> The coordinator performs 4 steps for the second statement (INSERT):
> 1. prepare/promise a ballot
> 2. read current row from replicas
> 3. propose new value along with the ballot to replicas
> 4. commit and wait for ack from replicas
>
> My question is: once the row is deleted, the next INSERT LWT should be
> able to see that row's tombstone in step 2 and then successfully insert the
> new value. But my tests show that this often fails. Does anybody know why?
>
> --
> Thanks & Regards,
> Daniel
>


LWT and non-LWT mixed

2017-10-10 Thread Daniel Woo
The documentation explains that you cannot mix them:
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/dml/dmlLtwtTransactions.html

But what happens under the hood if I do? For example:

DELETE 
INSERT ... IF NOT EXISTS

The coordinator performs 4 steps for the second statement (INSERT):
1. prepare/promise a ballot
2. read current row from replicas
3. propose new value along with the ballot to replicas
4. commit and wait for ack from replicas

My question is: once the row is deleted, the next INSERT LWT should be able
to see that row's tombstone in step 2 and then successfully insert the new
value. But my tests show that this often fails. Does anybody know why?

-- 
Thanks & Regards,
Daniel
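The prepare/promise and propose/accept steps Daniel lists follow single-decree Paxos. The toy acceptor below is not Cassandra's implementation: names are hypothetical, ballots are plain integers instead of TimeUUIDs, and the read (step 2) and commit (step 4) are left out. It shows the rule that orders LWTs: once a replica has promised a ballot, it rejects any prepare or proposal carrying a lower one. A plain DELETE does not go through these steps at all; it is applied with an ordinary write timestamp.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """One replica's Paxos state for a single partition (toy model)."""
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) last accepted

    def prepare(self, ballot: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        """Step 1: promise to ignore lower ballots; report any earlier accept."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def propose(self, ballot: int, value: str) -> bool:
        """Step 3: accept the value unless a higher ballot was promised since."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def run_round(acceptors: List[Acceptor], ballot: int, value: str) -> bool:
    """Coordinator side: prepare everywhere, propose only after a majority
    promised. A full implementation would first re-propose any in-progress
    value reported in the promises, and would read the current row (step 2)
    to evaluate the IF condition; both are omitted here."""
    quorum = len(acceptors) // 2 + 1
    promises = [a.prepare(ballot) for a in acceptors]
    if sum(ok for ok, _ in promises) < quorum:
        return False
    accepted = sum(a.propose(ballot, value) for a in acceptors)
    return accepted >= quorum  # step 4 (commit) would follow on success


if __name__ == "__main__":
    replicas = [Acceptor() for _ in range(3)]
    print(run_round(replicas, ballot=10, value="row v1"))  # True
    # A coordinator whose ballot is older (e.g. its clock is behind) loses.
    print(run_round(replicas, ballot=5, value="row v2"))   # False
```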


oracle goldengate to cassandra

2017-10-10 Thread neslişah demirci
Hi all ,

Does anyone know how I can configure Oracle GoldenGate for data
streaming to a Cassandra database?

Any comments, links or other help would be appreciated.

Kind regards,
Nesli.


Re: Does NTP affect LWT's ballot UUID?

2017-10-10 Thread Daniel Woo
Hi DuyHai,

Thanks, and that's exactly what I am asking: what if NTP moves the clock
backward? Actually, NTP often does that, because clock drift is inevitable.

On Tue, Oct 10, 2017 at 3:13 PM, DuyHai Doan  wrote:

> The ballot UUID is obtained using QUORUM agreement between replicas for a
> given partition key and we use this TimeUUID ballot as write-time for the
> mutation.
>
> The only scenario where I can see a problem is if NTP goes backward in
> time on a QUORUM of replicas, which would break the contract of
> monotonicity. I don't know how likely this event is...
>
> On Tue, Oct 10, 2017 at 9:07 AM, Daniel Woo 
> wrote:
>
>> Hi guys,
>>
>> The ballot UUID should be monotonically increasing on each coordinator,
>> but Cassandra uses version 1 (timestamp-based) UUIDs. What happens if
>> the NTP service adjusts the system clock while a two-phase Paxos
>> prepare/commit is in progress?
>>
>> --
>> Thanks & Regards,
>> Daniel
>>
>
>


-- 
Thanks & Regards,
Daniel
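One common defence against a clock that steps backward is to never hand out a timestamp lower than or equal to the previous one. The sketch below wraps a clock function that way; it is a generic single-process technique, not a description of Cassandra's own UUID generator, and MonotonicMicros is a hypothetical name.

```python
import threading
import time
from typing import Callable


def _wall_clock_micros() -> int:
    return time.time_ns() // 1000


class MonotonicMicros:
    """Hands out strictly increasing microsecond timestamps even if the
    wall clock it reads jumps backward (for example after an NTP step)."""

    def __init__(self, clock: Callable[[], int] = _wall_clock_micros) -> None:
        self._clock = clock
        self._last = 0
        self._lock = threading.Lock()

    def tick(self) -> int:
        with self._lock:
            now = self._clock()
            # If the clock went backward or did not advance, step just past
            # the previous value instead of reusing or going below it.
            self._last = max(now, self._last + 1)
            return self._last


if __name__ == "__main__":
    # Fake clock readings: the third one jumps back 500 microseconds.
    readings = iter([1_000, 2_000, 1_500, 1_500, 3_000])
    gen = MonotonicMicros(clock=lambda: next(readings))
    print([gen.tick() for _ in range(5)])  # [1000, 2000, 2001, 2002, 3000]
```

The cost is that timestamps run slightly ahead of real time until the wall clock catches up, and the guard only orders values within one process; it cannot help when different nodes' clocks disagree, which is the cross-replica case DuyHai describes.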


Re: CREATE INDEX without IF NOT EXISTS when snapshoting

2017-10-10 Thread Lutaya Shafiq Holmes
Try this, replacing the keyspace and table with your own:

cqlsh:tutorialspoint> describe table emp;
CREATE TABLE tutorialspoint.emp (
emp_id int PRIMARY KEY,
emp_city text,
emp_name text,
emp_phone varint,
emp_sal varint
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX emp_emp_sal_idx ON tutorialspoint.emp (emp_sal);

On 10/10/17, ganesh.ga...@bt.com  wrote:
> Hi,
>
> Is there a way to estimate the size of a table and its index?
> I know we can estimate the size once the table and index have been created,
> using nodetool cfstats, but I want to know before loading data into the table.
> Could you please let me know if there is such a formula?
>
> Thanks and Regards
> Ganesh
> From: Javier Canillas [mailto:javier.canil...@gmail.com]
> Sent: 05 October 2017 14:42
> To: user@cassandra.apache.org
> Cc: Cassandra DEV 
> Subject: Re: CREATE INDEX without IF NOT EXISTS when snapshoting
>
> Well,
>
> Patches submitted for versions 3.0, 3.11 and trunk (although the 3.0 patch
> can also be applied to 3.11). Feel free to review and comment.
>
> Thanks!
>
> 2017-10-04 16:41 GMT-03:00 Javier Canillas
> >:
> Kurt,
>
> Thanks for your response. Created this
> ticket. Feel free to
> add anything to it that seems legit.
>
> Downloading Cassandra code right now.
>
> Fix seems quite simple. Expect a pull-request soon xD
>
> 2017-10-03 20:19 GMT-03:00 kurt greaves
> >:
> Certainly would make sense and should be trivial.
> here
> is where you want to look. Just create a ticket for it and prod here for a
> reviewer once you've got a change.
>
>
>
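For Ganesh's question, a rough pre-load estimate can be computed from expected row counts and average column sizes. The sketch below is a deliberately simple back-of-the-envelope model, not a formula provided by Cassandra: it assumes a fixed per-cell overhead (the 8-byte default is an assumption, roughly a write timestamp), ignores compression, SSTable and index-file overhead and tombstones, and multiplies by the replication factor. estimate_table_bytes and the sample sizes are hypothetical; nodetool cfstats after loading a representative sample remains the reliable check.

```python
from typing import Dict


def estimate_table_bytes(rows: int, avg_column_bytes: Dict[str, int],
                         replication_factor: int = 3,
                         per_cell_overhead: int = 8) -> int:
    """Very rough uncompressed size of a table across the whole cluster.

    Every column value in a row is counted as its average size plus a fixed
    per-cell overhead; the total is then multiplied by the replication
    factor. Compression, SSTable and index-file overhead, tombstones and
    compaction headroom are ignored.
    """
    per_row = sum(size + per_cell_overhead for size in avg_column_bytes.values())
    return rows * per_row * replication_factor


if __name__ == "__main__":
    # Hypothetical average sizes for the emp table shown above, in bytes.
    emp = {"emp_id": 4, "emp_city": 10, "emp_name": 12, "emp_phone": 5, "emp_sal": 3}
    print(estimate_table_bytes(1_000_000, emp))  # 222000000

    # Treat the emp_sal index roughly as a hidden table of (emp_sal, emp_id) pairs.
    print(estimate_table_bytes(1_000_000, {"emp_sal": 3, "emp_id": 4}))  # 69000000
```

Leave generous headroom on top of such a figure, since compaction temporarily needs extra disk space.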





Re: Does NTP affect LWT's ballot UUID?

2017-10-10 Thread DuyHai Doan
The ballot UUID is obtained using QUORUM agreement between replicas for a
given partition key and we use this TimeUUID ballot as write-time for the
mutation.

The only scenario where I can see a problem is if NTP goes backward in
time on a QUORUM of replicas, which would break the contract of
monotonicity. I don't know how likely this event is...

On Tue, Oct 10, 2017 at 9:07 AM, Daniel Woo  wrote:

> Hi guys,
>
> The ballot UUID should be monotonically increasing on each coordinator,
> but the UUID in cassandra is version 1 (timestamp based), what happens if
> the NTP service adjusts system clock while a two phase paxos prepare/commit
> is in progress?
>
> --
> Thanks & Regards,
> Daniel
>
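To see why a backward clock matters when the ballot doubles as write-time, the sketch below extracts the timestamp from a version 1 (time-based) UUID with Python's standard uuid module and converts it to microseconds since the Unix epoch, then applies a simplified last-write-wins rule in which the higher timestamp wins and a tombstone wins ties. The reconcile function is a simplified model, not Cassandra's code, and the scenario (a DELETE stamped by a clock running slightly ahead of the clock that generated the ballot) is hypothetical.

```python
import uuid

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def micros_from_timeuuid(u: uuid.UUID) -> int:
    """Microseconds since the Unix epoch encoded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    return (u.time - UUID_EPOCH_OFFSET) // 10


def timeuuid_at(micros: int) -> uuid.UUID:
    """Build a version 1 UUID for a given Unix time in microseconds, with a
    fixed (made-up) clock sequence and node, for reproducible examples."""
    ticks = micros * 10 + UUID_EPOCH_OFFSET
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             0x80, 0x00, 0x1234567890AB))  # RFC 4122 variant


def reconcile(live_ts: int, tombstone_ts: int) -> str:
    """Simplified last-write-wins for one cell: the higher timestamp wins,
    and a tombstone wins a tie."""
    return "live" if live_ts > tombstone_ts else "deleted"


if __name__ == "__main__":
    ballot = timeuuid_at(1_507_600_000_000_000)  # LWT ballot, used as write-time
    delete_ts = 1_507_600_000_000_500            # DELETE stamped 500 us "later"
    print(micros_from_timeuuid(ballot))          # 1507600000000000
    print(reconcile(micros_from_timeuuid(ballot), delete_ts))  # deleted
```

In this model the LWT insert is applied but shadowed by the earlier-issued DELETE, simply because the DELETE carries the larger timestamp.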


Does NTP affect LWT's ballot UUID?

2017-10-10 Thread Daniel Woo
Hi guys,

The ballot UUID should be monotonically increasing on each coordinator, but
Cassandra uses version 1 (timestamp-based) UUIDs. What happens if the NTP
service adjusts the system clock while a two-phase Paxos prepare/commit is
in progress?

-- 
Thanks & Regards,
Daniel