RE: list data value multiplied x2 in multi-datacenter environment

2015-11-26 Thread Ngoc Minh VO
Thanks Duy Hai for these details.

Do you know whether these problems have been fixed, or are planned to be fixed? We are using C* 2.0.14.

I didn't find any Jira ticket concerning the issue.
Regards,



From: DuyHai Doan
Sent: Wednesday, November 25, 2015 9:39:40 PM
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

There were several bugs in the past related to lists in CQL.

Indeed, the timestamps used for list columns are computed server side using a special algorithm. I wonder whether, in the case of read-repair and/or hinted handoff, the original timestamp (the one generated by the coordinator at the first insert/update) is reused, or whether the server generates a new one with its algorithm; that could explain the behavior.
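A minimal CQL sketch of the failure mode suggested above (hypothetical key values; this illustrates the hypothesis, not a confirmed root cause): list elements are stored as separate cells under server-generated timeuuid names, so if the same mutation were re-applied with freshly generated identifiers instead of the original ones, the new cells would not shadow the old ones.

```
-- Original write: internally a delete of the old list plus three appends,
-- each element stored under a server-generated timeuuid cell name.
INSERT INTO data (field1, field2, field3, field5)
VALUES ('k1', 1, 'c1', [1.0, 2.0, 3.0]);

-- Hypothesis: if hinted handoff or read-repair replays this mutation with
-- newly generated timestamps/timeuuids, the replayed cells could sort
-- alongside the originals instead of replacing them, so a later read sees:
--
-- SELECT field5 FROM data WHERE field1 = 'k1' AND field2 = 1 AND field3 = 'c1';
--   field5 = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
```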



On Wed, Nov 25, 2015 at 9:36 PM, Ngoc Minh VO <ngocminh...@bnpparibas.com> wrote:
Our insert/select queries use CL = QUORUM.

We don’t use BatchStatement to import data but executeAsync(Statement) with a 
fixed-size queue.

Regards,

From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Wednesday, November 25, 2015 18:09

To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

Be sure to include your actual insert statement. Also, what consistency level was used for the insert (all, quorum, local quorum, one, or...)?


-- Jack Krupansky

On Wed, Nov 25, 2015 at 11:43 AM, Ngoc Minh VO <ngocminh...@bnpparibas.com> wrote:
No. We do not use update.
All inserts are idempotent and there is no read-before-write query.

On the corrupted data row, we have verified that the data was only written once.

Thanks for your answer!

From: Laing, Michael [mailto:michael.la...@nytimes.com]
Sent: Wednesday, November 25, 2015 15:39
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

You don't have any syntax in your application anywhere such as:

UPDATE data SET field5 = field5 + [ 1,2,3 ] WHERE field1=...;

Just a quick idempotency check :)
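To make the check concrete (hypothetical values, using the table from the original post): the append form is the non-idempotent one, whereas assigning the whole list overwrites it and can safely be retried.

```
-- Non-idempotent: every execution (including a driver retry) appends
-- three more elements to the existing list.
UPDATE data SET field5 = field5 + [1, 2, 3] WHERE field1 = ...;

-- Idempotent: every execution overwrites the whole list, so a retry
-- cannot duplicate elements.
UPDATE data SET field5 = [1, 2, 3] WHERE field1 = ...;
```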

On Wed, Nov 25, 2015 at 9:16 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
Is the data corrupted exactly the same way on all three nodes and in both data 
centers, or just on one or two nodes or in only one data center?

Are both columns doubled in the same row, or only one of them in a particular 
row?

Does sound like a bug though, worthy of a Jira ticket.

-- Jack Krupansky

On Wed, Nov 25, 2015 at 4:05 AM, Ngoc Minh VO <ngocminh...@bnpparibas.com> wrote:
Hello all,

We encounter an issue on our Production environment that cannot be reproduced on the Test environment: a list<T> (T = double or text) value is randomly "multiplied" by 2 (i.e. value sent to C* = [a, b, c], value stored in C* = [a, b, c, a, b, c]).

I know that it sounds weird, but we just want to know whether it is a known issue (we found nothing with Google...). We are working on a small dataset to narrow down the issue with log data and maybe create a ticket for the DataStax Java Driver or Cassandra teams.

Cassandra v2.0.14
DataStax Java Driver v2.1.7.1
OS RHEL6
Prod Cluster topology = 16 nodes over 2 datacenters (RF = 3 per DC)
UAT Cluster topology = 6 nodes on 1 datacenter (RF = 3)

The only difference between Prod and UAT cluster is the multi-datacenter mode 
on Prod one.
We do not insert twice the same data on the same column of any specific row. 
All inserts/updates are idempotent!

Data table:
CREATE TABLE data (
    field1 text,
    field2 int,
    field3 text,
    field4 double,
    field5 list<double>, -- randomly corrupted: contains [1, 2, 3, 1, 2, 3] instead of [1, 2, 3]
    field6 text,
    field7 list<text>,   -- randomly corrupted: contains [a, b, c, a, b, c] instead of [a, b, c]
    PRIMARY KEY ((field1, field2), field3)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

Thanks in advance for your help.
Best regards,
Minh

This message and any attachments (the "message") is
intended solely for the intended addressees and is confidential.
If you receive this message in error,or are not the intended recipient(s),
please delete it and any copies from your systems and immediately notify
the sender. Any unauthorized view, use that does not comply with its purpose,
dissemination or disclosure, either whole or partial, is prohibited. Since the 
internet
cannot guarantee the integrity of this message which may not be reliable, BNP 
PARIBAS
(and its subsidiaries) shall not be liable for the message if modified, changed or falsified.
Do not print this message unless it is necessary; consider the environment.

RE: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Ngoc Minh VO
No. We do not use update.
All inserts are idempotent and there is no read-before-write query.

On the corrupted data row, we have verified that the data was only written once.

Thanks for your answer!

From: Laing, Michael [mailto:michael.la...@nytimes.com]
Sent: Wednesday, November 25, 2015 15:39
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

You don't have any syntax in your application anywhere such as:

UPDATE data SET field5 = field5 + [ 1,2,3 ] WHERE field1=...;

Just a quick idempotency check :)

On Wed, Nov 25, 2015 at 9:16 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
Is the data corrupted exactly the same way on all three nodes and in both data 
centers, or just on one or two nodes or in only one data center?

Are both columns doubled in the same row, or only one of them in a particular 
row?

Does sound like a bug though, worthy of a Jira ticket.

-- Jack Krupansky

On Wed, Nov 25, 2015 at 4:05 AM, Ngoc Minh VO <ngocminh...@bnpparibas.com> wrote:
Hello all,

We encounter an issue on our Production environment that cannot be reproduced on the Test environment: a list<T> (T = double or text) value is randomly "multiplied" by 2 (i.e. value sent to C* = [a, b, c], value stored in C* = [a, b, c, a, b, c]).

I know that it sounds weird, but we just want to know whether it is a known issue (we found nothing with Google...). We are working on a small dataset to narrow down the issue with log data and maybe create a ticket for the DataStax Java Driver or Cassandra teams.

Cassandra v2.0.14
DataStax Java Driver v2.1.7.1
OS RHEL6
Prod Cluster topology = 16 nodes over 2 datacenters (RF = 3 per DC)
UAT Cluster topology = 6 nodes on 1 datacenter (RF = 3)

The only difference between Prod and UAT cluster is the multi-datacenter mode 
on Prod one.
We do not insert twice the same data on the same column of any specific row. 
All inserts/updates are idempotent!

Data table:
CREATE TABLE data (
    field1 text,
    field2 int,
    field3 text,
    field4 double,
    field5 list<double>, -- randomly corrupted: contains [1, 2, 3, 1, 2, 3] instead of [1, 2, 3]
    field6 text,
    field7 list<text>,   -- randomly corrupted: contains [a, b, c, a, b, c] instead of [a, b, c]
    PRIMARY KEY ((field1, field2), field3)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

Thanks in advance for your help.
Best regards,
Minh





RE: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Ngoc Minh VO
Our insert/select queries use CL = QUORUM.

We don’t use BatchStatement to import data but executeAsync(Statement) with a 
fixed-size queue.

Regards,

From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Wednesday, November 25, 2015 18:09
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

Be sure to include your actual insert statement. Also, what consistency level was used for the insert (all, quorum, local quorum, one, or...)?


-- Jack Krupansky

On Wed, Nov 25, 2015 at 11:43 AM, Ngoc Minh VO <ngocminh...@bnpparibas.com> wrote:
No. We do not use update.
All inserts are idempotent and there is no read-before-write query.

On the corrupted data row, we have verified that the data was only written once.

Thanks for your answer!

From: Laing, Michael [mailto:michael.la...@nytimes.com]
Sent: Wednesday, November 25, 2015 15:39
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

You don't have any syntax in your application anywhere such as:

UPDATE data SET field5 = field5 + [ 1,2,3 ] WHERE field1=...;

Just a quick idempotency check :)

On Wed, Nov 25, 2015 at 9:16 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
Is the data corrupted exactly the same way on all three nodes and in both data 
centers, or just on one or two nodes or in only one data center?

Are both columns doubled in the same row, or only one of them in a particular 
row?

Does sound like a bug though, worthy of a Jira ticket.

-- Jack Krupansky

On Wed, Nov 25, 2015 at 4:05 AM, Ngoc Minh VO <ngocminh...@bnpparibas.com> wrote:
Hello all,

We encounter an issue on our Production environment that cannot be reproduced on the Test environment: a list<T> (T = double or text) value is randomly "multiplied" by 2 (i.e. value sent to C* = [a, b, c], value stored in C* = [a, b, c, a, b, c]).

I know that it sounds weird, but we just want to know whether it is a known issue (we found nothing with Google...). We are working on a small dataset to narrow down the issue with log data and maybe create a ticket for the DataStax Java Driver or Cassandra teams.

Cassandra v2.0.14
DataStax Java Driver v2.1.7.1
OS RHEL6
Prod Cluster topology = 16 nodes over 2 datacenters (RF = 3 per DC)
UAT Cluster topology = 6 nodes on 1 datacenter (RF = 3)

The only difference between Prod and UAT cluster is the multi-datacenter mode 
on Prod one.
We do not insert twice the same data on the same column of any specific row. 
All inserts/updates are idempotent!

Data table:
CREATE TABLE data (
    field1 text,
    field2 int,
    field3 text,
    field4 double,
    field5 list<double>, -- randomly corrupted: contains [1, 2, 3, 1, 2, 3] instead of [1, 2, 3]
    field6 text,
    field7 list<text>,   -- randomly corrupted: contains [a, b, c, a, b, c] instead of [a, b, c]
    PRIMARY KEY ((field1, field2), field3)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

Thanks in advance for your help.
Best regards,
Minh






list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Ngoc Minh VO
Hello all,

We encounter an issue on our Production environment that cannot be reproduced on the Test environment: a list<T> (T = double or text) value is randomly "multiplied" by 2 (i.e. value sent to C* = [a, b, c], value stored in C* = [a, b, c, a, b, c]).

I know that it sounds weird, but we just want to know whether it is a known issue (we found nothing with Google...). We are working on a small dataset to narrow down the issue with log data and maybe create a ticket for the DataStax Java Driver or Cassandra teams.

Cassandra v2.0.14
DataStax Java Driver v2.1.7.1
OS RHEL6
Prod Cluster topology = 16 nodes over 2 datacenters (RF = 3 per DC)
UAT Cluster topology = 6 nodes on 1 datacenter (RF = 3)

The only difference between Prod and UAT cluster is the multi-datacenter mode 
on Prod one.
We do not insert twice the same data on the same column of any specific row. 
All inserts/updates are idempotent!

Data table:
CREATE TABLE data (
    field1 text,
    field2 int,
    field3 text,
    field4 double,
    field5 list<double>, -- randomly corrupted: contains [1, 2, 3, 1, 2, 3] instead of [1, 2, 3]
    field6 text,
    field7 list<text>,   -- randomly corrupted: contains [a, b, c, a, b, c] instead of [a, b, c]
    PRIMARY KEY ((field1, field2), field3)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

Thanks in advance for your help.
Best regards,
Minh




RE: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Ngoc Minh VO
Hello,

The data are corrupted on all 6 replicas (3 per datacenter). I used consistency level ONE and queried every node -> same result.
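For anyone reproducing this kind of check (hypothetical key values; cqlsh syntax), pointing cqlsh at each node in turn with CL ONE is one way to sample each replica's view, since the coordinator will typically read its nearest replica:

```
-- In cqlsh, connected to one node at a time:
CONSISTENCY ONE;
SELECT field5, field7 FROM data
WHERE field1 = 'k1' AND field2 = 1 AND field3 = 'c1';
-- Repeat against every node; identical doubled values on all replicas
-- suggest the corruption happened at write/replication time, not at read.
```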

In our use-case, only 1 of the 4 data columns (field4, 5, 6, 7) contains data; the other 3 are NULL.

We are trying to create a small dataset for a Jira ticket. It is strange that nobody else has encountered the same issue.
Minh

From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Wednesday, November 25, 2015 15:16
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

Is the data corrupted exactly the same way on all three nodes and in both data 
centers, or just on one or two nodes or in only one data center?

Are both columns doubled in the same row, or only one of them in a particular 
row?

Does sound like a bug though, worthy of a Jira ticket.

-- Jack Krupansky

On Wed, Nov 25, 2015 at 4:05 AM, Ngoc Minh VO <ngocminh...@bnpparibas.com> wrote:
Hello all,

We encounter an issue on our Production environment that cannot be reproduced on the Test environment: a list<T> (T = double or text) value is randomly "multiplied" by 2 (i.e. value sent to C* = [a, b, c], value stored in C* = [a, b, c, a, b, c]).

I know that it sounds weird, but we just want to know whether it is a known issue (we found nothing with Google...). We are working on a small dataset to narrow down the issue with log data and maybe create a ticket for the DataStax Java Driver or Cassandra teams.

Cassandra v2.0.14
DataStax Java Driver v2.1.7.1
OS RHEL6
Prod Cluster topology = 16 nodes over 2 datacenters (RF = 3 per DC)
UAT Cluster topology = 6 nodes on 1 datacenter (RF = 3)

The only difference between Prod and UAT cluster is the multi-datacenter mode 
on Prod one.
We do not insert twice the same data on the same column of any specific row. 
All inserts/updates are idempotent!

Data table:
CREATE TABLE data (
    field1 text,
    field2 int,
    field3 text,
    field4 double,
    field5 list<double>, -- randomly corrupted: contains [1, 2, 3, 1, 2, 3] instead of [1, 2, 3]
    field6 text,
    field7 list<text>,   -- randomly corrupted: contains [a, b, c, a, b, c] instead of [a, b, c]
    PRIMARY KEY ((field1, field2), field3)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

Thanks in advance for your help.
Best regards,
Minh




RE: CQL Data Model question

2015-05-12 Thread Ngoc Minh VO
Hello,

The problem with your approach is that you will need to specify all 30 filters (in the order pre-defined in the PK) when querying.

I would go for this data model:
CREATE TABLE t (
    name text,
    filter_name1 text, filter_value1 text,
    filter_name2 text, filter_value2 text,
    filter_name3 text, filter_value3 text, -- since you only have up to 3 filters in one query
    PRIMARY KEY (name, filter_name1, filter_value1, filter_name2, filter_value2, filter_name3, filter_value3)
);

And denormalize the data when you import it into the table: one row in the Oracle table with K filters becomes C(K, 3) rows in the C* table (one per 3-filter combination).
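A sketch of what the denormalization could look like (hypothetical filter names and values; it assumes the filter names are kept in a fixed, e.g. alphabetical, order so that every 3-filter query pins the clustering columns deterministically):

```
-- Source row: name = 'foo' with K = 4 filters f1..f4.
-- One C* row per 3-filter combination, C(4, 3) = 4 rows in total:
INSERT INTO t (name, filter_name1, filter_value1,
                     filter_name2, filter_value2,
                     filter_name3, filter_value3)
VALUES ('foo', 'f1', 'a', 'f2', 'b', 'f3', 'c');
-- ... and likewise for (f1, f2, f4), (f1, f3, f4), (f2, f3, f4).

-- A query on 3 filters then specifies every clustering column in order:
SELECT * FROM t
WHERE name = 'foo'
  AND filter_name1 = 'f1' AND filter_value1 = 'a'
  AND filter_name2 = 'f2' AND filter_value2 = 'b'
  AND filter_name3 = 'f3' AND filter_value3 = 'c';
```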

Best regards,
Minh

From: Alaa Zubaidi (PDF) [mailto:alaa.zuba...@pdf.com]
Sent: lundi 11 mai 2015 20:32
To: user@cassandra.apache.org
Subject: CQL Data Model question

Hi,

I am trying to port an Oracle Table to Cassandra.
The table is a wide table (931 columns) and could have millions of rows:
name, filter1, filter2 ... filter30, data1, data2 ... data900

The user would retrieve multiple rows from this table, filtering by one or more (up to 3) of the 30 filter columns; it could be any of the filter columns.
(select * from table1 where name = .. and filter1 = .. and filter5 = .. ;)

What is the best design for this in Cassandra/CQL?

I tried the following:
CREATE TABLE tab1 (
    name text,
    flt1 text,
    flt2 text,
    flt3 text,
    ...
    flt30 text,
    data text,
    PRIMARY KEY (name, flt1, flt2, flt3, ... flt30));

Are there any side effects of having 30 clustering columns in the composite key?

Thanks

This message may contain confidential and privileged information. If it has 
been sent to you in error, please reply to advise the sender of the error and 
then immediately permanently delete it and all attachments to it from your 
systems. If you are not the intended recipient, do not read, copy, disclose or 
otherwise use this message or any attachments to it. The sender disclaims any 
liability for such unauthorized use. PLEASE NOTE that all incoming e-mails sent 
to PDF e-mail accounts will be archived and may be scanned by us and/or by 
external service providers to detect and prevent threats to our systems, 
investigate illegal or inappropriate behavior, and/or eliminate unsolicited 
promotional e-mails ("spam"). If you have any concerns about this process, please contact us at legal.departm...@pdf.com.




ODBC driver for C* 2.0.xx

2015-01-16 Thread Ngoc Minh VO
Hello,

We are trying to connect some BI tools (eg. Tableau) to our NoSQL database 
running C* v2.0.6.

It seems that the latest DataStax ODBC driver is not compatible with Cassandra v2.x (only with v1.x, i.e. prior to the CQL3 era):
http://www.datastax.com/download#dl-datastax-drivers
http://www.datastax.com/dev/blog/using-the-datastax-odbc-driver-for-apache-cassandra

Could you please confirm that the latest version (still in beta) of the DataStax ODBC driver is incompatible with keyspaces created with CQL3? Are there any alternative connectors for BI tools (e.g. Hive or Spark drivers...)?

Your help would be greatly appreciated.
Best regards,
Minh




RE: ODBC driver for C* 2.0.xx

2015-01-16 Thread Ngoc Minh VO
Hello Jens,

Thanks for your quick answer. We hit the same issue: the connection is OK, but data browsing is not.

I found a CQL3-compatible ODBC driver, released just yesterday by a company named Simba. But it is a commercial product...

Best regards,
Minh



-Original Message-
From: Jens-U. Mozdzen [mailto:jmozd...@nde.ag] 
Sent: Friday, January 16, 2015 17:56
To: user@cassandra.apache.org
Subject: Re: ODBC driver for C* 2.0.xx

Hi Minh,

Quoting Ngoc Minh VO ngocminh...@bnpparibas.com:
 Hello,

 We are trying to connect some BI tools (eg. Tableau) to our NoSQL 
 database running C* v2.0.6.

 It seems that the latest Datastax's ODBC driver is not compatible with 
 Cassandra v2.xx (but only with v1.xx, i.e. prior CQL3 era):
 http://www.datastax.com/download#dl-datastax-drivers
 http://www.datastax.com/dev/blog/using-the-datastax-odbc-driver-for-ap
 ache-cassandra

 Could you please confirm that the latest version, still in beta, of 
 Datastax's ODBC driver is incompatible with keyspaces created with 
 CQL3? Is there any alternative connectors for BI tools (eg. Hive, 
 Spark drivers...)?

while I'm far from being authoritative, my experience confirms the above: we could connect to the C* cluster, but would not see CQL3-created CFs.

We were successful in testing an ODBC-to-JDBC bridge in conjunction with the DataStax JDBC driver. (Which is not to say we were happy with the results... ODBC clients tend to assume an SQL DBMS at the other end ;) But we were able to retrieve data that way, to confirm the connection.)

Regards,
Jens





RE: Cassandra backup via snapshots in production

2014-11-27 Thread Ngoc Minh VO
Thanks a lot for your answers!

What we plan to do is:

- auto_snapshot = true
- if a human error happened on D-5:
  o bring the cluster offline
  o purge all data
  o import snapshots taken before D-5 (and delete snapshots after D-5)
  o upload all missing data between D-5 and D
  o bring the cluster online

Do you think it would work?

From: Jens Rantil [mailto:jens.ran...@tink.se]
Sent: Tuesday, November 25, 2014 10:03
To: user@cassandra.apache.org
Subject: Re: Cassandra backup via snapshots in production

 Truncate does trigger snapshot creation though

Doesn’t it? With “auto_snapshot: true” it should.

———
Jens Rantil
Backend engineer, Tink AB
Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se
Facebook | Linkedin | Twitter


On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

True

A delete in CQL just creates a tombstone, so from the storage engine's point of view it is just adding some physical columns.

Truncate does trigger snapshot creation though
On 21 Nov 2014 19:29, Robert Coli <rc...@eventbrite.com> wrote:
On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil <jens.ran...@tink.se> wrote:
 The main purpose is to protect us from human errors (eg. unexpected 
 manipulations: delete, drop tables, …).

If that is the main purpose, having "auto_snapshot: true" in cassandra.yaml will be enough to protect you.

The OP includes delete in their list of unexpected manipulations, and auto_snapshot: true will not protect you in any way from DELETE.

=Rob
http://twitter.com/rcolidba





Cassandra backup via snapshots in production

2014-11-18 Thread Ngoc Minh VO
Hello all,

We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 
nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
The main purpose is to protect us from human errors (e.g. unexpected 
manipulations: delete, drop tables, …).

We are thinking of:

-  Backup: add a 2TB HDD on each node for C* daily/weekly snapshots.

-  Restore: load the most recent snapshots, or the latest “non-corrupted” 
ones, and replay the missing data imports from our other data sources.

We would like to know if anybody is using Cassandra’s backup feature in 
production and could share their experience with us.

Your help would be greatly appreciated.
Best regards,
Minh




RE: *Union* data type modeling in Cassandra

2014-05-05 Thread Ngoc Minh VO
Thanks for your prompt answer. It works great! (simple and efficient!)

Since I could not “bind” the column name in a prepared statement, I created 4 
separate ones, one for each data type.

It would be nice to have “INSERT INTO data_table(key, ?) VALUES (?, ?)” ☺
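The "4 separate statements" workaround can be sketched in plain Python, independent of any driver: keep one statement text per typed column and dispatch on the runtime type of the value. The helper name pick_insert and the dict layout below are purely illustrative, not a real driver API:

```python
# Sketch: since a column name cannot be a bind marker in CQL, keep one
# prepared-statement text per typed column and pick it by value type.
INSERTS = {
    "numvalue":  "INSERT INTO data_table (key, numvalue) VALUES (?, ?)",
    "numvalues": "INSERT INTO data_table (key, numvalues) VALUES (?, ?)",
    "strvalue":  "INSERT INTO data_table (key, strvalue) VALUES (?, ?)",
    "strvalues": "INSERT INTO data_table (key, strvalues) VALUES (?, ?)",
}

def pick_insert(value):
    """Map a Python value to the single typed column it should populate."""
    if isinstance(value, float):
        return INSERTS["numvalue"]
    if isinstance(value, str):
        return INSERTS["strvalue"]
    if isinstance(value, list) and value:
        # Peek at the first element to tell list<double> from list<text>.
        return INSERTS["numvalues"] if isinstance(value[0], float) else INSERTS["strvalues"]
    raise TypeError("unsupported or empty union value: %r" % (value,))
```

With a real driver, each of the four texts would be prepared once at startup and the chosen one executed with (key, value).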

Regards,
Minh

From: doanduy...@gmail.com [mailto:doanduy...@gmail.com]
Sent: vendredi 2 mai 2014 12:29
To: user@cassandra.apache.org
Subject: Re: *Union* data type modeling in Cassandra

Hello Ngoc Minh

 I'd go with the first data model. To solve the null -> tombstone issue, just 
do not insert a column at runtime if its value is null:

 If only numvalue (double) != null -> INSERT INTO data_table(key, numvalue) 
VALUES (...,...);
 If only numvalues (list<double>) != null -> INSERT INTO 
data_table(key, numvalues) VALUES (...,...);
and so on ...

 It means that you'll need to somehow perform null checks in your code at 
runtime, but that's the price to pay to avoid tombstones and heavy compactions.
Regards

 Duy Hai DOAN

On Fri, May 2, 2014 at 11:40 AM, Ngoc Minh VO 
<ngocminh...@bnpparibas.com> wrote:
Hello all,

I don’t know whether this is the right place to discuss data modeling with 
Cassandra.

We would like to have your feedback/recommendations on our schema modeling:

1.   Our data are stored in a CF by their unique key (K)

2.   The data type can be one of the following: Double, List<Double>, String, 
List<String>

3.   Hence we create a data table with:

CREATE TABLE data_table (
 key text,
 numvalue double,
 numvalues list<double>,
 strvalue text,
 strvalues list<text>,
 PRIMARY KEY (key)
);

4.   One and only one of the four columns contains a non-null value; the 
other three always contain null.

5.   Pros: easy to debug

This modeling has worked fine for us so far. But C* considers null values as 
tombstones, and we start hitting the tombstone-overwhelming warnings when 
their number reaches the threshold.

We are planning to move to a simpler schema with only two columns:

CREATE TABLE data_table (
 key text,
 value blob, -- containing serialized data
 PRIMARY KEY (key)
);
Pros: no null values; more efficient in terms of storage?
Cons: deserialization is handled on the client side instead of in the Java driver 
(not sure which one is more efficient…)
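For what it's worth, the blob alternative can be sketched like this: prefix the serialized payload with a one-byte type tag so the client can decode it later. The tag bytes and the JSON payload format below are assumptions for illustration, not anything the schema prescribes:

```python
import json

# Hypothetical codec for the single "value blob" union column.
# Tag bytes: d = double, s = string, l = list (JSON covers both list kinds).
def to_blob(value):
    if isinstance(value, float):
        tag = b"d"
    elif isinstance(value, str):
        tag = b"s"
    elif isinstance(value, list):
        tag = b"l"
    else:
        raise TypeError("unsupported union value: %r" % (value,))
    return tag + json.dumps(value).encode("utf-8")

def from_blob(blob):
    # The tag is informative here; JSON round-trips all four shapes.
    return json.loads(blob[1:].decode("utf-8"))
```

A fixed-width binary encoding (e.g. struct.pack for doubles) would be more compact than JSON, at the cost of more decoding code on the client.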

Could you please confirm that storing “null” values in a CF for non-expired 
“rows” is not a good practice?

Thanks in advance for your help.
Best regards,
Minh




*Union* data type modeling in Cassandra

2014-05-02 Thread Ngoc Minh VO
Hello all,

I don't know whether this is the right place to discuss data modeling with 
Cassandra.

We would like to have your feedback/recommendations on our schema modeling:

1.   Our data are stored in a CF by their unique key (K)

2.   The data type can be one of the following: Double, List<Double>, String, 
List<String>
3.   Hence we create a data table with:

CREATE TABLE data_table (
 key text,
 numvalue double,
 numvalues list<double>,
 strvalue text,
 strvalues list<text>,
 PRIMARY KEY (key)
);

4.   One and only one of the four columns contains a non-null value; the 
other three always contain null.

5.   Pros: easy to debug

This modeling has worked fine for us so far. But C* considers null values as 
tombstones, and we start hitting the tombstone-overwhelming warnings when 
their number reaches the threshold.

We are planning to move to a simpler schema with only two columns:

CREATE TABLE data_table (
 key text,
 value blob, -- containing serialized data
 PRIMARY KEY (key)
);
Pros: no null values; more efficient in terms of storage?
Cons: deserialization is handled on the client side instead of in the Java driver 
(not sure which one is more efficient...)

Could you please confirm that storing null values in a CF for non-expired 
rows is not a good practice?

Thanks in advance for your help.
Best regards,
Minh

