RE: list data value multiplied x2 in multi-datacenter environment
Thanks Duy Hai for these details. Do you know whether these problems have been fixed or are planned to be fixed? We are using C* 2.0.14. I didn't find any Jira ticket concerning the issue.

Regards,

From: DuyHai Doan
Sent: Wednesday, November 25, 2015 9:39:40 PM
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

There were several bugs in the past related to lists in CQL. Indeed, the timestamps used for list columns are computed server-side using a special algorithm. I wonder whether, in the case of read-repair and/or hinted handoff, the original timestamp (the one generated by the coordinator at the first insert/update) is used, or whether the server generates another one using its algorithm; that could explain the behavior.
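For context on why idempotency matters here: in CQL, a full-list insert overwrites the column and is safe to replay, whereas a list append is not idempotent, so any replay (driver retry, hint replay) duplicates the elements. A minimal illustration, using the (hypothetical) key values against the schema discussed in this thread:

```sql
-- Idempotent: overwrites the whole list; replaying it still yields [1, 2, 3]
INSERT INTO data (field1, field2, field3, field5)
VALUES ('k', 1, 'c', [1, 2, 3]);

-- NOT idempotent: appends to the list; replaying it yields [1, 2, 3, 1, 2, 3]
UPDATE data SET field5 = field5 + [1, 2, 3]
WHERE field1 = 'k' AND field2 = 1 AND field3 = 'c';
```

If only the INSERT form is ever used, a doubled list would point at a server-side replay mechanism rather than the application.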
This message and any attachments (the "message") is intended solely for the intended addressees and is confidential. If you receive this message in error, or are not the intended recipient(s), please delete it and any copies from your systems and immediately notify the sender. Any unauthorized view, use that does not comply with its purpose, dissemination or disclosure, either whole or partial, is prohibited. Since the internet cannot guarantee the integrity of this message which may not be reliable, BNP PARIBAS (and its subsidiaries) shall not be liable for the message if modified, changed or falsified. Do not print this message unless it is necessary, consider the environment.
RE: list data value multiplied x2 in multi-datacenter environment
No, we do not use UPDATE. All inserts are idempotent and there is no read-before-write query. On the corrupted data row, we have verified that the data was only written once. Thanks for your answer!

From: Laing, Michael
Sent: Wednesday, November 25, 2015 3:39 PM
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

You don't have any syntax in your application anywhere such as:

UPDATE data SET field5 = field5 + [1, 2, 3] WHERE field1 = ...;

Just a quick idempotency check :)
RE: list data value multiplied x2 in multi-datacenter environment
Our insert/select queries use CL = QUORUM. We don't use BatchStatement to import data, but executeAsync(Statement) with a fixed-size queue.

Regards,

From: Jack Krupansky
Sent: Wednesday, November 25, 2015 6:09 PM
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

Be sure to include your actual insert statement. Also, what consistency level was used for the insert (all, quorum, local quorum, one, or...)?

-- Jack Krupansky
list data value multiplied x2 in multi-datacenter environment
Hello all,

We encounter an issue on our Production environment that cannot be reproduced on our Test environment: a list<T> (T = double or text) value is randomly "multiplied" by 2 (i.e. value sent to C* = [a, b, c], value stored in C* = [a, b, c, a, b, c]).

I know that it sounds weird, but we just want to know whether it is a known issue (found nothing with Google...). We are working on a small dataset to narrow down the issue with log data and maybe create a ticket for the DataStax Java Driver or Cassandra teams.

Cassandra v2.0.14
DataStax Java Driver v2.1.7.1
OS: RHEL6
Prod cluster topology: 16 nodes over 2 datacenters (RF = 3 per DC)
UAT cluster topology: 6 nodes in 1 datacenter (RF = 3)

The only difference between the Prod and UAT clusters is the multi-datacenter mode on the Prod one. We do not insert the same data twice on the same column of any specific row. All inserts/updates are idempotent!

Data table:

CREATE TABLE data (
    field1 text,
    field2 int,
    field3 text,
    field4 double,
    field5 list<double>, -- randomly corrupted: contains [1, 2, 3, 1, 2, 3] instead of [1, 2, 3]
    field6 text,
    field7 list<text>,   -- randomly corrupted: contains [a, b, c, a, b, c] instead of [a, b, c]
    PRIMARY KEY ((field1, field2), field3)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

Thanks in advance for your help.

Best regards,
Minh
RE: list data value multiplied x2 in multi-datacenter environment
Hello,

The data are corrupted on all 6 replicas (3 per datacenter). I used consistency level ONE and queried every node: same result. In our use case, only 1 of the 4 data columns (field4, 5, 6, 7) contains the data; the 3 others contain NULL. We are trying to create a small dataset for a Jira ticket. It is strange that nobody else encounters the same issue.

Minh

From: Jack Krupansky
Sent: Wednesday, November 25, 2015 3:16 PM
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

Is the data corrupted exactly the same way on all three nodes and in both data centers, or just on one or two nodes, or in only one data center? Are both columns doubled in the same row, or only one of them in a particular row? Does sound like a bug though, worthy of a Jira ticket.

-- Jack Krupansky
RE: CQL Data Model question
Hello,

The problem with your approach is: you will need to specify all 30 filters (in the pre-defined order in the primary key) when querying.

I would go for this data model:

CREATE TABLE t (
    name text,
    filter_name1 text, filter_value1 text,
    filter_name2 text, filter_value2 text,
    filter_name3 text, filter_value3 text, -- since you only have up to 3 filters in one query
    PRIMARY KEY (name, filter_name1, filter_value1, filter_name2, filter_value2, filter_name3, filter_value3)
);

And denormalize the data when you import it into the table: one row in the Oracle table with K filters becomes C(K, 3) rows in the C* table.

Best regards,
Minh

From: Alaa Zubaidi (PDF)
Sent: Monday, May 11, 2015 8:32 PM
To: user@cassandra.apache.org
Subject: CQL Data Model question

Hi,

I am trying to port an Oracle table to Cassandra. The table is a wide table (931 columns) and could have millions of rows:

name, filter1, filter2, ..., filter30, data1, data2, ..., data900

The user would retrieve multiple rows from this table and filter by one or more (up to 3) of the 30 filter columns; it could be any of them:

select * from table1 where name = ... and filter1 = ... and filter5 = ...;

What is the best design for this in Cassandra/CQL? I tried the following:

CREATE TABLE tab1 (
    name text,
    flt1 text,
    flt2 text,
    flt3 text,
    ...
    flt30 text,
    data text,
    PRIMARY KEY (name, flt1, flt2, flt3, ..., flt30)
);

Is there any side effect of having 30 columns in a composite key?

Thanks
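To make the denormalization above concrete: each source row with K non-null filters expands into C(K, 3) Cassandra rows, one per 3-filter combination, taken over sorted filter names so the clustering order is deterministic. A small sketch (function and column names are illustrative, not from the thread):

```python
from itertools import combinations
from math import comb

def denormalize(name, filters):
    """Expand one source row into one row per 3-filter combination.

    `filters` maps filter name -> value; combinations are taken over
    sorted filter names so lookups can rebuild the same ordering.
    """
    rows = []
    for combo in combinations(sorted(filters), 3):
        row = {"name": name}
        for i, fname in enumerate(combo, start=1):
            row[f"filter_name{i}"] = fname
            row[f"filter_value{i}"] = filters[fname]
        rows.append(row)
    return rows

rows = denormalize("r1", {"f1": "a", "f2": "b", "f3": "c", "f4": "d"})
print(len(rows))    # C(4, 3) = 4 expanded rows
print(comb(30, 3))  # 4060 rows for a source row with all 30 filters set
```

Note that queries with only one or two filters would additionally need a padding convention (e.g. empty-string sentinels in the unused filter columns), since clustering columns must be constrained in order.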
ODBC driver for C* 2.0.xx
Hello,

We are trying to connect some BI tools (e.g. Tableau) to our NoSQL database running C* v2.0.6. It seems that the latest DataStax ODBC driver is not compatible with Cassandra v2.x (but only with v1.x, i.e. prior to the CQL3 era):

http://www.datastax.com/download#dl-datastax-drivers
http://www.datastax.com/dev/blog/using-the-datastax-odbc-driver-for-apache-cassandra

Could you please confirm that the latest version of the DataStax ODBC driver, still in beta, is incompatible with keyspaces created with CQL3? Are there any alternative connectors for BI tools (e.g. Hive, Spark drivers...)?

Your help would be greatly appreciated.

Best regards,
Minh
RE: ODBC driver for C* 2.0.xx
Hello Jens,

Thanks for your quick answer. We got the same issue: connection OK, data browsing NOK. I found an ODBC driver compatible with CQL3, just released yesterday by a company named Simba. But it is a commercial product...

Best regards,
Minh

-----Original Message-----
From: Jens-U. Mozdzen
Sent: Friday, January 16, 2015 5:56 PM
To: user@cassandra.apache.org
Subject: Re: ODBC driver for C* 2.0.xx

Hi Minh,

while I'm far from being authoritative, my experience confirms the above: we could connect to the C* cluster, but would not see CQL3-created CFs. We were successful in testing an ODBC-to-JDBC bridge in conjunction with the DataStax JDBC driver. (Which is not to say we were happy with the results... ODBC clients tend to assume an SQL DBMS at the other end ;) But we were able to retrieve data that way, to confirm the connection.)

Regards,
Jens
RE: Cassandra backup via snapshots in production
Thanks a lot for your answers!

What we plan to do is:
- auto_snapshot = true
- if a human error happened on D-5:
  - bring the cluster offline
  - purge all data
  - import the snapshots prior to D-5 (and delete the snapshots after D-5)
  - upload all missing data between D-5 and D
  - bring the cluster online

Do you think it would work?

From: Jens Rantil
Sent: Tuesday, November 25, 2014 10:03 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra backup via snapshots in production

> Truncate does trigger snapshot creation though

Doesn't it? With "auto_snapshot: true" it should.

Jens Rantil
Backend engineer, Tink AB

On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

True. DELETE in CQL just creates tombstones, so from the storage engine's point of view it's just adding some physical columns. Truncate does trigger snapshot creation though.

On 21 Nov 2014 at 19:29, Robert Coli <rc...@eventbrite.com> wrote:

On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil <jens.ran...@tink.se> wrote:

> The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, ...).

If that is the main purpose, having "auto_snapshot: true" in cassandra.yaml will be enough to protect you.

OP includes DELETE in their list of unexpected manipulations, and auto_snapshot: true will not protect you in any way from DELETE.

=Rob
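For reference, the backup/restore cycle sketched in this thread maps onto standard nodetool operations. A rough outline, where the keyspace, table, and directory names are placeholders and paths depend on your data_file_directories setting:

```shell
# Daily backup: flush memtables, then take a named snapshot of the keyspace
nodetool flush my_keyspace
nodetool snapshot -t "daily-$(date +%F)" my_keyspace

# Snapshots are hard links under the data directory; archive them elsewhere, e.g.
#   /var/lib/cassandra/data/my_keyspace/my_table/snapshots/daily-2014-11-25/

# Restore (per node, with the cluster offline to clients):
#   1. remove the live SSTables for the affected table
#   2. copy the snapshot's SSTable files back into the table's data directory
#   3. make the node pick them up without a restart:
nodetool refresh my_keyspace my_table

# Reclaim disk space from snapshots that are no longer needed
nodetool clearsnapshot my_keyspace
```

Replaying the imports between the snapshot date and the present would then happen through the normal application write path, as planned above.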
Cassandra backup via snapshots in production
Hello all,

We are looking for a solution to back up the data in our C* cluster (v2.0.x, 16 nodes, 4 x 500 GB SSD, RF = 6 over 2 datacenters). The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, ...).

We are thinking of:
- Backup: add a 2 TB HDD on each node for C* daily/weekly snapshots.
- Restore: load the most recent snapshots, or the latest "non-corrupted" ones, and replay the missing data imports from another data source.

We would like to know whether anybody is using Cassandra's backup feature in production and could share their experience with us. Your help would be greatly appreciated.

Best regards,
Minh
RE: *Union* data type modeling in Cassandra
Thanks for your prompt answer. It works great! (Simple and efficient!)

Since I could not "bind" a column name in a prepared statement, I created 4 separate statements, one per data type. It would be nice to have "INSERT INTO data_table(key, ?) VALUES (?, ?)" ☺

Regards,
Minh

From: doanduy...@gmail.com [mailto:doanduy...@gmail.com]
Sent: Friday, May 2, 2014 12:29
To: user@cassandra.apache.org
Subject: Re: *Union* data type modeling in Cassandra

Hello Ngoc Minh,

I'd go with the first data model. To solve the null/tombstone issue, just do not insert null values at runtime:
If only numvalue (double) != null: INSERT INTO data_table(key, numvalue) VALUES (..., ...);
If only numvalues (list<double>) != null: INSERT INTO data_table(key, numvalues) VALUES (..., ...);
and so on...
It means that you'll need to perform a null check in your code at runtime, but that is the price to pay to avoid tombstones and heavy compaction.

Regards,
Duy Hai DOAN

On Fri, May 2, 2014 at 11:40 AM, Ngoc Minh VO <ngocminh...@bnpparibas.com> wrote:
Hello all,

I don't know whether this is the right place to discuss data modeling with Cassandra. We would like to have your feedback/recommendations on our schema modeling:
1. Our data are stored in a CF by their unique key (K).
2. The data type can be one of the following: Double, List<Double>, String, List<String>.
3. Hence we create a data table with:
CREATE TABLE data_table (
    key text,
    numvalue double,
    numvalues list<double>,
    strvalue text,
    strvalues list<text>,
    PRIMARY KEY (key)
);
4. One and only one of the four columns contains a non-null value. The three others always contain null.
5. Pros: easy to debug.

This modeling has worked fine for us so far. But C* considers null values as tombstones, and we start being overwhelmed by tombstones when their number reaches the threshold.
We are planning to move to a simpler schema with only two columns:
CREATE TABLE data_table (
    key text,
    value blob, -- containing serialized data
    PRIMARY KEY (key)
);
Pros: no null values; more efficient in terms of storage?
Cons: deserialization is handled on the client side instead of in the Java driver (not sure which one is more efficient...).

Could you please confirm that using "null" values in a CF for non-expired "rows" is not good practice?

Thanks in advance for your help.

Best regards,
Minh
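The per-type insert pattern suggested above (one prepared statement per column, chosen by a runtime null/type check so that null columns are never bound) can be sketched as follows. This is an illustration in plain Python: the statement texts mirror the schema in the thread, but the dispatch helpers are hypothetical, not the actual application code or driver API:

```python
# One INSERT per column of the union; exactly one non-key column is ever
# bound per row, so the other three columns are simply absent and no
# tombstones are written for them.
INSERTS = {
    "numvalue": "INSERT INTO data_table (key, numvalue) VALUES (?, ?)",
    "numvalues": "INSERT INTO data_table (key, numvalues) VALUES (?, ?)",
    "strvalue": "INSERT INTO data_table (key, strvalue) VALUES (?, ?)",
    "strvalues": "INSERT INTO data_table (key, strvalues) VALUES (?, ?)",
}

def column_for(value):
    """Pick the target column from the runtime type of `value`."""
    if isinstance(value, (int, float)):
        return "numvalue"
    if isinstance(value, str):
        return "strvalue"
    if isinstance(value, list) and all(isinstance(v, (int, float)) for v in value):
        return "numvalues"
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return "strvalues"
    raise TypeError(f"unsupported union value: {value!r}")

def statement_for(value):
    """Return the CQL text of the prepared statement to use for `value`."""
    return INSERTS[column_for(value)]
```

In the real application each of the four statements would be prepared once with the driver and `statement_for` would return the prepared object rather than its text; the point is only that the choice is made by a type check before binding, never by binding null.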
*Union* data type modeling in Cassandra
Hello all,

I don't know whether this is the right place to discuss data modeling with Cassandra. We would like to have your feedback/recommendations on our schema modeling:
1. Our data are stored in a CF by their unique key (K).
2. The data type can be one of the following: Double, List<Double>, String, List<String>.
3. Hence we create a data table with:
CREATE TABLE data_table (
    key text,
    numvalue double,
    numvalues list<double>,
    strvalue text,
    strvalues list<text>,
    PRIMARY KEY (key)
);
4. One and only one of the four columns contains a non-null value. The three others always contain null.
5. Pros: easy to debug.

This modeling has worked fine for us so far. But C* considers null values as tombstones, and we start being overwhelmed by tombstones when their number reaches the threshold.

We are planning to move to a simpler schema with only two columns:
CREATE TABLE data_table (
    key text,
    value blob, -- containing serialized data
    PRIMARY KEY (key)
);
Pros: no null values; more efficient in terms of storage?
Cons: deserialization is handled on the client side instead of in the Java driver (not sure which one is more efficient...).

Could you please confirm that using null values in a CF for non-expired rows is not good practice?

Thanks in advance for your help.

Best regards,
Minh
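The two-column alternative above moves the union encoding into the application. A minimal sketch of such a tagged-blob codec (the one-byte type tag and the JSON payload are illustrative assumptions; the real serialization format is entirely the application's choice):

```python
import json

# One tag byte per variant of the union; the rest of the blob is the
# JSON-encoded payload. With this schema, decoding is the client's job.
TAGS = {"numvalue": b"d", "numvalues": b"D", "strvalue": b"s", "strvalues": b"S"}
KINDS = {v: k for k, v in TAGS.items()}

def encode(kind, value):
    """Serialize one union variant to the bytes stored in the blob column."""
    return TAGS[kind] + json.dumps(value).encode("utf-8")

def decode(blob):
    """Recover the (kind, value) pair from a stored blob."""
    kind = KINDS[blob[:1]]
    return kind, json.loads(blob[1:].decode("utf-8"))
```

A round trip such as `decode(encode("numvalues", [1.0, 2.0]))` returns the original kind and value; whether this beats the four-column model depends on how the serialization cost compares with the tombstone pressure it removes.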