Re: Data lost in Cassandra 3.5 single instance via Erlang driver
As a side note: if you're inserting records quickly enough that several can plausibly land in the same millisecond, it seems likely to me that a day-level partition (((appkey, pub_date), pub_timestamp)) is going to grow too large unless your writes are very bursty. You might need to bucket by hour, or even by 15 minutes, depending on what you think your peak write rate will look like.

Another note, slightly bikeshed, but *personally*, when doing time-based bucketing (the pub_date column) I prefer to use a timestamp and floor the value I write. This makes it easier to convert to a smaller bucket size later without changing the format of the data in that column.

On Wed, Jun 15, 2016 at 1:07 AM linbo liao wrote:

> Thanks Ben, Paul, Alain. I debugged at the client side and found the cause: pub_timestamp was duplicated. I will use timeuuid instead.
>
> Thanks,
> Linbo
>
> [...]
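A minimal sketch of that flooring idea, in Python for illustration (the 15-minute bucket width is an arbitrary choice for the example, not something from the thread): the bucket column stores a floored timestamp, so shrinking the bucket later only changes the flooring constant, not the column's format.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 15 * 60  # 15-minute buckets; shrink later without changing the column format

def bucket_for(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its bucket."""
    epoch = ts.timestamp()
    floored = epoch - (epoch % BUCKET_SECONDS)
    return datetime.fromtimestamp(floored, tz=timezone.utc)

t = datetime(2016, 6, 13, 9, 16, 24, 818000, tzinfo=timezone.utc)
print(bucket_for(t))  # 2016-06-13 09:15:00+00:00
```

Every event between 09:15:00 and 09:29:59 maps to the same bucket value, and moving to 5-minute buckets is a one-constant change.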
Re: Data lost in Cassandra 3.5 single instance via Erlang driver
Thanks Ben, Paul, Alain. I debugged at the client side and found the cause: pub_timestamp was duplicated. I will use timeuuid instead.

Thanks,
Linbo

2016-06-15 13:09 GMT+08:00 Alain Rastoul:

> +1 with what Ben said: timestamp has millisecond precision and is a bad choice for PK uniqueness. [...]
Re: Data lost in Cassandra 3.5 single instance via Erlang driver
On 15/06/2016 06:40, linbo liao wrote:

> I am not sure, but it looks like it will cause an update rather than an insert. If that is true, is the only way to include IF NOT EXISTS in the request and inform the client it failed?
>
> Thanks,
> Linbo

Hi Linbo,

+1 with what Ben said: timestamp has millisecond precision and is a bad choice for PK uniqueness.
If your client and server are on the same physical machine (both on the same computer, or different VMs on the same hypervisor), an insert can take only a few microseconds (2~3 on a recent computer), so your inserts will often become "updates".
The reason is that update does not exist in Cassandra, and neither does delete; they are just appends: an append with the same key for an update, or an append of a tombstone for a delete.
You should try a timeuuid instead: it has a node ID, a clock sequence, and a counter in addition to the timestamp part (which you can extract with CQL functions), and it exists for exactly this use.
See here for the functions:

https://docs.datastax.com/en/cql/3.3/cql/cql_reference/timeuuid_functions_r.html

--
best,
Alain
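A quick way to see why a time-based UUID avoids the collision, sketched in Python rather than CQL: `uuid.uuid1()` produces the same kind of version-1 UUID as Cassandra's timeuuid, combining a timestamp with a node ID and a clock sequence, so values generated within the same millisecond are still distinct.

```python
import uuid

# Generate many time-based UUIDs as fast as possible; even calls that share
# a wall-clock millisecond yield distinct values, unlike a bare timestamp.
ids = [uuid.uuid1() for _ in range(10000)]
assert len(set(ids)) == len(ids)

# The embedded timestamp is still recoverable (100-ns intervals since
# 1582-10-15); on the server side, CQL's timeuuid functions expose the
# same timestamp part.
print(ids[0].time)
```

So the clustering key stays time-ordered, but two events in the same millisecond no longer map to the same row.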
Re: Data lost in Cassandra 3.5 single instance via Erlang driver
If pub_timestamp values could possibly match, I'd suggest making the column a timeuuid type instead. With that schema, a duplicated timestamp is not a failure or data loss: all of your writes probably made it; the duplicates just got overwritten.

On Tue, Jun 14, 2016 at 9:40 PM, linbo liao wrote:

> I am not sure, but it looks like it will cause an update rather than an insert. If that is true, is the only way to include IF NOT EXISTS in the request and inform the client it failed?
>
> Thanks,
> Linbo
>
> [...]
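The overwrite behaviour described above can be pictured with a plain dict, since a Cassandra write is effectively "put this value at this primary key" with last-write-wins (an illustrative analogy, not the driver API; the values below are invented for the example):

```python
# Each write is keyed by (partition key, clustering key); a duplicate key
# silently replaces the previous row instead of failing -- an "upsert".
table = {}

def insert(appkey, pub_date, pub_timestamp, payload):
    table[((appkey, pub_date), pub_timestamp)] = payload

insert("52fcc04c", "2016-06-13", "09:16:24.818", "message A")
insert("52fcc04c", "2016-06-13", "09:16:24.818", "message B")  # same key: overwrites A

print(len(table))  # 1 -- both writes succeeded, but only the last value survives
```

This is why the client saw 6 successful inserts yet a SELECT returned only 5 rows.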
Re: Data lost in Cassandra 3.5 single instance via Erlang driver
I am not sure, but it looks like it will cause an update rather than an insert. If that is true, is the only way to include IF NOT EXISTS in the request and inform the client it failed?

Thanks,
Linbo

2016-06-15 10:59 GMT+08:00 Ben Slater:

> Is it possible that your pub_timestamp values are colliding (which would result in an update rather than an insert)?
>
> [...]
Re: Data lost in Cassandra 3.5 single instance via Erlang driver
Is it possible that your pub_timestamp values are colliding (which would result in an update rather than an insert)?

On Wed, 15 Jun 2016 at 12:55 linbo liao wrote:

> Hi,
>
> I use the Erlang driver to send data to Cassandra. While testing in my local environment I hit a data loss issue, and I have no idea which step is wrong.
>
> [...]
Data lost in Cassandra 3.5 single instance via Erlang driver
Hi,

I use the Erlang driver to send data to Cassandra. While testing in my local environment I hit a data loss issue, and I have no idea which step is wrong.

*Environment:*

1. Ubuntu 12.04 LTS x64
2. Cassandra 3.5, single instance (not a cluster), installed via the official installation document, with no configuration changed except enabling the authenticator and authorizer.
3. Cassandra binary protocol v4
4. Latest Erlang driver https://github.com/matehat/cqerl
5. Erlang OTP 18.3

*Schema:*

> DESCRIBE TABLE message.history
>
> CREATE TABLE message.history (
>     appkey text,
>     pub_date text,
>     pub_timestamp timestamp,
>     apns text,
>     message blob,
>     message_id text,
>     pub_method smallint,
>     qos smallint,
>     recv_type smallint,
>     topic text,
>     PRIMARY KEY ((appkey, pub_date), pub_timestamp)
> ) WITH CLUSTERING ORDER BY (pub_timestamp ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4', 'unchecked_tombstone_compaction': 'true'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 3600
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';

*Issue:*

The client sends 6 insert requests to the server (TTL is 1000 s). Inspecting the TCP packets, everything looks fine, but rows go missing at random on the server side (a SELECT shows some inserted data is missing).
> select * from history;
>
> appkey | pub_date | pub_timestamp | apns | message | message_id | pub_method | qos | recv_type | topic
> 52fcc04c4dc903d66d6f8f92 | 2016-06-13 | 2016-06-13 09:16:24.70+ | | 0x68656c6c6f20746f20616c6961732066726f6d207075626c697368325f746f5f616c696173 | 589020307122032641 | 2 | 1 | 1 | alias_mqttc_sub
> 52fcc04c4dc903d66d6f8f92 | 2016-06-13 | 2016-06-13 09:16:24.817000+ | {"aps":{"sound":"bingbong.aiff","badge":3,"alert":"douban"}} | 0x7b2261223a2066726f6d207075626c697368327d | 11833652203486491113 | 2 | 1 | 0 | t2thi
> 52fcc04c4dc903d66d6f8f92 | 2016-06-13 | 2016-06-13 09:16:24.818000+ | | 0x66726f6d20707974686f6e | 589020307579211776 | 2 | 1 | 0 | testtopic2
> 52fcc04c4dc903d66d6f8f92 | 2016-06-13 | 2016-06-13 09:16:24.89+ | {"aps":{"sound":"bingbong.aiff","badge":3,"alert":"douban"}} | 0x66726f6d207075626c69736832 | 589020307814092800 | 2 | 1 | 0 | testtopic2
> 52fcc04c4dc903d66d6f8f92 | 2016-06-13 | 2016-06-13 09:16:25.024000+ | | 0x68656c6c6f20746f20616c696173 | 589020307818287105 | 2 | 1 | 1 | mytestalias1

*TCP packets of a successful insert:*

> 17:16:24.818210 IP localhost.38918 > localhost.9042: Flags [P.], seq 1953472577:1953472814, ack 1963420469, win 530, options [nop,nop,TS val 72982868 ecr 72940042], length 237
> 0x0000: 4500 0121 e5b1 4000 4006 5623 7f00 0001  E..!..@.@.V#
> 0x0010: 7f00 0001 9806 2352 746f a041 7507 6b35  ..#Rto.Au.k5
> 0x0020: 8018 0212 ff15 0101 080a 0459 a154  .Y.T
> 0x0030: 0458 fa0a *0400 0a00 e400 1005*  .X..
> 0x0040: 6e06 c1fc 222c 813f 6228 61c5 7364 6500  n...",.?b(a.sde.
> 0x0050: 0105 000b 0018 3532 6663 6330 3463  52fcc04c
> 0x0060: 3464 6339 3033 6436 3664 3666 3866 3932  4dc903d66d6f8f92
> 0x0070: 000a 3230 3136 2d30 362d 3133  2016-06-13..
> 0x0080: 0008 0155 490c 3571 0002 0001  .UI.5q..
> 0x0090: 0002 0002 0014 3131 3833 3336  ..118336
> 0x00a0: 3532 3230 3334 3836 3439 3131 3133  52203486491113..
> 0x00b0: 0014 7b22 6122 3a20 6672 6f6d 2070 7562  ..{"a":.from.pub
> 0x00c0: 6c69 7368 327d 003c 7b22 6170 7322  lish2}...<{"aps"
> 0x00d0: 3a7b 2273 6f75 6e64 223a 2262 696e 6762
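The collision behind the missing row can be reproduced at the client side with a short Python sketch (illustrative; the event times are invented): truncate event times to millisecond precision, which is what a CQL timestamp column stores, and two events inside the same millisecond map to the same clustering key.

```python
from datetime import datetime, timezone

def to_millis(ts: datetime) -> int:
    """What a CQL timestamp column stores: integer milliseconds since the epoch."""
    return int(ts.timestamp() * 1000)

# Two events 300 microseconds apart -- easily produced by a fast local client.
a = datetime(2016, 6, 13, 9, 16, 24, 818100, tzinfo=timezone.utc)
b = datetime(2016, 6, 13, 9, 16, 24, 818400, tzinfo=timezone.utc)

# Both fall into the same millisecond, so as clustering keys they collide
# and the second INSERT silently overwrites the first.
print(to_millis(a) == to_millis(b))  # True
```

With (appkey, pub_date) identical too, the two inserts share a full primary key, which matches the "6 inserts, 5 rows" symptom above.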