[ 
https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982561#action_12982561
 ] 

Ivo Ladage-van Doorn edited comment on CASSANDRA-1992 at 1/17/11 4:34 AM:
--------------------------------------------------------------------------

I have exactly the same problem with an existing installation. I was about to
file an issue for it, but found this one first. I'll describe my case; maybe it
provides some relevant information.

I ran into this issue with Cassandra 0.7 while adding a single node to an
existing one-node cluster. The existing node already contains some data when
the second node is added to the cluster. This is what I did:

Setup
I have two nodes, both running Linux: a server called 'veers' on 172.16.2.203
and one called 'r2d2' on 172.16.2.206. I use Cassandra 0.7 and change only the
following settings in cassandra.yaml and log4j-server.properties (all other
entries keep their default values):

In cassandra.yaml:

initial_token: 0
data_file_directories:
    - /vol/users/ivol/cassandra_work/data
commitlog_directory: /vol/users/ivol/cassandra_work/commitlog
saved_caches_directory: /vol/users/ivol/cassandra_work/saved_caches
seeds: 172.16.2.203
listen_address: 172.16.2.203
rpc_address: 172.16.2.203

In log4j-server.properties:

log4j.appender.R.File=/vol/users/ivol/cassandra_work/system.log


Now I start the first node and connect to it using cassandra-cli. I add the
following keyspace, column families and rows:

create keyspace Default;
use Default;

create column family Role;
set Role['user_1']['name'] = 'User 1';
set Role['user_2']['name'] = 'User 2';
set Role['user_3']['name'] = 'User 3';

create column family Gadget;
set Gadget['gadget_1']['name'] = 'Gadget 1';
set Gadget['gadget_2']['name'] = 'Gadget 2';
set Gadget['gadget_3']['name'] = 'Gadget 3';

After this, 'list Role' and 'list Gadget' return the expected rows.

Now I add a second node to the cluster, with this configuration:

In cassandra.yaml:

initial_token:
auto_bootstrap: true
data_file_directories:
    - /vol/users/ivol/cassandra_work/data
commitlog_directory: /vol/users/ivol/cassandra_work/commitlog
saved_caches_directory: /vol/users/ivol/cassandra_work/saved_caches
seeds: 172.16.2.203
listen_address: 172.16.2.206
rpc_address: 172.16.2.206

In log4j-server.properties:

log4j.appender.R.File=/vol/users/ivol/cassandra_work/system.log


Now I start the second node. Bootstrapping takes about 2 minutes in total
(mostly the 90000 ms wait for load information and the 30000 ms pending-range
sleep in the log), but finishes without any warnings or errors:

...
INFO [main] 2011-01-17 09:58:09,170 StorageService.java (line 399) Joining: getting load information
INFO [main] 2011-01-17 09:58:09,171 StorageLoadBalancer.java (line 366) Sleeping 90000 ms to wait for load information...
INFO [GossipStage:1] 2011-01-17 09:58:10,447 Gossiper.java (line 577) Node /172.16.2.203 is now part of the cluster
INFO [HintedHandoff:1] 2011-01-17 09:58:11,451 HintedHandOffManager.java (line 192) Started hinted handoff for endpoint /172.16.2.203
INFO [GossipStage:1] 2011-01-17 09:58:11,451 Gossiper.java (line 569) InetAddress /172.16.2.203 is now UP
INFO [HintedHandoff:1] 2011-01-17 09:58:11,453 HintedHandOffManager.java (line 248) Finished hinted handoff of 0 rows to endpoint /172.16.2.203
INFO [main] 2011-01-17 09:59:39,189 StorageService.java (line 399) Joining: getting bootstrap token
INFO [main] 2011-01-17 09:59:39,203 BootStrapper.java (line 148) New token will be 110533280274756817580689726417060138498 to assume load from /172.16.2.203
INFO [main] 2011-01-17 09:59:39,265 StorageService.java (line 399) Joining: sleeping 30000 ms for pending range setup
INFO [main] 2011-01-17 10:00:09,272 StorageService.java (line 399) Bootstrapping
INFO [main] 2011-01-17 10:00:09,663 CassandraDaemon.java (line 77) Binding thrift service to /172.16.2.206:9160
INFO [main] 2011-01-17 10:00:09,666 CassandraDaemon.java (line 91) Using TFramedTransport with a max frame size of 15728640 bytes.
INFO [main] 2011-01-17 10:00:09,671 CassandraDaemon.java (line 119) Listening for thrift clients...
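To see which rows the new node should have received, here is a small Python
sketch. It is not from the original report and rests on assumptions: the
cluster uses the 0.7 default partitioner (RandomPartitioner, since no
partitioner setting was changed), a key's token is the absolute value of its
MD5 digest read as a signed big-endian integer, and each node owns the range
from the previous token (exclusive) to its own token (inclusive). With tokens 0
and 110533280274756817580689726417060138498, node 2 takes over (0, T], roughly
65% of the ring, so with RF=1 only rows whose tokens fall there should have been
streamed from 172.16.2.203. `key_token` and `owner` are hypothetical helpers,
not Cassandra APIs.

```python
import hashlib

# Assumption: RandomPartitioner, whose tokens lie in [0, 2**127].
RING_SIZE = 2 ** 127

# Tokens from the report: node 1 uses initial_token 0, node 2 was assigned
# its token by BootStrapper during bootstrap.
NODE1 = ("172.16.2.203", 0)
NODE2 = ("172.16.2.206", 110533280274756817580689726417060138498)


def key_token(key: str) -> int:
    """Hypothetical helper: MD5 digest of the key bytes, read as a signed
    big-endian integer (like Java's BigInteger(byte[])), then abs()."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owner(token: int) -> str:
    """Hypothetical helper: with only two tokens, node 2 owns (0, T] and
    node 1 owns everything else, wrapping around past 0."""
    if NODE1[1] < token <= NODE2[1]:
        return NODE2[0]
    return NODE1[0]


if __name__ == "__main__":
    share = NODE2[1] / RING_SIZE
    print(f"node 2 takes over {share:.1%} of the ring")  # roughly 65%
    for key in ["user_1", "user_2", "user_3", "gadget_1", "gadget_2", "gadget_3"]:
        t = key_token(key)
        print(f"{key:10} token={t:<40} -> {owner(t)}")
```

Comparing that per-key prediction with what each node actually holds after
bootstrap might help narrow down whether the wrong rows were streamed or the
right rows were written under the wrong column family.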

Although everything seemed to work fine, once node 2 has finished bootstrapping
the rows in the 'Role' and 'Gadget' column families are messed up:

list Role;

-------------------
RowKey: user_3
=> (column=6e616d65, value=557365722033, timestamp=1295254678545000)

1 Row Returned.


list Gadget;

-------------------
RowKey: user_2
=> (column=6e616d65, value=557365722032, timestamp=1295254678514000)
-------------------
RowKey: gadget_2
=> (column=6e616d65, value=4761646765742032, timestamp=1295254678805000)
-------------------
RowKey: gadget_3
=> (column=6e616d65, value=4761646765742033, timestamp=1295254679429000)
-------------------
RowKey: gadget_1
=> (column=6e616d65, value=4761646765742031, timestamp=1295254678771000)
-------------------
RowKey: user_1
=> (column=6e616d65, value=557365722031, timestamp=1295254678449000)

5 Rows Returned.

So 2 rows have been moved from CF 'Role' to 'Gadget' just by adding a node to
the cluster. The exact result differs on each attempt, but some rows always end
up in another CF. The problem looks the same as the one described by Mateusz.
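The hex in the cli output is just the UTF-8 bytes of the column name and value
(6e616d65 is 'name', 557365722032 is 'User 2'). The Python sketch below is not
part of the original report; the helper names `parse_listing` and `diff` are
hypothetical. It decodes the two listings and compares them with the rows
inserted earlier:

```python
import re

# Rows written with cassandra-cli before the second node joined.
EXPECTED = {
    "Role": {"user_1": "User 1", "user_2": "User 2", "user_3": "User 3"},
    "Gadget": {"gadget_1": "Gadget 1", "gadget_2": "Gadget 2", "gadget_3": "Gadget 3"},
}

# "list Role" / "list Gadget" output after bootstrap, copied from the report.
LIST_ROLE = """
RowKey: user_3
=> (column=6e616d65, value=557365722033, timestamp=1295254678545000)
"""
LIST_GADGET = """
RowKey: user_2
=> (column=6e616d65, value=557365722032, timestamp=1295254678514000)
RowKey: gadget_2
=> (column=6e616d65, value=4761646765742032, timestamp=1295254678805000)
RowKey: gadget_3
=> (column=6e616d65, value=4761646765742033, timestamp=1295254679429000)
RowKey: gadget_1
=> (column=6e616d65, value=4761646765742031, timestamp=1295254678771000)
RowKey: user_1
=> (column=6e616d65, value=557365722031, timestamp=1295254678449000)
"""

ROW_RE = re.compile(r"^RowKey: (\S+)$")
COL_RE = re.compile(r"^=> \(column=([0-9a-f]+), value=([0-9a-f]+), timestamp=\d+\)$")


def parse_listing(text):
    """Turn cli 'list' output into {row_key: {column_name: value}},
    decoding the hex-encoded column names and values as UTF-8."""
    rows = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        row = ROW_RE.match(line)
        col = COL_RE.match(line)
        if row:
            current = rows.setdefault(row.group(1), {})
        elif col and current is not None:
            name = bytes.fromhex(col.group(1)).decode("utf-8")
            current[name] = bytes.fromhex(col.group(2)).decode("utf-8")
    return rows


def diff(expected, observed):
    """List rows reported under the wrong CF or missing from their own CF."""
    home = {key: cf for cf, rows in expected.items() for key in rows}
    problems = []
    for cf, rows in observed.items():
        for key in rows:
            if home.get(key) != cf:
                problems.append(f"{key}: found in {cf}, expected in {home.get(key)}")
    for cf, rows in expected.items():
        for key in rows:
            if key not in observed.get(cf, {}):
                problems.append(f"{key}: missing from {cf}")
    return problems


observed = {"Role": parse_listing(LIST_ROLE), "Gadget": parse_listing(LIST_GADGET)}
for problem in diff(EXPECTED, observed):
    print(problem)
```

Run against the listings above, it prints four lines: user_2 and user_1 are
found in Gadget instead of Role, and both are missing from Role. Their decoded
values ('User 2', 'User 1') still match what was written, so the row contents
look intact; only the column family they are returned under is wrong.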

I also found that restarting the nodes seems to 'fix' the issue, and changing
the replication factor from 1 to 2 'resolves' it most of the time.

> Bootstrap breaks data stored (missing rows, extra rows, column values 
> modified)
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1992
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Linux 2.6.36-1 #1 SMP Tue Nov 9 09:56:02 CET 2010 x86_64 
> Intel(R)_Core(TM)2_Quad_CPU____Q8300__@_2.50GHz PLD Linux
> glibc-2.12-4.i686
> java-sun-1.6.0.22-1.i686
>            Reporter: Mateusz Korniak
>            Assignee: Brandon Williams
>             Fix For: 0.7.1
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Scenario:
> Two fresh Cassandra installs (empty /data, /commitlog and /saved_caches dirs).
> Start the first one.
> Run the data-inserting program [1], then run it again in verify mode: all data intact.
> Bootstrap the 2nd node.
> Run verification again; now it fails.
> The issue is very strange to me: Cassandra has worked perfectly for days now
> as long as the cluster nodes stay the same, but any bootstrap (1 -> 2 nodes,
> 2 -> 3 nodes, 2 -> 3 nodes with RF=2) breaks data.
> I am running Cassandra with a 1GB heap, 32-bit userland on 64-bit kernels;
> not sure what else could matter here.
> Any hints?
> Thanks in advance, regards.
> [1] simple program generating data and later verifying data.
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/test.py
> [2] Logs from 1st node:
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.4.log
> [3] Logs from 2nd (bootstrapping) node:
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.8.log

-- 
This message is automatically generated by JIRA.