[
https://issues.apache.org/jira/browse/CASSANDRA-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982561#action_12982561
]
Ivo Ladage-van Doorn commented on CASSANDRA-1992:
-------------------------------------------------
I have the exact same problem with an existing installation and was about to
create an issue for it when I found this one. I'll describe what I am seeing,
in case it provides some relevant information.
I ran into this issue with Cassandra 0.7 while trying to add just one node to
an existing one-node cluster. The existing node already contains some data when
the second node is added to the cluster. This is what I did:
Setup
I have two nodes, both running on Linux: a server called 'veers' on
172.16.2.203 and a server called 'r2d2' on 172.16.2.206. I use Cassandra 0.7
and only change the following settings in cassandra.yaml and
log4j-server.properties (I use the default values for all other entries):
In cassandra.yaml:
initial_token: 0
data_file_directories:
- /vol/users/ivol/cassandra_work/data
commitlog_directory: /vol/users/ivol/cassandra_work/commitlog
saved_caches_directory: /vol/users/ivol/cassandra_work/saved_caches
seeds: 172.16.2.203
listen_address: 172.16.2.203
rpc_address: 172.16.2.203
In log4j-server.properties:
log4j.appender.R.File=/vol/users/ivol/cassandra_work/system.log
Now I start the first node and connect to it using cassandra-cli. I add the
following keyspace, column families and rows:
create keyspace Default;
use Default;
create column family Role;
set Role['user_1']['name'] = 'User 1';
set Role['user_2']['name'] = 'User 2';
set Role['user_3']['name'] = 'User 3';
create column family Gadget;
set Gadget['gadget_1']['name'] = 'Gadget 1';
set Gadget['gadget_2']['name'] = 'Gadget 2';
set Gadget['gadget_3']['name'] = 'Gadget 3';
After this 'list Role' and 'list Gadget' return the proper rows.
Now I add a second node to the cluster, with this configuration:
In cassandra.yaml:
initial_token:
auto_bootstrap: true
data_file_directories:
- /vol/users/ivol/cassandra_work/data
commitlog_directory: /vol/users/ivol/cassandra_work/commitlog
saved_caches_directory: /vol/users/ivol/cassandra_work/saved_caches
seeds: 172.16.2.203
listen_address: 172.16.2.206
rpc_address: 172.16.2.206
In log4j-server.properties:
log4j.appender.R.File=/vol/users/ivol/cassandra_work/system.log
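To make the difference between the two nodes explicit, here is a minimal Python
sketch that compares the settings listed above. The dictionaries are
transcribed by hand from the two cassandra.yaml listings (the directory entries
are identical and left out); the helper name config_diff is only an
illustration and not part of Cassandra. A value of None means the key does not
appear in that node's listing.

```python
# Settings transcribed by hand from the two cassandra.yaml listings above.
node1 = {
    "initial_token": "0",
    "seeds": "172.16.2.203",
    "listen_address": "172.16.2.203",
    "rpc_address": "172.16.2.203",
}
node2 = {
    "initial_token": "",        # left empty on the second node
    "auto_bootstrap": "true",   # only listed for the second node
    "seeds": "172.16.2.203",
    "listen_address": "172.16.2.206",
    "rpc_address": "172.16.2.206",
}


def config_diff(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


diff = config_diff(node1, node2)
for key, (v1, v2) in diff.items():
    print(f"{key}: node1={v1!r} node2={v2!r}")
```

Only initial_token, auto_bootstrap, listen_address and rpc_address differ; both
nodes use the same seed.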
Now I start the second node. Bootstrapping takes some time, about 2 minutes in
total, but finishes without any warnings or errors:
...
INFO [main] 2011-01-17 09:58:09,170 StorageService.java (line 399) Joining:
getting load information
INFO [main] 2011-01-17 09:58:09,171 StorageLoadBalancer.java (line 366)
Sleeping 90000 ms to wait for load information...
INFO [GossipStage:1] 2011-01-17 09:58:10,447 Gossiper.java (line 577) Node
/172.16.2.203 is now part of the cluster
INFO [HintedHandoff:1] 2011-01-17 09:58:11,451 HintedHandOffManager.java (line
192) Started hinted handoff for endpoint /172.16.2.203
INFO [GossipStage:1] 2011-01-17 09:58:11,451 Gossiper.java (line 569)
InetAddress /172.16.2.203 is now UP
INFO [HintedHandoff:1] 2011-01-17 09:58:11,453 HintedHandOffManager.java (line
248) Finished hinted handoff of 0 rows to endpoint /172.16.2.203
INFO [main] 2011-01-17 09:59:39,189 StorageService.java (line 399) Joining:
getting bootstrap token
INFO [main] 2011-01-17 09:59:39,203 BootStrapper.java (line 148) New token will
be 110533280274756817580689726417060138498 to assume load from /172.16.2.203
INFO [main] 2011-01-17 09:59:39,265 StorageService.java (line 399) Joining:
sleeping 30000 ms for pending range setup
INFO [main] 2011-01-17 10:00:09,272 StorageService.java (line 399) Bootstrapping
INFO [main] 2011-01-17 10:00:09,663 CassandraDaemon.java (line 77) Binding
thrift service to /172.16.2.206:9160
INFO [main] 2011-01-17 10:00:09,666 CassandraDaemon.java (line 91) Using
TFramedTransport with a max frame size of 15728640 bytes.
INFO [main] 2011-01-17 10:00:09,671 CassandraDaemon.java (line 119) Listening
for thrift clients...
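The timings in this log excerpt can be checked with a short Python sketch. The
timestamps and the token are copied from the log lines above. The ring size of
2**127 is an assumption: it is the token space of RandomPartitioner, which I
believe is the default partitioner in 0.7, and I did not change that setting.

```python
from datetime import datetime

# log4j timestamp layout used in the excerpt, e.g. "2011-01-17 09:58:09,171"
FMT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start, end):
    """Elapsed seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# "Sleeping 90000 ms to wait for load information" until "getting bootstrap token"
load_wait = seconds_between("2011-01-17 09:58:09,171", "2011-01-17 09:59:39,189")
# "sleeping 30000 ms for pending range setup" until "Bootstrapping"
range_wait = seconds_between("2011-01-17 09:59:39,265", "2011-01-17 10:00:09,272")
# First line shown until "Listening for thrift clients..."
total = seconds_between("2011-01-17 09:58:09,170", "2011-01-17 10:00:09,671")

# ASSUMPTION: RandomPartitioner token space of 2**127 (not stated in the log).
RING_SIZE = 2 ** 127
new_token = 110533280274756817580689726417060138498
ring_fraction = new_token / RING_SIZE

print(f"load wait:  {load_wait:.3f} s")
print(f"range wait: {range_wait:.3f} s")
print(f"total:      {total:.3f} s")
print(f"share of ring between token 0 and the new token: {ring_fraction:.1%}")
```

The two waits come out just above the announced 90 and 30 seconds, so nearly all
of the 2 minutes is spent in these fixed sleeps. Under the assumption above, the
new token lies at roughly 65% of the ring.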
Although everything seemed to work just fine, once node 2 has completely
finished bootstrapping, the rows in the 'Role' and 'Gadget' column families are
mixed up:
list Role;
-------------------
RowKey: user_3
=> (column=6e616d65, value=557365722033, timestamp=1295254678545000)
1 Row Returned.
list Gadget;
-------------------
RowKey: user_2
=> (column=6e616d65, value=557365722032, timestamp=1295254678514000)
-------------------
RowKey: gadget_2
=> (column=6e616d65, value=4761646765742032, timestamp=1295254678805000)
-------------------
RowKey: gadget_3
=> (column=6e616d65, value=4761646765742033, timestamp=1295254679429000)
-------------------
RowKey: gadget_1
=> (column=6e616d65, value=4761646765742031, timestamp=1295254678771000)
-------------------
RowKey: user_1
=> (column=6e616d65, value=557365722031, timestamp=1295254678449000)
5 Rows Returned.
So 2 rows (user_1 and user_2) have been moved from CF 'Role' to CF 'Gadget',
just by adding a node to the cluster. The actual result differs each time I
try, but some rows always end up in another CF. The problem seems to be the
same as the one described by Mateusz.
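Since cassandra-cli prints column names and values as hex, here is a minimal
Python sketch that decodes the listings above and reports which rows show up in
the wrong column family. The dictionaries are transcribed by hand from the
'set' statements and the two 'list' outputs; the helper names decode and
misplaced are only illustrations.

```python
def decode(hex_text):
    """Turn a hex string as printed by cassandra-cli back into text."""
    return bytes.fromhex(hex_text).decode("ascii")


# What was inserted, per column family (from the 'set' statements).
inserted = {
    "Role": {"user_1": "User 1", "user_2": "User 2", "user_3": "User 3"},
    "Gadget": {"gadget_1": "Gadget 1", "gadget_2": "Gadget 2", "gadget_3": "Gadget 3"},
}

# What 'list Role' and 'list Gadget' returned after the bootstrap
# (row key -> hex value of the single column).
listed = {
    "Role": {"user_3": "557365722033"},
    "Gadget": {
        "user_2": "557365722032",
        "gadget_2": "4761646765742032",
        "gadget_3": "4761646765742033",
        "gadget_1": "4761646765742031",
        "user_1": "557365722031",
    },
}


def misplaced(inserted, listed):
    """Return (row key, original CF, CF it was listed in) for moved rows."""
    home = {key: cf for cf, rows in inserted.items() for key in rows}
    return sorted(
        (key, home.get(key), cf)
        for cf, rows in listed.items()
        for key in rows
        if home.get(key) != cf
    )


print("column name:", decode("6e616d65"))
for key, original_cf, seen_in in misplaced(inserted, listed):
    value = decode(listed[seen_in][key])
    print(f"{key} ({value!r}) belongs to {original_cf} but is listed in {seen_in}")
```

The values themselves decode to what was inserted; only the column family in
which the rows appear is wrong.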
I also found out that restarting the nodes seems to 'fix' the issue. Changing
the replication factor from 1 to 2 also 'resolves' the issue most of the time.
> Bootstrap breaks data stored (missing rows, extra rows, column values
> modified)
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-1992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1992
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.7.0
> Environment: Linux 2.6.36-1 #1 SMP Tue Nov 9 09:56:02 CET 2010 x86_64
> Intel(R)_Core(TM)2_Quad_CPU____Q8300__@_2.50GHz PLD Linux
> glibc-2.12-4.i686
> java-sun-1.6.0.22-1.i686
> Reporter: Mateusz Korniak
> Assignee: Brandon Williams
> Fix For: 0.7.1
>
> Original Estimate: 8h
> Remaining Estimate: 8h
>
> Scenario:
> Two fresh (empty /data /commitlog /saved_caches dirs) cassandra installs.
> Start first one.
> Run data inserting program [1], run again in verify mode - all data intact.
> Bootstrap 2nd node.
> Run verification again, now it fails.
> Issue is very strange to me as cassandra works perfectly for me when cluster
> nodes stay the same for days now but any bootstrap ( 1 -> 2 nodes, 2 -> 3
> nodes, 2->3 nodes RF=2) breaks data.
> I am running cassandra with 1GB heap size, 32bit userland on 64bit kernels,
> not sure what else could matter there.
> Any hints ?
> Thanks in advance, regards.
> [1] simple program generating data and later verifying data.
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/test.py
> [2] Logs from 1st node:
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.4.log
> [3] Logs from 2nd (bootstrapping) node:
> http://beauty.ant.gliwice.pl/bugs/cassandra-bootstrap/system-3.8.log