[
https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731998#comment-13731998
]
James Lee edited comment on CASSANDRA-5789 at 8/7/13 1:52 PM:
--------------------------------------------------------------
Repro script for the bug. Run it as follows:
-- The script assumes you have a two-node Cassandra cluster set up and running.
-- The system running the test should have Python (I used 2.7) with pycassa
installed.
-- Run the setup stage as follows: "python CassBugRepro.py -c ip1,ip2 -s -f".
This creates keyspaces and writes 2M rows into them.
-- Once the above has completed, wait until all hints have been delivered (I
checked using nodetool).
-- Then run the next stage, which does random reads/writes: "python
CassBugRepro.py -c ip1,ip2 -r".
-- If the bug has been repro'd, you'll see output like "NotFoundException for
DN 11055691", where we haven't found something we'd previously successfully
written.
Note that repeating the above but omitting the "-f" parameter in the setup
stage will reduce the rate at which we initially populate the keys. I then see
no read failures.
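To confirm which node is missing a given key without going through cassandra-cli by hand, a per-node check along the following lines may help. This is a minimal sketch and is not taken from CassBugRepro.py: find_missing and the readers mapping are hypothetical names, and each reader is assumed to be a callable that raises KeyError when its node does not return the row. With pycassa, one could wrap ColumnFamily.get on a pool pointed at a single node and translate NotFoundException into KeyError. The example uses in-memory dicts as stand-ins for the two nodes.

```python
def find_missing(keys, readers):
    """Return {key: [nodes that did not return the row]} for incomplete keys.

    readers maps a node name to a callable that returns the row for a key
    or raises KeyError when that node has no such row. This interface is
    an assumption made for illustration, not part of the repro script.
    """
    missing = {}
    for key in keys:
        absent = []
        # Query every node individually so a partial replica shows up.
        for node, read in sorted(readers.items()):
            try:
                read(key)
            except KeyError:
                absent.append(node)
        if absent:
            missing[key] = absent
    return missing


if __name__ == "__main__":
    # Stand-in data: two in-memory "nodes", with one row absent on node2.
    node1 = {"11055691": {"col": "v"}, "42": {"col": "v"}}
    node2 = {"42": {"col": "v"}}
    readers = {"node1": node1.__getitem__, "node2": node2.__getitem__}
    print(find_missing(["11055691", "42"], readers))
```

Keys that come back from every node are omitted from the result, so an empty dict means every checked row was returned by each node.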
> Data not fully replicated with 2 nodes and replication factor 2
> ---------------------------------------------------------------
>
> Key: CASSANDRA-5789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5789
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 1.2.2, 1.2.6
> Environment: Official Datastax Cassandra 1.2.6, running on Linux RHEL
> 6.2. I've seen the same behavior with Cassandra 1.2.2.
> Sun Java 1.7.0_10-b18 64-bit
> Java heap settings: -Xms8192M -Xmx8192M -Xmn2048M
> Reporter: James Lee
> Attachments: CassBugRepro.py
>
>
> I'm seeing a problem with a 2-node Cassandra test deployment, where it seems
> that data isn't being replicated among the nodes as I would expect.
> The setup and test are as follows:
> - Two Cassandra nodes in the cluster (they each have themselves and the other
> node as seeds in cassandra.yaml).
> - Create 40 keyspaces, each with simple replication strategy and
> replication factor 2.
> - Populate 125,000 rows into each keyspace, using a pycassa client with a
> connection pool pointed at both nodes. These are populated with writes using
> consistency level of 1.
> - Wait until nodetool on each node reports that there are no hinted handoffs
> outstanding (see output below).
> - Do random reads of the rows in the keyspaces, again using a pycassa client
> with a connection pool pointed at both nodes. These are read using
> consistency level 1.
> I'm finding that the vast majority of reads are successful, but a small
> proportion (~0.1%) are returned as Not Found. If I manually try to look up
> those keys using cassandra-cli, I see that they are returned when querying
> one of the nodes, but not when querying the other. So it seems like some of
> the rows have simply not been replicated, even though the write for these
> rows was reported to the client as successful.
> If I reduce the rate at which the test tool initially writes data into the
> database then I don't see any failed reads, so this seems like a load-related
> issue. My understanding is that if all writes were successful and there are
> no pending hinted handoffs, then the data should be fully-replicated and
> reads should return it (even with read and write consistency of 1).
> Here's the output from nodetool on the two nodes:
> comet-mvs01:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name              Active  Pending  Completed  Blocked  All time blocked
> ReadStage                   0        0          2        0                 0
> RequestResponseStage        0        0     878494        0                 0
> MutationStage               0        0    2869107        0                 0
> ReadRepairStage             0        0          0        0                 0
> ReplicateOnWriteStage       0        0          0        0                 0
> GossipStage                 0        0       2208        0                 0
> AntiEntropyStage            0        0          0        0                 0
> MigrationStage              0        0        994        0                 0
> MemtablePostFlusher         0        0       4399        0                 0
> FlushWriter                 0        0       2264        0               556
> MiscStage                   0        0          0        0                 0
> commitlog_archiver          0        0          0        0                 0
> InternalResponseStage       0        0        153        0                 0
> HintedHandoff               0        0          2        0                 0
>
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> BINARY                       0
> READ                         0
> MUTATION                 87655
> _TRACE                       0
> REQUEST_RESPONSE             0
>
> comet-mvs02:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name              Active  Pending  Completed  Blocked  All time blocked
> ReadStage                   0        0        868        0                 0
> RequestResponseStage        0        0    3919665        0                 0
> MutationStage               0        0    8177325        0                 0
> ReadRepairStage             0        0        113        0                 0
> ReplicateOnWriteStage       0        0          0        0                 0
> GossipStage                 0        0       9624        0                 0
> AntiEntropyStage            0        0          0        0                 0
> MigrationStage              0        0       2666        0                 0
> MemtablePostFlusher         0        0       7869        0                 0
> FlushWriter                 0        0       4273        0              1179
> MiscStage                   0        0          0        0                 0
> commitlog_archiver          0        0          0        0                 0
> InternalResponseStage       0        0        215        0                 0
> HintedHandoff               0        0          8        0                 0
>
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> BINARY                       0
> READ                         0
> MUTATION                531988
> _TRACE                       0
> REQUEST_RESPONSE             0
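
The "Dropped" counters in the tpstats output quoted above are zero for every message type except MUTATION on both nodes. To track those counters between test stages, something like the following sketch could be used. It assumes the raw, unquoted "nodetool tpstats" text in the 1.2-era layout shown above (a "Message type Dropped" header followed by name/count pairs). parse_dropped is a hypothetical helper name, not part of nodetool or of the repro script.

```python
def parse_dropped(tpstats_text):
    """Return {message_type: dropped_count} from `nodetool tpstats` text.

    Assumes the layout quoted in this issue: a "Message type  Dropped"
    header line followed by one "<NAME> <count>" line per message type.
    """
    dropped = {}
    in_section = False
    for line in tpstats_text.splitlines():
        fields = line.split()
        if fields[:2] == ["Message", "type"]:
            # Everything after this header belongs to the dropped table.
            in_section = True
            continue
        if in_section and len(fields) == 2 and fields[1].isdigit():
            dropped[fields[0]] = int(fields[1])
    return dropped


if __name__ == "__main__":
    # Sample copied from the comet-mvs01 output in this issue.
    sample = """\
HintedHandoff 0 0 2 0 0
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
BINARY 0
READ 0
MUTATION 87655
_TRACE 0
REQUEST_RESPONSE 0
"""
    counts = parse_dropped(sample)
    # Show only the message types with a non-zero dropped count.
    print(dict((name, n) for name, n in counts.items() if n))
```

Comparing the parsed counts before and after the setup stage, with and without "-f", would show whether the dropped mutations track the write rate.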