[jira] [Commented] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2

Brandon Williams (JIRA) Wed, 07 Aug 2013 14:20:11 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732756#comment-13732756
 ]


Brandon Williams commented on CASSANDRA-5789:
---------------------------------------------

I mocked out the test data.  It looks like the main problem here is that your 
script doesn't know why it can't get the data; the failure can be any 
exception.  I made it print the exceptions out and got this:

{noformat}
AllServersUnavailable: An attempt was made to connect to each of the servers 
twice, but none of the attempts succeeded. The last failure was 
TTransportException: Could not connect to cassandra-3:9160
Process CassClientRunProcess-57:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "CassBugRepro.py", line 198, in run
    max_overflow = 1)
  File "/srv/pycassa/pycassa/pool.py", line 383, in __init__
    self.fill()
  File "/srv/pycassa/pycassa/pool.py", line 444, in fill
    conn = self._create_connection()
  File "/srv/pycassa/pycassa/pool.py", line 432, in _create_connection
    (exc.__class__.__name__, exc))
{noformat}

This error is obviously bogus, since I didn't shut anything down, but it 
certainly doesn't indicate any missing data.  I'd recommend that you try to 
reproduce this with cassandra's stress tool and see if it finds any missing data.
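
A minimal sketch of the distinction described above, keeping "row not found" 
separate from every other failure so that a connection error is not counted as 
missing data.  The names NotFound, TransportError, classify_read and fake_fetch 
are hypothetical stand-ins and are not taken from CassBugRepro.py; with pycassa 
the not-found case would presumably be its NotFoundException.

```python
import traceback


class NotFound(Exception):
    """Hypothetical stand-in for the client's 'row not found' exception."""


class TransportError(Exception):
    """Hypothetical stand-in for a connection-level failure."""


def classify_read(fetch, key):
    """Call fetch(key) and return 'found', 'not_found' or 'error:<Name>'."""
    try:
        fetch(key)
        return "found"
    except NotFound:
        # Only this case says anything about missing data.
        return "not_found"
    except Exception as exc:
        # Anything else is a client or transport problem: print it, don't hide it.
        traceback.print_exc()
        return "error:" + type(exc).__name__


def fake_fetch(key):
    """Fake read used only to exercise classify_read."""
    if key == "missing":
        raise NotFound(key)
    if key == "down":
        raise TransportError("Could not connect")
    return {"col": "val"}


results = [classify_read(fake_fetch, k) for k in ("present", "missing", "down")]
print(results)
```

Counting the "error:" results separately from "not_found" would show whether 
the ~0.1% figure in the report comes from absent rows or from failed requests.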
                
> Data not fully replicated with 2 nodes and replication factor 2
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-5789
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5789
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.2, 1.2.6
>         Environment: Official Datastax Cassandra 1.2.6, running on Linux RHEL 
> 6.2.  I've seen the same behavior with Cassandra 1.2.2.
> Sun Java 1.7.0_10-b18 64-bit
> Java heap settings: -Xms8192M -Xmx8192M -Xmn2048M
>            Reporter: James Lee
>            Assignee: Brandon Williams
>         Attachments: CassBugRepro.py
>
>
> I'm seeing a problem with a 2-node Cassandra test deployment, where it seems 
> that data isn't being replicated among the nodes as I would expect.
> The setup and test is as follows:
> - Two Cassandra nodes in the cluster (they each have themselves and the other 
> node as seeds in cassandra.yaml).
> - Create 40 keyspaces, each with simple replication strategy and 
> replication factor 2.
> - Populate 125,000 rows into each keyspace, using a pycassa client with a 
> connection pool pointed at both nodes.  These are populated with writes using 
> consistency level of 1.
> - Wait until nodetool on each node reports that there are no hinted handoffs 
> outstanding (see output below).
> - Do random reads of the rows in the keyspaces, again using a pycassa client 
> with a connection pool pointed at both nodes.  These are read using 
> consistency level 1.
> I'm finding that the vast majority of reads are successful, but a small 
> proportion (~0.1%) are returned as Not Found.  If I manually try to look up 
> those keys using cassandra-cli, I see that they are returned when querying 
> one of the nodes, but not when querying the other.  So it seems like some of 
> the rows have simply not been replicated, even though the write for these 
> rows was reported to the client as successful.
> If I reduce the rate at which the test tool initially writes data into the 
> database then I don't see any failed reads, so this seems like a load-related 
> issue.  My understanding is that if all writes were successful and there are 
> no pending hinted handoffs, then the data should be fully-replicated and 
> reads should return it (even with read and write consistency of 1).
> Here's the output from nodetool on the two nodes:
> comet-mvs01:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0              2         0                 0
> RequestResponseStage              0         0         878494         0                 0
> MutationStage                     0         0        2869107         0                 0
> ReadRepairStage                   0         0              0         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0           2208         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MigrationStage                    0         0            994         0                 0
> MemtablePostFlusher               0         0           4399         0                 0
> FlushWriter                       0         0           2264         0               556
> MiscStage                         0         0              0         0                 0
> commitlog_archiver                0         0              0         0                 0
> InternalResponseStage             0         0            153         0                 0
> HintedHandoff                     0         0              2         0                 0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> BINARY                       0
> READ                         0
> MUTATION                 87655
> _TRACE                       0
> REQUEST_RESPONSE             0
> comet-mvs02:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0            868         0                 0
> RequestResponseStage              0         0        3919665         0                 0
> MutationStage                     0         0        8177325         0                 0
> ReadRepairStage                   0         0            113         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0           9624         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MigrationStage                    0         0           2666         0                 0
> MemtablePostFlusher               0         0           7869         0                 0
> FlushWriter                       0         0           4273         0              1179
> MiscStage                         0         0              0         0                 0
> commitlog_archiver                0         0              0         0                 0
> InternalResponseStage             0         0            215         0                 0
> HintedHandoff                     0         0              8         0                 0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> BINARY                       0
> READ                         0
> MUTATION                531988
> _TRACE                       0
> REQUEST_RESPONSE             0
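
The "Dropped" counters in tpstats output like the above can be pulled out with 
a few lines of Python, for example to compare the two nodes after a test run.  
This is only a sketch: parse_dropped is a hypothetical helper, it assumes the 
"Message type / Dropped" layout shown in this report, and the sample text is 
copied from the first node's output.

```python
def parse_dropped(tpstats_text):
    """Return {message_type: dropped_count} from `nodetool tpstats` text."""
    dropped = {}
    in_dropped = False
    for raw in tpstats_text.splitlines():
        # Tolerate email-style "> " quoting in front of each line.
        parts = raw.lstrip("> ").split()
        if parts[:2] == ["Message", "type"]:
            in_dropped = True
            continue
        if in_dropped:
            if len(parts) == 2 and parts[1].isdigit():
                dropped[parts[0]] = int(parts[1])
            else:
                # Any other line shape ends the Dropped section.
                in_dropped = False
    return dropped


sample = """\
HintedHandoff                     0         0              2         0                 0
Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
BINARY                       0
READ                         0
MUTATION                 87655
_TRACE                       0
REQUEST_RESPONSE             0
"""

nonzero = {name: count for name, count in parse_dropped(sample).items() if count}
print(nonzero)
```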

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
