Re: Random slow read times in Cassandra

2017-03-17 Thread daemeon reiydelle
check for level 2 (stop the world) garbage collections.


*...*



*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*

On Fri, Mar 17, 2017 at 11:51 AM, Chuck Reynolds 
wrote:

> I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that has
> consistently random high read times.  In general most reads are under 10
> milliseconds but with in the 30 request there is usually a read time that
> is a couple of seconds.
>
>
>
> Instance type: r4.8xlarge
>
> EBS GP2 volumes, 3tb with 9000 IOPS
>
> 30 Gig Heap
>
>
>
> Data per node is about 170 gigs
>
>
>
> The keyspace is an id & a blob.  When I check the data the slow reads
> don’t seem to have anything to do with size of the blobs
>
>
>
> This system has repairs run once a weeks because it takes a lot of updates.
>
>
>
> The client makes a call and does 30 request serially to Cassandra and the
> response times look like this in milliseconds.
>
>
>
> What could make these so slow and what can I do to diagnosis this?
>
>
>
>
>
> *Responses*
>
>
>
> Get Person time: 3 319746229:9009:66
>
> Get Person time: 7 1830093695:9009:66
>
> Get Person time: 4 30072253:9009:66
>
> Get Person time: 4 2303790089:9009:66
>
> Get Person time: 2 156792066:9009:66
>
> Get Person time: 8 491230624:9009:66
>
> Get Person time: 7 284904599:9009:66
>
> Get Person time: 4 600370489:9009:66
>
> Get Person time: 2 281007386:9009:66
>
> Get Person time: 4 971178094:9009:66
>
> Get Person time: 1 1322259885:9009:66
>
> Get Person time: 2 1937958542:9009:66
>
> Get Person time: 9 286536648:9009:66
>
> Get Person time: 9 1835633470:9009:66
>
> Get Person time: 2 300867513:9009:66
>
> Get Person time: 3 178975468:9009:66
>
> Get Person time: 2900 293043081:9009:66
>
> Get Person time: 8 214913830:9009:66
>
> Get Person time: 2 1956710764:9009:66
>
> Get Person time: 4 237673776:9009:66
>
> Get Person time: 17 68942206:9009:66
>
> Get Person time: 1800 20072145:9009:66
>
> Get Person time: 2 304698506:9009:66
>
> Get Person time: 2 308177320:9009:66
>
> Get Person time: 2 998436038:9009:66
>
> Get Person time: 10 1036890112:9009:66
>
> Get Person time: 1 1629649548:9009:66
>
> Get Person time: 6 1595339706:9009:66
>
> Get Person time: 4 1079637599:9009:66
>
> Get Person time: 3 556342855:9009:66
>
>
>
>
>
> Get Person time: 5 1856382256:9009:66
>
> Get Person time: 3 1891737174:9009:66
>
> Get Person time: 2 1179373651:9009:66
>
> Get Person time: 2 1482602756:9009:66
>
> Get Person time: 3 1236458510:9009:66
>
> Get Person time: 11 1003159823:9009:66
>
> Get Person time: 2 1264952556:9009:66
>
> Get Person time: 2 1662234295:9009:66
>
> Get Person time: 1 246108569:9009:66
>
> Get Person time: 5 1709881651:9009:66
>
> Get Person time: 3213 11878078:9009:66
>
> Get Person time: 2 112866483:9009:66
>
> Get Person time: 2 201870153:9009:66
>
> Get Person time: 6 227696684:9009:66
>
> Get Person time: 2 1946780190:9009:66
>
> Get Person time: 2 2197987101 <(219)%20798-7101>:9009:66
>
> Get Person time: 18 1838959725:9009:66
>
> Get Person time: 3 1782937802:9009:66
>
> Get Person time: 3 1692530939:9009:66
>
> Get Person time: 9 1765654196:9009:66
>
> Get Person time: 2 1597757121:9009:66
>
> Get Person time: 2 1853127153:9009:66
>
> Get Person time: 3 1533599253:9009:66
>
> Get Person time: 6 1693244112:9009:66
>
> Get Person time: 6 82047537:9009:66
>
> Get Person time: 2 96221961:9009:66
>
> Get Person time: 4 98202209:9009:66
>
> Get Person time: 9 12952388:9009:66
>
> Get Person time: 2 300118652:9009:66
>
> Get Person time: 10 78801084:9009:66
>
>
>
>
>
> Get Person time: 13 1856424913:9009:66
>
> Get Person time: 2 255814186:9009:66
>
> Get Person time: 2 1183397424:9009:66
>
> Get Person time: 5 1828603730:9009:66
>
> Get Person time: 9 132965919:9009:66
>
> Get Person time: 4 1616190071:9009:66
>
> Get Person time: 2 15929337:9009:66
>
> Get Person time: 10 297005427:9009:66
>
> Get Person time: 2 1306460047:9009:66
>
> Get Person time: 5 620139216:9009:66
>
> Get Person time: 2 1364349058:9009:66
>
> Get Person time: 3 629543403:9009:66
>
> Get Person time: 5 1299827034:9009:66
>
> Get Person time: 4 1593205912:9009:66
>
> Get Person time: 2 1755460077:9009:66
>
> Get Person time: 2 1906388666:9009:66
>
> Get Person time: 1 1838653952:9009:66
>
> Get Person time: 2 2249662508 <(224)%20966-2508>:9009:66
>
> Get Person time: 3 1931708432:9009:66
>
> Get Person time: 2 2177004948 <(217)%20700-4948>:9009:66
>
> Get Person time: 2 2042756682 <(204)%20275-6682>:9009:66
>
> Get Person time: 5 41764865:9009:66
>
> Get Person time: 4023 1733384704:9009:66
>
> Get Person time: 1 1614842189:9009:66
>
> Get Person time: 2 2194211396 <(219)%20421-1396>:9009:66
>
> Get Person time: 3 1711330834:9009:66
>
> Get Person time: 2 2264849689 <(226)%20484-9689>:9009:66
>
> Get Person time: 3 1819027970:9009:66
>
> Get Person time: 2 1978614851:9009:66
>
> Get Person time: 1 1863483129:9009:66
>
>
>


Re: Random slow read times in Cassandra

2017-03-17 Thread Jonathan Haddad
Probably Jvm pauses. Check your logs for long GC times.
On Fri, Mar 17, 2017 at 11:51 AM Chuck Reynolds 
wrote:

> I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that has
> consistently random high read times.  In general most reads are under 10
> milliseconds but with in the 30 request there is usually a read time that
> is a couple of seconds.
>
>
>
> Instance type: r4.8xlarge
>
> EBS GP2 volumes, 3tb with 9000 IOPS
>
> 30 Gig Heap
>
>
>
> Data per node is about 170 gigs
>
>
>
> The keyspace is an id & a blob.  When I check the data the slow reads
> don’t seem to have anything to do with size of the blobs
>
>
>
> This system has repairs run once a weeks because it takes a lot of updates.
>
>
>
> The client makes a call and does 30 request serially to Cassandra and the
> response times look like this in milliseconds.
>
>
>
> What could make these so slow and what can I do to diagnosis this?
>
>
>
>
>
> *Responses*
>
>
>
> Get Person time: 3 319746229:9009:66
>
> Get Person time: 7 1830093695:9009:66
>
> Get Person time: 4 30072253:9009:66
>
> Get Person time: 4 2303790089:9009:66
>
> Get Person time: 2 156792066:9009:66
>
> Get Person time: 8 491230624:9009:66
>
> Get Person time: 7 284904599:9009:66
>
> Get Person time: 4 600370489:9009:66
>
> Get Person time: 2 281007386:9009:66
>
> Get Person time: 4 971178094:9009:66
>
> Get Person time: 1 1322259885:9009:66
>
> Get Person time: 2 1937958542:9009:66
>
> Get Person time: 9 286536648:9009:66
>
> Get Person time: 9 1835633470:9009:66
>
> Get Person time: 2 300867513:9009:66
>
> Get Person time: 3 178975468:9009:66
>
> Get Person time: 2900 293043081:9009:66
>
> Get Person time: 8 214913830:9009:66
>
> Get Person time: 2 1956710764:9009:66
>
> Get Person time: 4 237673776:9009:66
>
> Get Person time: 17 68942206:9009:66
>
> Get Person time: 1800 20072145:9009:66
>
> Get Person time: 2 304698506:9009:66
>
> Get Person time: 2 308177320:9009:66
>
> Get Person time: 2 998436038:9009:66
>
> Get Person time: 10 1036890112:9009:66
>
> Get Person time: 1 1629649548:9009:66
>
> Get Person time: 6 1595339706:9009:66
>
> Get Person time: 4 1079637599:9009:66
>
> Get Person time: 3 556342855:9009:66
>
>
>
>
>
> Get Person time: 5 1856382256:9009:66
>
> Get Person time: 3 1891737174:9009:66
>
> Get Person time: 2 1179373651:9009:66
>
> Get Person time: 2 1482602756:9009:66
>
> Get Person time: 3 1236458510:9009:66
>
> Get Person time: 11 1003159823:9009:66
>
> Get Person time: 2 1264952556:9009:66
>
> Get Person time: 2 1662234295:9009:66
>
> Get Person time: 1 246108569:9009:66
>
> Get Person time: 5 1709881651:9009:66
>
> Get Person time: 3213 11878078:9009:66
>
> Get Person time: 2 112866483:9009:66
>
> Get Person time: 2 201870153:9009:66
>
> Get Person time: 6 227696684:9009:66
>
> Get Person time: 2 1946780190:9009:66
>
> Get Person time: 2 2197987101:9009:66
>
> Get Person time: 18 1838959725:9009:66
>
> Get Person time: 3 1782937802:9009:66
>
> Get Person time: 3 1692530939:9009:66
>
> Get Person time: 9 1765654196:9009:66
>
> Get Person time: 2 1597757121:9009:66
>
> Get Person time: 2 1853127153:9009:66
>
> Get Person time: 3 1533599253:9009:66
>
> Get Person time: 6 1693244112:9009:66
>
> Get Person time: 6 82047537:9009:66
>
> Get Person time: 2 96221961:9009:66
>
> Get Person time: 4 98202209:9009:66
>
> Get Person time: 9 12952388:9009:66
>
> Get Person time: 2 300118652:9009:66
>
> Get Person time: 10 78801084:9009:66
>
>
>
>
>
> Get Person time: 13 1856424913:9009:66
>
> Get Person time: 2 255814186:9009:66
>
> Get Person time: 2 1183397424:9009:66
>
> Get Person time: 5 1828603730:9009:66
>
> Get Person time: 9 132965919:9009:66
>
> Get Person time: 4 1616190071:9009:66
>
> Get Person time: 2 15929337:9009:66
>
> Get Person time: 10 297005427:9009:66
>
> Get Person time: 2 1306460047:9009:66
>
> Get Person time: 5 620139216:9009:66
>
> Get Person time: 2 1364349058:9009:66
>
> Get Person time: 3 629543403:9009:66
>
> Get Person time: 5 1299827034:9009:66
>
> Get Person time: 4 1593205912:9009:66
>
> Get Person time: 2 1755460077:9009:66
>
> Get Person time: 2 1906388666:9009:66
>
> Get Person time: 1 1838653952:9009:66
>
> Get Person time: 2 2249662508:9009:66
>
> Get Person time: 3 1931708432:9009:66
>
> Get Person time: 2 2177004948:9009:66
>
> Get Person time: 2 2042756682:9009:66
>
> Get Person time: 5 41764865:9009:66
>
> Get Person time: 4023 1733384704:9009:66
>
> Get Person time: 1 1614842189:9009:66
>
> Get Person time: 2 2194211396:9009:66
>
> Get Person time: 3 1711330834:9009:66
>
> Get Person time: 2 2264849689:9009:66
>
> Get Person time: 3 1819027970:9009:66
>
> Get Person time: 2 1978614851:9009:66
>
> Get Person time: 1 1863483129:9009:66
>
>
>


Random slow read times in Cassandra

2017-03-17 Thread Chuck Reynolds
I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that has consistently 
random high read times.  In general most reads are under 10 milliseconds but 
with in the 30 request there is usually a read time that is a couple of seconds.

Instance type: r4.8xlarge
EBS GP2 volumes, 3tb with 9000 IOPS
30 Gig Heap

Data per node is about 170 gigs

The keyspace is an id & a blob.  When I check the data the slow reads don’t 
seem to have anything to do with size of the blobs

This system has repairs run once a weeks because it takes a lot of updates.

The client makes a call and does 30 request serially to Cassandra and the 
response times look like this in milliseconds.

What could make these so slow and what can I do to diagnosis this?


Responses

Get Person time: 3 319746229:9009:66
Get Person time: 7 1830093695:9009:66
Get Person time: 4 30072253:9009:66
Get Person time: 4 2303790089:9009:66
Get Person time: 2 156792066:9009:66
Get Person time: 8 491230624:9009:66
Get Person time: 7 284904599:9009:66
Get Person time: 4 600370489:9009:66
Get Person time: 2 281007386:9009:66
Get Person time: 4 971178094:9009:66
Get Person time: 1 1322259885:9009:66
Get Person time: 2 1937958542:9009:66
Get Person time: 9 286536648:9009:66
Get Person time: 9 1835633470:9009:66
Get Person time: 2 300867513:9009:66
Get Person time: 3 178975468:9009:66
Get Person time: 2900 293043081:9009:66
Get Person time: 8 214913830:9009:66
Get Person time: 2 1956710764:9009:66
Get Person time: 4 237673776:9009:66
Get Person time: 17 68942206:9009:66
Get Person time: 1800 20072145:9009:66
Get Person time: 2 304698506:9009:66
Get Person time: 2 308177320:9009:66
Get Person time: 2 998436038:9009:66
Get Person time: 10 1036890112:9009:66
Get Person time: 1 1629649548:9009:66
Get Person time: 6 1595339706:9009:66
Get Person time: 4 1079637599:9009:66
Get Person time: 3 556342855:9009:66


Get Person time: 5 1856382256:9009:66
Get Person time: 3 1891737174:9009:66
Get Person time: 2 1179373651:9009:66
Get Person time: 2 1482602756:9009:66
Get Person time: 3 1236458510:9009:66
Get Person time: 11 1003159823:9009:66
Get Person time: 2 1264952556:9009:66
Get Person time: 2 1662234295:9009:66
Get Person time: 1 246108569:9009:66
Get Person time: 5 1709881651:9009:66
Get Person time: 3213 11878078:9009:66
Get Person time: 2 112866483:9009:66
Get Person time: 2 201870153:9009:66
Get Person time: 6 227696684:9009:66
Get Person time: 2 1946780190:9009:66
Get Person time: 2 2197987101:9009:66
Get Person time: 18 1838959725:9009:66
Get Person time: 3 1782937802:9009:66
Get Person time: 3 1692530939:9009:66
Get Person time: 9 1765654196:9009:66
Get Person time: 2 1597757121:9009:66
Get Person time: 2 1853127153:9009:66
Get Person time: 3 1533599253:9009:66
Get Person time: 6 1693244112:9009:66
Get Person time: 6 82047537:9009:66
Get Person time: 2 96221961:9009:66
Get Person time: 4 98202209:9009:66
Get Person time: 9 12952388:9009:66
Get Person time: 2 300118652:9009:66
Get Person time: 10 78801084:9009:66


Get Person time: 13 1856424913:9009:66
Get Person time: 2 255814186:9009:66
Get Person time: 2 1183397424:9009:66
Get Person time: 5 1828603730:9009:66
Get Person time: 9 132965919:9009:66
Get Person time: 4 1616190071:9009:66
Get Person time: 2 15929337:9009:66
Get Person time: 10 297005427:9009:66
Get Person time: 2 1306460047:9009:66
Get Person time: 5 620139216:9009:66
Get Person time: 2 1364349058:9009:66
Get Person time: 3 629543403:9009:66
Get Person time: 5 1299827034:9009:66
Get Person time: 4 1593205912:9009:66
Get Person time: 2 1755460077:9009:66
Get Person time: 2 1906388666:9009:66
Get Person time: 1 1838653952:9009:66
Get Person time: 2 2249662508:9009:66
Get Person time: 3 1931708432:9009:66
Get Person time: 2 2177004948:9009:66
Get Person time: 2 2042756682:9009:66
Get Person time: 5 41764865:9009:66
Get Person time: 4023 1733384704:9009:66
Get Person time: 1 1614842189:9009:66
Get Person time: 2 2194211396:9009:66
Get Person time: 3 1711330834:9009:66
Get Person time: 2 2264849689:9009:66
Get Person time: 3 1819027970:9009:66
Get Person time: 2 1978614851:9009:66
Get Person time: 1 1863483129:9009:66