Hi Bart, Thank for keeping the spirit up. An hour ago I spotted the same problem while testing. I should have a fix for that in a little bit. I’ll be in touch.
Kind regards, Kaveh. > On 28 Jun 2016, at 09:38, Bart DS <[email protected]> wrote: > > Ok, I did some tests and the agents are running fine now. > At least, initially... > > When I start my first server via the following command: > > ./build/bin/arangod --server.authentication=false --server.endpoint > tcp://0.0.0.0:8529 --cluster.my-address tcp://127.0.0.1:8529 > --cluster.my-local-info db1 --cluster.my-role PRIMARY > --cluster.agency-endpoint tcp://127.0.0.1:5001 primary1 > > I get following output: > > bart@laptop ~/test/arangodb $ ./build/bin/arangod > --server.authentication=false --server.endpoint tcp://0.0.0.0:8529 > --cluster.my-address tcp://127.0.0.1:8529 --cluster.my-local-info db1 > --cluster.my-role PRIMARY --cluster.agency-endpoint tcp://127.0.0.1:5001 > primary1 > 2016-06-28T07:06:38Z [24113] INFO using SSL options: > SSL_OP_CIPHER_SERVER_PREFERENCE, SSL_OP_TLS_ROLLBACK_BUG > 2016-06-28T07:06:38Z [24113] INFO file-descriptors (nofiles) hard limit is > 4096, soft limit is 1024 > 2016-06-28T07:06:38Z [24113] INFO created database directory 'primary1'. > 2016-06-28T07:06:38Z [24113] INFO WAL directory 'primary1/journals' does not > exist. creating it... > 2016-06-28T07:06:38Z [24113] INFO ArangoDB 3.0.x-devel [linux] 64bit, using > VPack 0.1.30, ICU 54.1, V8 5.0.71.39, OpenSSL 1.0.1f 6 Jan 2014 > 2016-06-28T07:06:38Z [24113] INFO loaded database '_system' from > 'primary1/databases/database-1' > 2016-06-28T07:06:38Z [24113] INFO the server has 4 (hyper) cores, using 1 > scheduler thread(s), 4 dispatcher thread(s) > 2016-06-28T07:06:39Z [24113] INFO JavaScript using startup './js', > application './js/apps' > 2016-06-28T07:06:39Z [24113] INFO changing state of PRIMARY server from > UNDEFINED to STARTUP > 2016-06-28T07:06:39Z [24113] INFO Cluster feature is turned on. Agency > version: , Agency endpoints: http+tcp://127.0.0.1:5002, > http+tcp://127.0.0.1:5002, http+tcp://127.0.0.1:5002, > http+tcp://127.0.0.1:5001, http+tcp://127.0.0.1:5001, > http+tcp://127.0.0.1:5001, server id: 'DBServer001', internal address: > tcp://127.0.0.1:8529, role: PRIMARY > 2016-06-28T07:06:39Z [24113] INFO using heartbeat interval value '1000 ms' > from agency > 2016-06-28T07:06:39Z [24113] INFO changing state of PRIMARY server from > STARTUP to SERVING > 2016-06-28T07:06:39Z [24113] INFO In database '_system': No version > information file found in database directory. > 2016-06-28T07:06:39Z [24113] INFO In database '_system': Database is > up-to-date (30000/cluster-local/init) > 2016-06-28T07:06:40Z [24113] INFO using endpoint 'http+tcp://0.0.0.0:8529' > for non-encrypted requests > 2016-06-28T07:06:40Z [24113] INFO Authentication is turned off > 2016-06-28T07:06:40Z [24113] INFO bootstraped DB server DBServer001 > 2016-06-28T07:06:40Z [24113] INFO bootstraped DB server DBServer001 > 2016-06-28T07:06:40Z [24113] INFO bootstraped DB server DBServer001 > 2016-06-28T07:06:40Z [24113] INFO In database '_system': Database is > up-to-date (-/db-server-local/init) > 2016-06-28T07:06:40Z [24113] INFO bootstraped DB server DBServer001 > 2016-06-28T07:06:40Z [24113] INFO ArangoDB (version 3.0.x-devel [linux]) is > ready for business. Have fun! > 2016-06-28T07:06:41Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:06:51Z [24113] ERROR error details: > {"code":307,"errorNum":0,"errorMessage":"Temporary Redirect (Temporary > Redirect)","error":true} > 2016-06-28T07:06:51Z [24113] ERROR error stack: ArangoError: Temporary > Redirect (Temporary Redirect)\n at Error (native)\n at writeLocked > (/home/bart/test/arangodb/js/server/modules/@arangodb/cluster.js:1663:41)\n > at createLocalDatabases > (/home/bart/test/arangodb/js/server/modules/@arangodb/cluster.js:348:9)\n > at handleDatabaseChanges > (/home/bart/test/arangodb/js/server/modules/@arangodb/cluster.js:461:3)\n > at handleChanges > (/home/bart/test/arangodb/js/server/modules/@arangodb/cluster.js:1460:3)\n > at handlePlanChange > (/home/bart/test/arangodb/js/server/modules/@arangodb/cluster.js:1675:24) > 2016-06-28T07:06:51Z [24113] ERROR plan change handling failed > 2016-06-28T07:06:52Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:06:53Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:06:54Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:06:55Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:06:56Z [24113] WARNING {heartbeat} heartbeat could not be sent > to agency endpoints (http+tcp://127.0.0.1:5002, http+tcp://127.0.0.1:5002, > http+tcp://127.0.0.1:5003, http+tcp://127.0.0.1:5003, > http+tcp://127.0.0.1:5003, http+tcp://127.0.0.1:5002, > http+tcp://127.0.0.1:5001, http+tcp://127.0.0.1:5001, > http+tcp://127.0.0.1:5001): http code: 307, body: > 2016-06-28T07:06:56Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:06:57Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:06:58Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:06:59Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:00Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:01Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:02Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:03Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:04Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:05Z [24113] WARNING {heartbeat} heartbeat could not be sent > to agency endpoints (http+tcp://127.0.0.1:5002, http+tcp://127.0.0.1:5003, > http+tcp://127.0.0.1:5003, http+tcp://127.0.0.1:5003, > http+tcp://127.0.0.1:5002, http+tcp://127.0.0.1:5002, > http+tcp://127.0.0.1:5001, http+tcp://127.0.0.1:5001, > http+tcp://127.0.0.1:5001): http code: 307, body: > 2016-06-28T07:07:05Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:06Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:07Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:09Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:10Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:13Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:13Z [24113] ERROR {heartbeat} Could not read Current/Version > from agency. > 2016-06-28T07:07:28Z [24113] INFO plan change handling successful > > > And the agents start using all cpu resources until the system becomes almost > unresponsive. > > At that time, the agents are logging following messages: > > 2016-06-28T07:06:52Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:06:52Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:06:54Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:06:57Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:06:58Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:06:58Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:07:02Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:07:02Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:07:10Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:07:13Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:07:25Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:07:30Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:07:30Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:07:33Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:07:43Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:08:24Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:08:34Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:08:54Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:08:55Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:09:14Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:09:25Z [23997] WARNING {agency} I have a higher term than RPC > caller. > > > When I stop the primary server again via ctrl-c, after some time the cpu > usage starts to decrease but the memory is increasing rapidly, finally taking > up all system memory and swap, in the end freezing the whole system. > At this point the agents start to log following messages: > > 2016-06-28T07:12:34Z [23478] WARNING {queries} slow query: 'FOR s in > @@collection FILTER s.time >= @start SORT s.time desc LIMIT 1 RETURN s', took > 26.345204 > 2016-06-28T07:12:43Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:13:05Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:13:10Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:13:20Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:13:23Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:13:39Z [23997] WARNING {agency} I have a higher term than RPC > caller. > 2016-06-28T07:15:23Z [23478] WARNING {queries} slow query: 'FOR s in > @@collection FILTER s.time >= @start SORT s.time desc LIMIT 1 RETURN s', took > 22.091859 > 2016-06-28T07:21:06Z [23478] WARNING {queries} slow query: 'FOR s in > @@collection FILTER s.time >= @start SORT s.time desc LIMIT 1 RETURN s', took > 17.283082 > 2016-06-28T07:22:26Z [23478] WARNING {queries} slow query: 'FOR s in > @@collection FILTER s.time >= @start SORT s.time desc LIMIT 1 RETURN s', took > 19.252867 > 2016-06-28T07:22:26Z [23478] WARNING {queries} slow query: 'FOR s in > @@collection FILTER s.time >= @start SORT s.time desc LIMIT 1 RETURN s', took > 40.744055 > [...] > > > So the initial issue with the agents seems to be resolved, but it's > definitely still not working correctly > > Bart > > > > -- > You received this message because you are subscribed to a topic in the Google > Groups "ArangoDB" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/arangodb/p30FN_TXp30/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > [email protected]. > For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups "ArangoDB" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
signature.asc
Description: Message signed with OpenPGP using GPGMail
