[
https://issues.apache.org/jira/browse/DRILL-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461232#comment-16461232
]
Timothy Farkas commented on DRILL-6380:
---------------------------------------
Doing the following seems to fix the tests on Jenkins:
1. Put the replica sets into a TreeMap instead of a HashMap.
2. Flapadoodle iterates over the entry set of the map. A TreeMap iterates in
sorted key order, so the config servers are guaranteed to be the first entry
flapadoodle iterates over.
3. This guarantees that when flapadoodle starts the replica sets, the config
servers are started first.
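
The ordering argument in the steps above can be sketched as follows. This is
only an illustration, not the actual Drill test code: the key names
("config_servers", "shard_1_replicas", "shard_2_replicas") and the host:port
values are assumptions chosen for the example, and the loop merely imitates
a caller that walks entrySet() the way flapadoodle is described as doing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReplicaSetOrderSketch {
    // Returns the keys in the order a caller iterating entrySet() would see them.
    public static List<String> startOrder(Map<String, List<String>> replicaSets) {
        List<String> order = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : replicaSets.entrySet()) {
            order.add(entry.getKey());
        }
        return order;
    }

    // Fills the given map with hypothetical replica sets (names and ports are assumptions).
    public static Map<String, List<String>> sampleReplicaSets(Map<String, List<String>> target) {
        target.put("shard_2_replicas", Collections.singletonList("localhost:27023"));
        target.put("config_servers", Collections.singletonList("localhost:27019"));
        target.put("shard_1_replicas", Collections.singletonList("localhost:27020"));
        return target;
    }

    public static void main(String[] args) {
        // HashMap: iteration order is unspecified and may differ between JVMs.
        System.out.println(startOrder(sampleReplicaSets(new HashMap<>())));
        // TreeMap: iteration follows the natural (lexicographic) order of the keys,
        // so "config_servers" comes before any key starting with "shard".
        System.out.println(startOrder(sampleReplicaSets(new TreeMap<>())));
    }
}
```

Note that the guarantee only holds as long as the config servers' key sorts
before every other key. A LinkedHashMap with the config servers inserted
first would make the intent explicit instead of relying on naming.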
I suspect this works because the config servers have more time to properly
initialize and create necessary data like the lockping document if they are
started first. Relying on startup order like this is obviously really bad,
but flapadoodle doesn't give us a way to pause until a server is completely
booted up, and even internally flapadoodle uses Thread.sleep calls to wait
for things to start up. We should probably look into filing some issues with
flapadoodle to clean these things up.
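
For comparison, waiting on an explicit readiness condition rather than a
fixed sleep could look roughly like the sketch below. It is a generic helper
and not a flapadoodle API: the class and method names are hypothetical, and
what the readiness check should actually test (for example, whether the
lockping document exists) is left to the caller as an assumption.

```java
import java.util.function.BooleanSupplier;

class ReadinessWaitSketch {
    // Polls the readiness check until it returns true or the timeout elapses.
    // Returns true if the check succeeded, false on timeout or interruption.
    public static boolean waitUntil(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real check: becomes ready on the third poll.
        int[] checks = {0};
        boolean ok = waitUntil(() -> ++checks[0] >= 3, 2000, 10);
        System.out.println("ready=" + ok + " after " + checks[0] + " checks");
    }
}
```

The helper still sleeps between polls, but it returns as soon as the
condition holds and fails after a bounded time instead of hanging.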
> Mongo db storage plugin tests can hang on jenkins.
> --------------------------------------------------
>
> Key: DRILL-6380
> URL: https://issues.apache.org/jira/browse/DRILL-6380
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Timothy Farkas
> Assignee: Timothy Farkas
> Priority: Major
>
> When running on our Jenkins server, the MongoDB tests hang because the
> config servers take up to 5 seconds to process each request (see *Error 1*).
> As a result, the tests never finish within a reasonable span of time.
> Searching online, people run into this issue when mixing versions of
> MongoDB, but that is not happening in our tests. A possible cause is
> *Error 2*, which seems to indicate that the MongoDB config servers are not
> completely initialized, since the config servers should have a lockping
> document when starting up.
> *Error 1*
> {code}
> [mongod output] 2018-05-01T23:38:47.468-0700 I COMMAND
> [replSetDistLockPinger] command config.lockpings command: findAndModify {
> findAndModify: "lockpings", query: { _id: "ConfigServer" }, update: { $set: {
> ping: new Date(1525243123413) } }, upsert: true, writeConcern: { w:
> "majority", wtimeout: 15000 } } planSummary: IDHACK update: { $set: { ping:
> new Date(1525243123413) } } keysExamined:0 docsExamined:0 nMatched:0
> nModified:0 upsert:1 keysInserted:2 numYields:0 reslen:198 locks:{ Global: {
> acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } },
> Collection: { acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } },
> oplog: { acquireCount: { w: 1 } } } protocol:op_query 4055ms
> [mongod output] 2018-05-01T23:38:47.469-0700 W SHARDING
> [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused
> by :: LockStateChangeFailed: findAndModify query predicate didn't match any
> lock document
> [mongod output] 2018-05-01T23:38:47.498-0700 I SHARDING [Balancer] lock
> 'balancer' successfully forced
> [mongod output] 2018-05-01T23:38:47.498-0700 I SHARDING [Balancer]
> distributed lock 'balancer' acquired, ts : 5ae95cd5d1023488104e6282
> [mongod output] 2018-05-01T23:38:47.498-0700 I SHARDING [Balancer] CSRS
> balancer thread is recovering
> [mongod output] 2018-05-01T23:38:47.498-0700 I SHARDING [Balancer] CSRS
> balancer thread is recovered
> [mongod output] 2018-05-01T23:38:48.056-0700 I NETWORK [thread2] connection
> accepted from 127.0.0.1:50244 #10 (7 connections now open)
> {code}
> *Error 2*
> {code}
> [mongod output] 2018-05-01T23:39:37.690-0700 I COMMAND [conn7] command
> config.settings command: find { find: "settings", filter: { _id: "chunksize"
> }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp
> 1525243172000|1, t: 1 } }, limit: 1, maxTimeMS: 30000 } planSummary: EOF
> keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0
> reslen:354 locks:{ Global: { acquireCount: { r: 2 } }, Database: {
> acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } }
> protocol:op_command 4988ms
> {code}