Since the datacentre move, the codehosting branch scanner has been intermittently failing. This manifests as an eternal "Updating branch..." on the website, which is often not noticed till a diff fails to appear in an associated merge proposal.

The failures in ackee/bzrsyncd/celeryd-job.log are along the lines of:

  [2012-08-22 16:39:26,958: INFO/MainProcess] Got task from broker: lp.services.job.celeryjob.CeleryRunJobIgnoreResult[BranchScanJob_14657367_c8b90ba9-db0a-4d2e-82b2-82413fd6b81e]
  [2012-08-22 16:39:27,012: INFO/PoolWorker-2] Running <SCAN_BRANCH branch job (4348709) for ~mandel/ubuntuone-client/use-new-fsevents-api> (ID 14657367) in status Waiting
  [2012-08-22 16:39:29,526: INFO/PoolWorker-2] Scanning branch: ~mandel/ubuntuone-client/use-new-fsevents-api
  [2012-08-22 16:39:29,526: INFO/PoolWorker-2] from lp-internal:///~mandel/ubuntuone-client/use-new-fsevents-api
  [2012-08-22 16:39:29,526: INFO/PoolWorker-2] Retrieving history from bzrlib.
  [2012-08-22 16:39:29,984: INFO/PoolWorker-2] Retrieving ancestry from database.
  [2012-08-22 16:39:30,533: INFO/PoolWorker-2] Planning changes.
  [2012-08-22 16:39:30,533: INFO/PoolWorker-2] Calculating history delta.
  [2012-08-22 16:39:30,540: INFO/PoolWorker-2] Adding 1 new revisions.
  [2012-08-22 16:39:31,699: INFO/PoolWorker-2] Job resulted in OOPS: OOPS-030ff1ea23f05521d4fd9800a66a2a3a
  [2012-08-22 16:39:31,700: INFO/MainProcess] Task lp.services.job.celeryjob.CeleryRunJobIgnoreResult[BranchScanJob_14657367_c8b90ba9-db0a-4d2e-82b2-82413fd6b81e] succeeded in 4.71384119987s: None

Unfortunately the traceback in the oops is not useful, as it's cleanup fallout rather than the original error:

  Traceback (most recent call last):
    Module lazr.jobrunner.jobrunner, line 194, in runJobHandleError
      self.runJob(job, fallback)
    Module lp.services.job.runner, line 295, in runJob
      super(BaseJobRunner, self).runJob(IRunnableJob(job), fallback)
    Module lazr.jobrunner.jobrunner, line 162, in runJob
      job.run()
    Module lp.code.model.branchjob, line 331, in run
      bzrsync.syncBranchAndClose()
    Module contextlib, line 34, in __exit__
      self.gen.throw(type, value, traceback)
    Module lp.services.database.locking, line 50, in try_advisory_lock
      store.execute(Select(AdvisoryUnlock(lock_type.value, lock_id)))
    Module storm.store, line 108, in execute
      return self._connection.execute(statement, params, noresult)
    Module storm.databases.postgres, line 266, in execute
      return Connection.execute(self, statement, params, noresult)
    Module storm.database, line 238, in execute
      raw_cursor = self.raw_execute(statement, params)
    Module storm.databases.postgres, line 276, in raw_execute
      return Connection.raw_execute(self, statement, params)
    Module storm.database, line 322, in raw_execute
      self._check_disconnect(raw_cursor.execute, *args)
    Module storm.database, line 371, in _check_disconnect
      return function(*args, **kwargs)
  InternalError: current transaction is aborted, commands ignored until end of transaction block

The normal workaround for branch scanner problems is to use a trick to run it again, such as (thanks wgrant):

  $ bzr push -r-2 --overwrite
  $ bzr push

But at least for some branches, the failures seem to be consistent, failing three times in a row.

Apart from fixing the job to not mask the error, deploying that code, then seeing the actual problem, is there anything else we can try to resolve this?

Martin

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-dev
Post to     : launchpad-dev@lists.launchpad.net
Unsubscribe : https://launchpad.net/~launchpad-dev
More help   : https://help.launchpad.net/ListHelp
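
P.S. To illustrate what "fixing the job to not mask the error" might look like, here is a minimal, self-contained sketch. It is not the real lp.services.database.locking code: FakeStore, TransactionAborted, masking_lock and preserving_lock are all hypothetical names, and the fake store only imitates the "transaction is aborted" behaviour seen in the traceback. The idea is that the cleanup step in a context manager should not be allowed to replace the exception raised by the body.

```python
from contextlib import contextmanager


class TransactionAborted(Exception):
    """Hypothetical stand-in for the database's InternalError."""


class FakeStore:
    """Hypothetical store: once a statement fails, every later one fails too."""

    def __init__(self):
        self.aborted = False

    def execute(self, statement):
        if self.aborted:
            raise TransactionAborted("current transaction is aborted")
        if statement == "BAD":
            self.aborted = True
            raise ValueError("original error")
        return statement


@contextmanager
def masking_lock(store):
    # Unlock in a plain finally: if the body aborted the transaction,
    # the unlock raises and that error replaces the original one.
    store.execute("LOCK")
    try:
        yield
    finally:
        store.execute("UNLOCK")


@contextmanager
def preserving_lock(store):
    # Same locking, but a failing unlock never hides the body's exception.
    store.execute("LOCK")
    try:
        yield
    except BaseException:
        try:
            store.execute("UNLOCK")
        except Exception:
            pass  # cleanup fallout; the original error matters more
        raise
    else:
        store.execute("UNLOCK")


def failure_seen_by_caller(lock):
    """Run a failing body under the given lock; return the exception type name."""
    store = FakeStore()
    try:
        with lock(store):
            store.execute("BAD")
    except Exception as exc:
        return type(exc).__name__
    return None
```

With this sketch, failure_seen_by_caller(masking_lock) reports the cleanup error (TransactionAborted), while failure_seen_by_caller(preserving_lock) reports the ValueError raised by the body. Whether the real try_advisory_lock can be changed in this way is an assumption on my part.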