> On Feb 26, 2015, at 2:49 PM, Richard Smith <[email protected]> wrote:
>
> On Thu, Feb 26, 2015 at 8:30 AM, Ben Langmuir <[email protected]> wrote:
>
>> On Feb 25, 2015, at 6:20 PM, Richard Smith <[email protected]> wrote:
>>
>> On Tue, Feb 24, 2015 at 7:55 PM, Ben Langmuir <[email protected]> wrote:
>>
>>> On Feb 24, 2015, at 6:57 PM, Richard Smith <[email protected]> wrote:
>>>
>>> On Mon, Feb 9, 2015 at 12:35 PM, Ben Langmuir <[email protected]> wrote:
>>> Author: benlangmuir
>>> Date: Mon Feb  9 14:35:13 2015
>>> New Revision: 228604
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=228604&view=rev
>>> Log:
>>> Diagnose timeouts in the LockFileManager and delete the dead lock file
>>>
>>> If the lock file manager times out, we should give an error rather than silently trying to load the existing module. And delete the (presumably) dead lock file, since it will otherwise prevent progress in future invokations. This is unsound since we have no way to prove that the lock file we are deleting is the same one we timed out on, but since the lock is only to avoid excessive rebuilding anyway it should be okay. Depends on llvm r228603.
>>>
>>> This diagnostic fires a *lot* when, say, bootstrapping Clang with modules enabled at -j16. Is the timeout perhaps too short?
>>
>> I lowered the timeout to 1 minute in my llvm commit,
>>
>> Well, technically, it went from 8m 44s to 65.5s due to the exponential backoff... :-/
>
> Heh, good point.
>
>> so if it helps, feel free to bump it back up. I'm not particularly concerned about the actual number now that we will actually honour the limit, but the old value of 5 minutes seemed really long to me.
>>
>> Really? Consider a Debug+Asserts clang, building a large module (such as the Clang_AST module). Requiring that to succeed in 65.5 seconds doesn't seem particularly reasonable (and, indeed, some such modules do not build in that environment in 65.5s on my machine, at least when it's heavily loaded).
>
> That sucks.
>
>> But I'm not really sure that a timeout is the right approach here at all. What we want to track is whether the creating process is still alive. Perhaps we should have that process periodically bump the timestamp on the lock file, and assume it's died if the timestamp is too far in the past? This'd mean that in the (scary) case of sharing a module cache across a build farm, you'd need your machines to not have too much clock skew, but that's already necessary for build systems that use timestamps to determine what to rebuild, so it doesn't seem like a ridiculous restriction.
>>
>> Thoughts?
>
> I think what you're proposing is a better way to track whether the creator is alive than what we currently do (looking at the hostname+pid). We could eliminate the clock-sync dependence if the waiting process set up its own local timer and then reset it if it sees the timestamp of the lock file has changed (i.e. we don't care what the value is, only whether it changes). But your suggestion is simpler and automatically handles a new process waiting on a long-dead file in a good way.
>
> At some level I like that the timeout gives you feedback when a compilation takes much longer than expected.
> How about this: in addition to tracking process liveness, we also use a timeout. When the timeout is reached, we emit a warning that the module build is taking a long time. We then forcibly delete the lock file and try again to acquire the lock ourselves. This has the added advantage of combatting priority inversion. If we reach the timeout again (maybe several times?), we emit an error and fail. What do you think?
>
> For the most part, this sounds great to me. I'm not entirely convinced that it makes sense to remove the lock and retry if we can see that another process is periodically touching it.
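
To make the proposal above concrete, here is a minimal sketch of the touch-based scheme under discussion, written against plain C++17 <filesystem>. The function and parameter names are invented for illustration; this is not the llvm::LockFileManager API.

  // Hypothetical sketch of the touch-based liveness scheme; not LockFileManager code.
  #include <chrono>
  #include <cstdio>
  #include <filesystem>
  #include <system_error>
  #include <thread>

  namespace fs = std::filesystem;
  using namespace std::chrono_literals;

  enum class WaitResult { OwnerReleased, OwnerDied };

  // Builder side (run on a side thread while the module is being built): keep
  // bumping the lock file's mtime so waiters can tell we are still alive,
  // however long the build takes.
  void keepLockAlive(const fs::path &LockPath, std::chrono::seconds Interval) {
    std::error_code EC;
    while (fs::exists(LockPath, EC) && !EC) {
      fs::last_write_time(LockPath, fs::file_time_type::clock::now(), EC);
      std::this_thread::sleep_for(Interval);
    }
  }

  // Waiter side: the owner is presumed dead only when the mtime stops moving
  // for `Stale`; `SoftLimit` merely reports that the build is taking a while.
  WaitResult waitForLock(const fs::path &LockPath, std::chrono::seconds Stale,
                         std::chrono::seconds SoftLimit) {
    const auto Start = std::chrono::steady_clock::now();
    bool Reported = false;
    std::error_code EC;
    while (fs::exists(LockPath, EC) && !EC) {
      auto MTime = fs::last_write_time(LockPath, EC);
      if (EC)
        break;                        // Lock vanished: the owner released it.
      if (fs::file_time_type::clock::now() - MTime > Stale)
        return WaitResult::OwnerDied; // Stale mtime: treat the owner as dead.
      if (!Reported && std::chrono::steady_clock::now() - Start > SoftLimit) {
        std::fprintf(stderr, "note: module build is taking a long time\n");
        Reported = true;              // In clang this would be a remark.
      }
      std::this_thread::sleep_for(1s);
    }
    return WaitResult::OwnerReleased;
  }

The variant Ben describes would avoid comparing the file's mtime against the waiter's own clock (and hence cross-machine clock skew): remember the last mtime observed and reset a local deadline whenever it changes. The sketch above is the simpler "mtime too far in the past" form.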

Okay, but right now any time we delete a lock file we may accidentally delete a valid lock because of races. I haven't found a good way to avoid the window between checking that a lock file is invalid and deleting that file (which is done by name). (There is a sketch of this race below, after the quoted patch.)

> I think issuing a remark after some timeout (1 minute, maybe?), and possibly failing after a longer timeout, may be reasonable.

I was thinking a warning, but I guess -Werror makes that a bad idea. Failing instead of clearing the lock file is okay with me.

>
> We'd need a mechanism to turn off the second timeout (for cases where modules simply take a long time to build for whatever reason) and possibly to turn off the first timeout (to ensure we provide deterministic output).

  -Rmodule-timeout-soft
  -Wmodule-timeout (DefaultError)

We could theoretically make a -W group that included both for convenience, although that might be weird with a remark...

Another question: where do you think we should touch the file? In a consumer of top-level decls?

>
>>> Modified:
>>>     cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td
>>>     cfe/trunk/lib/Frontend/CompilerInstance.cpp
>>>
>>> Modified: cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td?rev=228604&r1=228603&r2=228604&view=diff
>>> ==============================================================================
>>> --- cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td (original)
>>> +++ cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td Mon Feb  9 14:35:13 2015
>>> @@ -83,6 +83,8 @@ def err_module_not_found : Error<"module
>>>  def err_module_not_built : Error<"could not build module '%0'">, DefaultFatal;
>>>  def err_module_lock_failure : Error<
>>>    "could not acquire lock file for module '%0'">, DefaultFatal;
>>> +def err_module_lock_timeout : Error<
>>> +  "timed out waiting to acquire lock file for module '%0'">, DefaultFatal;
>>>  def err_module_cycle : Error<"cyclic dependency in module '%0': %1">, DefaultFatal;
>>>  def note_pragma_entered_here : Note<"#pragma entered here">;
>>>
>>> Modified: cfe/trunk/lib/Frontend/CompilerInstance.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInstance.cpp?rev=228604&r1=228603&r2=228604&view=diff
>>> ==============================================================================
>>> --- cfe/trunk/lib/Frontend/CompilerInstance.cpp (original)
>>> +++ cfe/trunk/lib/Frontend/CompilerInstance.cpp Mon Feb  9 14:35:13 2015
>>> @@ -1022,9 +1022,19 @@ static bool compileAndLoadModule(Compile
>>>    case llvm::LockFileManager::LFS_Shared:
>>>      // Someone else is responsible for building the module. Wait for them to
>>>      // finish.
>>> -    if (Locked.waitForUnlock() == llvm::LockFileManager::Res_OwnerDied)
>>> +    switch (Locked.waitForUnlock()) {
>>> +    case llvm::LockFileManager::Res_Success:
>>> +      ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>>> +      break;
>>> +    case llvm::LockFileManager::Res_OwnerDied:
>>>        continue; // try again to get the lock.
>>> -    ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>>> +    case llvm::LockFileManager::Res_Timeout:
>>> +      Diags.Report(ModuleNameLoc, diag::err_module_lock_timeout)
>>> +          << Module->Name;
>>> +      // Clear the lock file so that future invokations can make progress.
>>> +      Locked.unsafeRemoveLockFile();
>>> +      return false;
>>> +    }
>>>      break;
>>>    }
>>>
>>>
>>>
>>> _______________________________________________
>>> cfe-commits mailing list
>>> [email protected]
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>
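
On the race mentioned above (accidentally deleting a lock that has become valid again): a minimal sketch of why the check-then-delete window is hard to close, again with hypothetical names rather than LockFileManager code. "Stale" here reuses the touch-based liveness idea from the earlier sketch.

  // Hypothetical sketch of the check-then-delete window; not LockFileManager code.
  #include <chrono>
  #include <filesystem>
  #include <system_error>

  namespace fs = std::filesystem;

  // The owner is considered stale if it has not touched the lock file recently.
  static bool isStale(const fs::path &LockPath, std::chrono::seconds Stale) {
    std::error_code EC;
    auto MTime = fs::last_write_time(LockPath, EC);
    if (EC)
      return false; // Lock already gone; nothing to steal.
    return fs::file_time_type::clock::now() - MTime > Stale;
  }

  void stealLockIfStale(const fs::path &LockPath, std::chrono::seconds Stale) {
    if (isStale(LockPath, Stale)) {
      // Window: between the staleness check above and the remove() below,
      // another waiter may have already deleted the dead lock *and* a new
      // builder may have created a fresh, valid lock under the same name.
      // remove() operates on the name, not on the file we inspected, so it
      // can delete that valid lock; there is no portable unlink-by-identity
      // primitive that closes this window completely.
      std::error_code EC;
      fs::remove(LockPath, EC); // May delete someone else's live lock.
    }
  }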
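
And one possible reading of "touch the file in a consumer of top-level decls", sketched against clang's ASTConsumer interface. Only ASTConsumer::HandleTopLevelDecl is real clang API here; the consumer, its wiring into the module build, and the use of std::filesystem for the touch are assumptions for illustration.

  // Hypothetical sketch: refresh the lock file's mtime at most once per
  // interval while the module build makes progress.
  #include "clang/AST/ASTConsumer.h"
  #include "clang/AST/DeclGroup.h"
  #include <chrono>
  #include <filesystem>
  #include <system_error>

  namespace {
  class LockKeepAliveConsumer : public clang::ASTConsumer {
    std::filesystem::path LockPath;
    std::chrono::steady_clock::time_point LastTouch;
    std::chrono::seconds Interval;

  public:
    LockKeepAliveConsumer(std::filesystem::path LockPath,
                          std::chrono::seconds Interval)
        : LockPath(std::move(LockPath)),
          LastTouch(std::chrono::steady_clock::now()), Interval(Interval) {}

    // Called for each top-level declaration as the module is parsed, so a
    // long-running build keeps refreshing the lock's mtime while it is
    // making progress.
    bool HandleTopLevelDecl(clang::DeclGroupRef) override {
      auto Now = std::chrono::steady_clock::now();
      if (Now - LastTouch >= Interval) {
        std::error_code EC;
        std::filesystem::last_write_time(
            LockPath, std::filesystem::file_time_type::clock::now(), EC);
        LastTouch = Now;
      }
      return true; // Continue parsing.
    }
  };
  } // namespace

A caveat with driving the touch from parsing: a build that spends a long time inside a single declaration, or in serialization after parsing, would stop touching the file and could look dead to waiters, so a timer-driven touch may be more robust than a parse-driven one.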

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
