> On Feb 26, 2015, at 3:38 PM, Richard Smith <[email protected]> wrote:
>
> On Thu, Feb 26, 2015 at 3:17 PM, Ben Langmuir <[email protected]> wrote:
>> On Feb 26, 2015, at 2:49 PM, Richard Smith <[email protected]> wrote:
>>
>> On Thu, Feb 26, 2015 at 8:30 AM, Ben Langmuir <[email protected]> wrote:
>>
>>> On Feb 25, 2015, at 6:20 PM, Richard Smith <[email protected]> wrote:
>>>
>>> On Tue, Feb 24, 2015 at 7:55 PM, Ben Langmuir <[email protected]> wrote:
>>>
>>>> On Feb 24, 2015, at 6:57 PM, Richard Smith <[email protected]> wrote:
>>>>
>>>> On Mon, Feb 9, 2015 at 12:35 PM, Ben Langmuir <[email protected]> wrote:
>>>> Author: benlangmuir
>>>> Date: Mon Feb  9 14:35:13 2015
>>>> New Revision: 228604
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=228604&view=rev
>>>> Log:
>>>> Diagnose timeouts in the LockFileManager and delete the dead lock file
>>>>
>>>> If the lock file manager times out, we should give an error rather than
>>>> silently trying to load the existing module. And delete the
>>>> (presumably) dead lock file, since it will otherwise prevent progress in
>>>> future invocations. This is unsound since we have no way to prove that
>>>> the lock file we are deleting is the same one we timed out on, but since
>>>> the lock is only to avoid excessive rebuilding anyway it should be okay.
>>>> Depends on llvm r228603.
>>>>
>>>> This diagnostic fires a *lot* when, say, bootstrapping Clang with modules
>>>> enabled at -j16. Is the timeout perhaps too short?
>>>
>>> I lowered the timeout to 1 minute in my llvm commit,
>>>
>>> Well, technically, it went from 8m 44s to 65.5s due to the exponential
>>> backoff... :-/
>>
>> Heh, good point.
>>
>>> so if it helps, feel free to bump it back up. I'm not particularly
>>> concerned about the actual number now that we will actually honour the
>>> limit, but the old value of 5 minutes seemed really long to me.
>>>
>>> Really? Consider a Debug+Asserts clang building a large module (such as
>>> the Clang_AST module). Requiring that to succeed in 65.5 seconds doesn't
>>> seem particularly reasonable (and, indeed, some such modules do not build
>>> in that environment in 65.5s on my machine, at least when it's heavily
>>> loaded).
>>
>> That sucks.
>>
>>> But I'm not really sure that a timeout is the right approach here at all.
>>> What we want to track is whether the creating process is still alive.
>>> Perhaps we should have that process periodically bump the timestamp on the
>>> lock file, and assume it has died if the timestamp is too far in the past?
>>> This would mean that in the (scary) case of sharing a module cache across a
>>> build farm, you'd need your machines to not have too much clock skew, but
>>> that's already necessary for build systems that use timestamps to determine
>>> what to rebuild, so it doesn't seem like a ridiculous restriction.
>>>
>>> Thoughts?
>>
>> I think what you're proposing is a better way to track whether the creator
>> is alive than what we currently do (looking at the hostname+pid). We could
>> eliminate the clock-sync dependence if the waiting process set up its own
>> local timer and reset it whenever it sees that the timestamp of the lock
>> file has changed (i.e. we don't care what the value is, only whether it
>> changes). But your suggestion is simpler, and it automatically handles a
>> new process waiting on a long-dead lock file in a good way.
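
To make the skew-free variant concrete, a minimal sketch (not LLVM's actual
API; getLockMTime is a hypothetical stand-in for stat()'ing the lock file):
the waiter compares the lock file's mtime only against the value it last saw,
never against its own clock, and resets a local monotonic timer whenever the
mtime changes.

  #include <chrono>
  #include <cstdint>
  #include <functional>
  #include <thread>

  // Sketch only: block until the lock owner looks dead, i.e. until the lock
  // file's mtime has gone `staleAfter` without changing. Only *changes* in
  // the mtime matter, never its absolute value, so clock skew between
  // machines sharing a module cache is harmless.
  void waitUntilOwnerLooksDead(std::function<uint64_t()> getLockMTime,
                               std::chrono::seconds staleAfter) {
    uint64_t lastSeen = getLockMTime();
    auto lastChange = std::chrono::steady_clock::now();
    while (std::chrono::steady_clock::now() - lastChange < staleAfter) {
      std::this_thread::sleep_for(std::chrono::seconds(1));
      uint64_t cur = getLockMTime();
      if (cur != lastSeen) {
        lastSeen = cur;                                // producer touched the file;
        lastChange = std::chrono::steady_clock::now(); // reset the local timer
      }
    }
  }

A real implementation would also return early when the lock file disappears
(the owner finished), and would poll less aggressively.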
>>
>> At some level I like that the timeout gives you feedback when a compilation
>> takes much longer than expected. How about this: in addition to tracking
>> process liveness, we also use a timeout. When the timeout is reached, we
>> emit a warning that the module build is taking a long time. We then
>> forcibly delete the lock file and try again to acquire the lock ourselves.
>> This has the added advantage of combatting priority inversion. If we reach
>> the timeout again (maybe several times?), we emit an error and fail. What
>> do you think?
>>
>> For the most part, this sounds great to me. I'm not entirely convinced that
>> it makes sense to remove the lock and retry if we can see that another
>> process is periodically touching it.
>
> Okay, but right now any time we delete a lock file we may accidentally
> delete a valid lock because of races. I haven't found a good way to close
> the window between checking that a lock file is invalid and deleting that
> file (which is done by name).
>
> So long as we're conservatively correct in the case of a race, and the
> probability is low enough, I don't think this is a huge problem (IIUC, we
> should get a signature mismatch in any downstream modules if this goes
> wrong, and end up triggering a bunch of additional rebuilds, but we should
> converge to a stable set of module files in the end). The proposed
> direction also reduces the probability of a race (we only delete the file
> and retry if the producing end appears not to be updating it), so it seems
> like an improvement.
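
The policy being converged on here -- diagnose on the first timeout, steal the
presumably-dead lock and retry, hard-fail on the second -- might look roughly
like the following sketch. The ModuleLock interface is a hypothetical
placeholder, not llvm::LockFileManager's API.

  #include <chrono>
  #include <functional>

  // Hypothetical lock interface standing in for llvm::LockFileManager.
  struct ModuleLock {
    std::function<bool()> tryAcquire;                       // true: we own the lock
    std::function<bool(std::chrono::seconds)> waitForOwner; // true: owner finished
    std::function<void()> unsafeRemove;                     // force-delete by name
  };

  enum class BuildResult { BuiltOurselves, LoadedTheirs, TimedOut };

  // Sketch of the proposed retry policy: diagnose on each timeout, but only
  // steal the lock and retry once; the second timeout is fatal.
  BuildResult acquireOrWait(ModuleLock &lock,
                            std::function<void(bool fatal)> diagnoseTimeout) {
    for (unsigned attempt = 0; attempt < 2; ++attempt) {
      if (lock.tryAcquire())
        return BuildResult::BuiltOurselves;    // we get to build the module
      if (lock.waitForOwner(std::chrono::seconds(60)))
        return BuildResult::LoadedTheirs;      // owner finished; load its output
      diagnoseTimeout(/*fatal=*/attempt == 1); // remark first, error second
      if (attempt == 0)
        lock.unsafeRemove(); // racy by design, but conservatively recoverable
    }
    return BuildResult::TimedOut;
  }

The deletion remains racy, as discussed above: in the worst case we delete a
live lock and trigger redundant rebuilds, which is recoverable.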
Right.

> I think this is just part of the cost we pay for providing an implicit
> module building system and a transparent upgrade path; we do also provide
> the ability to explicitly build modules, which avoids all of these issues.
> (That said, we don't yet have a non-cc1 way of building a module
> explicitly, and we really should add one.)

>> I think issuing a remark after some timeout (1 minute, maybe?), and
>> possibly failing after a longer timeout, may be reasonable.
>
> I was thinking a warning, but I guess -Werror makes that a bad idea.
> Failing instead of clearing the lock file is okay with me.
>
>> We'd need a mechanism to turn off the second timeout (for cases where
>> modules simply take a long time to build for whatever reason), and possibly
>> to turn off the first timeout (to ensure we provide deterministic output).
>
> -Rmodule-timeout-soft
> -Wmodule-timeout (DefaultError)
>
> We could theoretically make a -W group that included both for convenience,
> although that might be weird with a remark...
>
> We want to bail out of compilation when we hit the timeout, so a -W flag
> doesn't seem like the right model to me (it would need to control more
> than merely whether we emit the diagnostic). How about
> -fimplicit-modules-build-timeout=<time in s>, following the usual
> convention that 0 means "no limit"?

Sounds good. If we're going to customize the timeout, it would be nice if
the remark timeout were relative to the error timeout. Maybe a 1:5 ratio,
with a default of 1 or 2 minutes for the remark (so 5 or 10 minutes for the
error). I'm not sure how well that would come across in the -help text for
this option, though...

> Another question: where do you think we should touch the file? In a
> consumer of top-level decls?
>
> An ASTConsumer could be a reasonable place. That should also let us watch
> for template instantiations performed at end of TU, which are sometimes a
> sizeable portion of the parsing time.

Sounds good.
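
One way the ASTConsumer idea could look -- a rough sketch, not an existing
Clang class; Touch is a hypothetical callback that bumps the lock file's
mtime:

  #include "clang/AST/ASTConsumer.h"
  #include "clang/AST/DeclGroup.h"
  #include <chrono>
  #include <functional>
  #include <utility>

  // Sketch: piggyback on top-level-decl callbacks, which also fire for
  // declarations produced by end-of-TU template instantiation, to touch the
  // lock file periodically while the module build makes progress.
  class LockKeepAliveConsumer : public clang::ASTConsumer {
    std::function<void()> Touch; // hypothetical: bumps the lock file's mtime
    std::chrono::steady_clock::time_point LastTouch;

  public:
    explicit LockKeepAliveConsumer(std::function<void()> Touch)
        : Touch(std::move(Touch)),
          LastTouch(std::chrono::steady_clock::now()) {}

    bool HandleTopLevelDecl(clang::DeclGroupRef) override {
      auto Now = std::chrono::steady_clock::now();
      if (Now - LastTouch > std::chrono::seconds(10)) { // throttle fs traffic
        Touch();
        LastTouch = Now;
      }
      return true; // keep parsing
    }
  };

It would be installed alongside the consumers the module-building
CompilerInstance already uses.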
>
>>>> Modified:
>>>>     cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td
>>>>     cfe/trunk/lib/Frontend/CompilerInstance.cpp
>>>>
>>>> Modified: cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td?rev=228604&r1=228603&r2=228604&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td (original)
>>>> +++ cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td Mon Feb  9 14:35:13 2015
>>>> @@ -83,6 +83,8 @@ def err_module_not_found : Error<"module
>>>>  def err_module_not_built : Error<"could not build module '%0'">, DefaultFatal;
>>>>  def err_module_lock_failure : Error<
>>>>    "could not acquire lock file for module '%0'">, DefaultFatal;
>>>> +def err_module_lock_timeout : Error<
>>>> +  "timed out waiting to acquire lock file for module '%0'">, DefaultFatal;
>>>>  def err_module_cycle : Error<"cyclic dependency in module '%0': %1">, DefaultFatal;
>>>>  def note_pragma_entered_here : Note<"#pragma entered here">;
>>>>
>>>> Modified: cfe/trunk/lib/Frontend/CompilerInstance.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInstance.cpp?rev=228604&r1=228603&r2=228604&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Frontend/CompilerInstance.cpp (original)
>>>> +++ cfe/trunk/lib/Frontend/CompilerInstance.cpp Mon Feb  9 14:35:13 2015
>>>> @@ -1022,9 +1022,19 @@ static bool compileAndLoadModule(Compile
>>>>    case llvm::LockFileManager::LFS_Shared:
>>>>      // Someone else is responsible for building the module. Wait for them to
>>>>      // finish.
>>>> -    if (Locked.waitForUnlock() == llvm::LockFileManager::Res_OwnerDied)
>>>> +    switch (Locked.waitForUnlock()) {
>>>> +    case llvm::LockFileManager::Res_Success:
>>>> +      ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>>>> +      break;
>>>> +    case llvm::LockFileManager::Res_OwnerDied:
>>>>        continue; // try again to get the lock.
>>>> -    ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>>>> +    case llvm::LockFileManager::Res_Timeout:
>>>> +      Diags.Report(ModuleNameLoc, diag::err_module_lock_timeout)
>>>> +          << Module->Name;
>>>> +      // Clear the lock file so that future invokations can make progress.
>>>> +      Locked.unsafeRemoveLockFile();
>>>> +      return false;
>>>> +    }
>>>>      break;
>>>>    }
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-commits mailing list
>>>> [email protected]
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
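
As an aside, the 8m 44s and 65.5s figures quoted earlier are consistent with
sleep intervals that double from 1ms and a deadline that is only checked
between sleeps (an assumption about the implementation, not taken from
r228603): the total wait is a geometric sum, 2^19 - 1 ms (about 8m 44s)
versus 2^16 - 1 ms (about 65.5s). A toy illustration:

  #include <chrono>
  #include <cstdio>

  // Toy illustration (not LLVM's code): exponential backoff where the
  // deadline is only honoured between sleeps. Doubling from 1ms until a
  // ~9 minute budget is exhausted accumulates 2^19 - 1 ms, i.e. ~8m 44s;
  // stopping three doublings earlier gives 2^16 - 1 ms, i.e. ~65.5s.
  int main() {
    using namespace std::chrono;
    milliseconds interval(1), total(0);
    while (total + interval <= minutes(9)) {
      total += interval; // simulate the sleep
      interval *= 2;     // back off
    }
    std::printf("total wait: %lld ms\n",
                static_cast<long long>(total.count()));
    return 0;
  }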
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
