> On Feb 26, 2015, at 2:49 PM, Richard Smith <[email protected]> wrote:
> 
> On Thu, Feb 26, 2015 at 8:30 AM, Ben Langmuir <[email protected]> wrote:
> 
>> On Feb 25, 2015, at 6:20 PM, Richard Smith <[email protected]> wrote:
>> 
>> On Tue, Feb 24, 2015 at 7:55 PM, Ben Langmuir <[email protected]> wrote:
>> 
>>> On Feb 24, 2015, at 6:57 PM, Richard Smith <[email protected]> wrote:
>>> 
>>> On Mon, Feb 9, 2015 at 12:35 PM, Ben Langmuir <[email protected]> wrote:
>>> Author: benlangmuir
>>> Date: Mon Feb  9 14:35:13 2015
>>> New Revision: 228604
>>> 
>>> URL: http://llvm.org/viewvc/llvm-project?rev=228604&view=rev
>>> Log:
>>> Diagnose timeouts in the LockFileManager and delete the dead lock file
>>> 
>>> If the lock file manager times out, we should give an error rather than
>>> silently trying to load the existing module.  And delete the
>>> (presumably) dead lock file, since it will otherwise prevent progress in
>>> future invocations. This is unsound since we have no way to prove that
>>> the lock file we are deleting is the same one we timed out on, but since
>>> the lock is only to avoid excessive rebuilding anyway it should be okay.
>>> Depends on llvm r228603.
>>> 
>>> This diagnostic fires a *lot* when, say, bootstrapping Clang with modules 
>>> enabled at -j16. Is the timeout perhaps too short?
>> 
>> I lowered the timeout to 1 minute in my llvm commit,
>> 
>> Well, technically, it went from 8m 44s to 65.5s due to the exponential 
>> backoff... :-/
> 
> Heh, good point.
> 
>> 
>> so if it helps, feel free to bump it back up.  I’m not particularly 
>> concerned about the actual number now that we will actually honour the 
>> limit, but the old value of 5 minutes seemed really long to me.
>> 
>> Really? Consider a Debug+Asserts clang, building a large module (such as the 
>> Clang_AST module). Requiring that to succeed in 65.5 seconds doesn't seem 
>> particularly reasonable (and, indeed, some such modules do not build in that 
>> environment in 65.5s on my machine, at least when it's heavily loaded).
> 
> That sucks.
> 
>> 
>> But I'm not really sure that a timeout is the right approach here at all. 
>> What we want to track is whether the creating process is still alive. 
>> Perhaps we should have that process periodically bump the timestamp on the 
>> lock file, and assume it's died if the timestamp is too far in the past? 
>> This'd mean that in the (scary) case of sharing a module cache across a 
>> build farm, you'd need your machines to not have too much clock skew, but 
>> that's already necessary for build systems that use timestamps to determine 
>> what to rebuild, so it doesn't seem like a ridiculous restriction.
>> 
>> Thoughts?
> 
> I think what you’re proposing is a better way to track whether the creator is 
> alive than what we currently do (looking at the hostname+pid).  We could 
> eliminate the clock-sync dependence if the waiting process set up its own 
> local timer and then reset it if it sees the timestamp of the lock file has 
> changed (i.e. we don’t care what the value is, only whether it changes).  But
> your suggestion is simpler and automatically handles a new process waiting on 
> a long-dead file in a good way.
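> 
> To sketch the waiting side of that variant (invented names again; the 1s
> poll and 30s deadline are arbitrary, and this reuses the std::filesystem
> style from the sketch above rather than LockFileManager's real interface):
> 
>   // Waiter: never compare the lock file's mtime against our own clock;
>   // just watch for it to *change*, and restart a purely local deadline
>   // whenever it does.
>   bool waitForOwner(const fs::path &Lock) {
>     std::error_code EC;
>     auto LastSeen = fs::last_write_time(Lock, EC);
>     auto Deadline = std::chrono::steady_clock::now() + 30s;
>     while (!EC) {
>       std::this_thread::sleep_for(1s);
>       auto Now = fs::last_write_time(Lock, EC);
>       if (EC)
>         return true; // Lock file is gone: the owner released it.
>       if (Now != LastSeen) { // Any change at all proves liveness.
>         LastSeen = Now;
>         Deadline = std::chrono::steady_clock::now() + 30s;
>       }
>       if (std::chrono::steady_clock::now() > Deadline)
>         return false; // No touch for 30s of *our* time: presume dead.
>     }
>     return true; // File never existed to begin with.
>   }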
> 
> At some level I like that the timeout gives you feedback when a compilation 
> takes much longer than expected.  How about this: in addition to tracking 
> process liveness, we also use a timeout.  When the timeout is reached, we 
> emit a warning that the module build is taking a long time.  We then forcibly 
> delete the lock file and try again to acquire the lock ourselves. This has 
> the added advantage of combatting priority inversion.  If we reach the 
> timeout again (maybe several times?), we emit an error and fail.  What do you 
> think? 
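> 
> Roughly this shape, reusing the enum and entry points from the patch below,
> but with an invented warning diagnostic, counter, and limit (pseudo-C++,
> not a real patch):
> 
>   while (true) {
>     switch (Locked.waitForUnlock()) {
>     case llvm::LockFileManager::Res_Success:
>       ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>       break;
>     case llvm::LockFileManager::Res_OwnerDied:
>       continue; // Try again to get the lock.
>     case llvm::LockFileManager::Res_Timeout:
>       if (++NumTimeouts > MaxTimeouts) {
>         Diags.Report(ModuleNameLoc, diag::err_module_lock_timeout)
>             << Module->Name;
>         return false;
>       }
>       // Warn, steal the (presumed dead) lock, and go around again.
>       Diags.Report(ModuleNameLoc, diag::warn_module_build_slow)
>           << Module->Name;
>       Locked.unsafeRemoveLockFile();
>       continue; // (In reality we would re-run the whole acquire step.)
>     }
>     break;
>   }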
> 
> For the most part, this sounds great to me. I'm not entirely convinced that 
> it makes sense to remove the lock and retry if we can see that another 
> process is periodically touching it.

Okay, but right now any time we delete a lock file we may accidentally delete a 
valid lock because of races.  I haven’t found a good way to avoid the window 
between checking a lock file is invalid and deleting that file (which is done 
by name).
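
To make the window concrete (a sketch only; ownerLooksDead stands in for
whatever liveness check we settle on):

  if (ownerLooksDead(LockFileName)) {       // (1) decide the lock is dead
    // ... another process can remove the stale file and create a fresh,
    // valid lock with the same name right here ...
    std::filesystem::remove(LockFileName);  // (2) delete the *new* lock
  }

Without something like an atomic compare-and-delete keyed on the lock's
contents, step (2) can always hit a file that step (1) never looked at.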

> I think issuing a remark after some timeout (1 minute, maybe?), and possibly 
> failing after a longer timeout, may be reasonable.

I was thinking a warning, but I guess -Werror makes that a bad idea.  Failing 
instead of clearing the lock file is okay with me.

> 
> We'd need a mechanism to turn off the second timeout (for cases where modules 
> simply take a long time to build for whatever reason) and possibly to turn 
> off the first timeout (to ensure we provide deterministic output).

-Rmodule-timeout-soft
-Wmodule-timeout (DefaultError)

We could theoretically make a -W group that included both for convenience, 
although that might be weird with a remark...
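
For concreteness, the definitions might look roughly like this (everything
here is hypothetical: the diagnostic names, the group names, and placing
them next to the existing module diagnostics in DiagnosticCommonKinds.td):

  // In DiagnosticGroups.td: groups so each stage can be toggled
  // via -Rmodule-timeout-soft and -Wno-module-timeout respectively.
  def ModuleTimeoutSoft : DiagGroup<"module-timeout-soft">;
  def ModuleTimeout : DiagGroup<"module-timeout">;

  // In DiagnosticCommonKinds.td: a remark at the soft timeout, and a
  // warning that defaults to an error at the hard one.
  def remark_module_lock_wait : Remark<
    "still waiting to acquire lock file for module '%0'">,
    InGroup<ModuleTimeoutSoft>;
  def warn_module_lock_timeout : Warning<
    "timed out waiting to acquire lock file for module '%0'">,
    InGroup<ModuleTimeout>, DefaultError;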

Another question: where do you think we should touch the file? In a consumer of 
top-level decls?
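
If we went the top-level-decl route, the consumer might look something like
this (purely illustrative; the class, the 5-second rate limit, and the use
of std::filesystem instead of llvm::sys are all invented):

  #include "clang/AST/ASTConsumer.h"
  #include "clang/AST/DeclGroup.h"
  #include <chrono>
  #include <filesystem>

  // Refresh the lock's mtime as the module build makes progress, so a
  // wedged build stops touching the file and eventually reads as dead.
  class LockTouchingConsumer : public clang::ASTConsumer {
    std::filesystem::path Lock;
    std::chrono::steady_clock::time_point LastTouch{};

  public:
    explicit LockTouchingConsumer(std::filesystem::path L)
        : Lock(std::move(L)) {}

    bool HandleTopLevelDecl(clang::DeclGroupRef DG) override {
      auto Now = std::chrono::steady_clock::now();
      if (Now - LastTouch > std::chrono::seconds(5)) { // rate-limit stats
        std::error_code EC;
        std::filesystem::last_write_time(
            Lock, std::filesystem::file_time_type::clock::now(), EC);
        LastTouch = Now;
      }
      return true; // Keep parsing.
    }
  };

Though note this only fires between top-level decls, so a single decl that
takes a very long time (a huge template instantiation, say) would still
starve the touch.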

> 
>>> Modified:
>>>     cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td
>>>     cfe/trunk/lib/Frontend/CompilerInstance.cpp
>>> 
>>> Modified: cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td?rev=228604&r1=228603&r2=228604&view=diff
>>> ==============================================================================
>>> --- cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td (original)
>>> +++ cfe/trunk/include/clang/Basic/DiagnosticCommonKinds.td Mon Feb  9 14:35:13 2015
>>> @@ -83,6 +83,8 @@ def err_module_not_found : Error<"module
>>>  def err_module_not_built : Error<"could not build module '%0'">, DefaultFatal;
>>>  def err_module_lock_failure : Error<
>>>    "could not acquire lock file for module '%0'">, DefaultFatal;
>>> +def err_module_lock_timeout : Error<
>>> +  "timed out waiting to acquire lock file for module '%0'">, DefaultFatal;
>>>  def err_module_cycle : Error<"cyclic dependency in module '%0': %1">,
>>>    DefaultFatal;
>>>  def note_pragma_entered_here : Note<"#pragma entered here">;
>>> 
>>> Modified: cfe/trunk/lib/Frontend/CompilerInstance.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInstance.cpp?rev=228604&r1=228603&r2=228604&view=diff
>>> ==============================================================================
>>> --- cfe/trunk/lib/Frontend/CompilerInstance.cpp (original)
>>> +++ cfe/trunk/lib/Frontend/CompilerInstance.cpp Mon Feb  9 14:35:13 2015
>>> @@ -1022,9 +1022,19 @@ static bool compileAndLoadModule(Compile
>>>      case llvm::LockFileManager::LFS_Shared:
>>>       // Someone else is responsible for building the module. Wait for them
>>>       // to finish.
>>> -      if (Locked.waitForUnlock() == llvm::LockFileManager::Res_OwnerDied)
>>> +      switch (Locked.waitForUnlock()) {
>>> +      case llvm::LockFileManager::Res_Success:
>>> +        ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>>> +        break;
>>> +      case llvm::LockFileManager::Res_OwnerDied:
>>>          continue; // try again to get the lock.
>>> -      ModuleLoadCapabilities |= ASTReader::ARR_OutOfDate;
>>> +      case llvm::LockFileManager::Res_Timeout:
>>> +        Diags.Report(ModuleNameLoc, diag::err_module_lock_timeout)
>>> +            << Module->Name;
>>> +        // Clear the lock file so that future invocations can make progress.
>>> +        Locked.unsafeRemoveLockFile();
>>> +        return false;
>>> +      }
>>>        break;
>>>      }
>>> 
