With address rewriting:

03/13/17 16:19:28 Query info: matched=127342; skipped=0; query_time=1.017584; 
send_time=64.020788; type=MachinePrivate; requirements={true}; locate=0; 
peer=<XXXXXX:30778>; projection={}
03/13/17 16:23:31 Query info: matched=56398; skipped=83800; 
query_time=8.167287; send_time=49.315923; type=Any; requirements={(((MyType == 
"Scheduler") || (MyType == "Submitter")) || (((MyType == "Machine") && ( 
!regexp("^(T1_|T2_US_)",GLIDEIN_CMSSite) && ((false ? true : (Cpus > 0)) && 
(time() < GLIDEIN_ToRetire))))))}; locate=0; peer=<XXXXXX:20175>; projection={}

Without address rewriting:

03/13/17 16:22:25 Query info: matched=128000; skipped=0; query_time=0.913494; 
send_time=29.316582; type=MachinePrivate; requirements={true}; locate=0; 
peer=<XXXXXX:8604>; projection={}
03/13/17 16:24:54 Query info: matched=45781; skipped=94684; 
query_time=8.199538; send_time=33.565879; type=Any; requirements={(((MyType == 
"Scheduler") || (MyType == "Submitter")) || (((MyType == "Machine") && 
(regexp("^T2_US_",GLIDEIN_CMSSite) && ((false ? true : (Cpus > 0)) && (time() < 
GLIDEIN_ToRetire))))))}; locate=0; peer=<XXXXXXX:12012>; projection={}

Seems we have more room for improvement, but this helps!

Brian

> On Mar 13, 2017, at 10:14 AM, Brian Bockelman <[email protected]> wrote:
> 
>> 
>> On Mar 13, 2017, at 9:31 AM, Brian Bockelman <[email protected]> wrote:
>> 
>> Hi Greg,
>> 
>> I would be OK if the collector never handled a query inline - either doing 
>> an EAGAIN or just queuing them up internally for some finite amount of time.
>> 
>> In this case, the query was inlined because the child processes pushed the 
>> collector into swap.  Once swap was involved and memory operations took 
>> seconds, well, the whole thing kind of went to hell until the master shot 
>> the involved processes.
>> 
>> Going in a different direction, I tried a "poor man's profiling" (pstack 
>> repeatedly) and found lots of places to potentially squeeze out performance.
>> 
>> - ConvertDefaultIPToSocketIP is called twice for every attribute in the ad.  
>> This does a bunch of string manipulation, but _also_ does at least one 
>> param_boolean call per invocation.  Nothing like doing this a few million 
>> times per query:
>> 
>> #0  0x00007f85963e0f18 in next_config_macro () from 
>> /usr/lib64/libcondor_utils_8_7_0.so
>> #1  0x00007f8596418803 in expand_macro(char const*, macro_set&, 
>> macro_eval_context&) () from /usr/lib64/libcondor_utils_8_7_0.so
>> #2  0x00007f85964e3317 in param_ctx () from 
>> /usr/lib64/libcondor_utils_8_7_0.so
>> #3  0x00007f85964e4749 in param () from /usr/lib64/libcondor_utils_8_7_0.so
>> #4  0x00007f85964e4a8e in param_boolean(char const*, bool, bool, 
>> compat_classad::ClassAd*, compat_classad::ClassAd*, bool) () from 
>> /usr/lib64/libcondor_utils_8_7_0.so
>> #5  0x00007f85963ea346 in ConvertDefaultIPToSocketIP(char const*, 
>> std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, 
>> Stream&) () from /usr/lib64/libcondor_utils_8_7_0.so
>> 
>> - GetInternalReferences looks surprisingly expensive.  Since the negotiator 
>> typically whitelists attributes for claimed ads, is there room for 
>> improvement?
>> 
>> - While it obviously showed up, I was surprised by how small of a percentage 
>> of the time was spent in classad::ClassAdUnParser::Unparse.  Maybe 30%? 50%?
>> 
>> - prepare_crypto_for_secret_is_noop() involves a modest amount of string 
>> parsing, is called once per attribute, yet is a property of the connection 
>> itself.  It only needs to be called once per invocation of putClassAd - 
>> perhaps can be cached too?
>> 
>> - The std::string buffered output is released / allocated between each 
>> attribute.  It might be reasonable to recycle the allocation on the heap and 
>> reserve a modest amount of memory (1KB?)?
>> 
>> - There's a bit of slop in the way we evaluate string values.  Consider the 
>> following traceback:
>> 
>> #1  0x00007f8595ea6ca9 in classad::Value::CopyFrom(classad::Value const&) () 
>> from /usr/lib64/libclassad.so.9
>> #2  0x00007f8595e93580 in classad::Literal::_Evaluate(classad::EvalState&, 
>> classad::Value&) const () from /usr/lib64/libclassad.so.9
>> #3  0x00007f8595e82501 in classad::ExprTree::Evaluate(classad::EvalState&, 
>> classad::Value&) const () from /usr/lib64/libclassad.so.9
>> #4  0x00007f8595e66ea0 in 
>> classad::ClassAd::EvaluateAttr(std::basic_string<char, 
>> std::char_traits<char>, std::allocator<char> > const&, classad::Value&) 
>> const () from /usr/lib64/libclassad.so.9
>> #5  0x00007f8595e66fe9 in 
>> classad::ClassAd::EvaluateAttrString(std::basic_string<char, 
>> std::char_traits<char>, std::allocator<char> > const&, 
>> std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) 
>> const () from /usr/lib64/libclassad.so.9
>> 
>> I think we copy the string twice time: once from the literal to the value, 
>> then from the value to the user-provided string (which is reallocated for 
>> each attribute). It would be reasonable to short-circuit this (check to see 
>> if the attribute's expression / cached expression is a literal string type, 
>> then make a copy directly to the user buffer); it might be possible to avoid 
>> it in all cases.  Given that we have const-ness all around, we could take a 
>> step further and introduce a EvaluateLiteralString that does zero-copy for 
>> strings.
>> 
> 
> There's another twist on this problem in Unparse:
> 
> #0  0x00007f859513d258 in std::basic_string<char, std::char_traits<char>, 
> std::allocator<char> >::basic_string(std::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&) () from 
> /usr/lib64/libstdc++.so.6
> #1  0x00007f8595ea6ca9 in classad::Value::CopyFrom(classad::Value const&) () 
> from /usr/lib64/libclassad.so.9
> #2  0x00007f8595e9d647 in 
> classad::ClassAdUnParser::Unparse(std::basic_string<char, 
> std::char_traits<char>, std::allocator<char> >&, classad::ExprTree const*) () 
> from /usr/lib64/libclassad.so.9
> #3  0x00007f85964da5d6 in _putClassAd(Stream*, classad::ClassAd&, int) () 
> from /usr/lib64/libcondor_utils_8_7_0.so
> #4  0x00007f85964daa78 in putClassAd(Stream*, classad::ClassAd&, int, 
> std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char> 
> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > > > const*) () from 
> /usr/lib64/libcondor_utils_8_7_0.so
> 
> We make a copy of the classad::Value (memory allocation, copy the string) in 
> `Unparse`, then make a copy of the string (memory allocation, copy the 
> string), then copy the string to the user-provided buffer (memory allocation, 
> copy the string).  Oh - and finally, we make a copy in putClassAd to the 
> CEDAR socket buffer.
> 
> Seems TJ already discovered this when pulling the Factor out of the value, 
> but left the fix #ifdef'd out?
> 
> Brian
> 
> _______________________________________________
> HTCondor-devel mailing list
> [email protected]
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel

_______________________________________________
HTCondor-devel mailing list
[email protected]
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel

Reply via email to