With address rewriting:
03/13/17 16:19:28 Query info: matched=127342; skipped=0; query_time=1.017584;
send_time=64.020788; type=MachinePrivate; requirements={true}; locate=0;
peer=<XXXXXX:30778>; projection={}
03/13/17 16:23:31 Query info: matched=56398; skipped=83800;
query_time=8.167287; send_time=49.315923; type=Any; requirements={(((MyType ==
"Scheduler") || (MyType == "Submitter")) || (((MyType == "Machine") && (
!regexp("^(T1_|T2_US_)",GLIDEIN_CMSSite) && ((false ? true : (Cpus > 0)) &&
(time() < GLIDEIN_ToRetire))))))}; locate=0; peer=<XXXXXX:20175>; projection={}
Without address rewriting:
03/13/17 16:22:25 Query info: matched=128000; skipped=0; query_time=0.913494;
send_time=29.316582; type=MachinePrivate; requirements={true}; locate=0;
peer=<XXXXXX:8604>; projection={}
03/13/17 16:24:54 Query info: matched=45781; skipped=94684;
query_time=8.199538; send_time=33.565879; type=Any; requirements={(((MyType ==
"Scheduler") || (MyType == "Submitter")) || (((MyType == "Machine") &&
(regexp("^T2_US_",GLIDEIN_CMSSite) && ((false ? true : (Cpus > 0)) && (time() <
GLIDEIN_ToRetire))))))}; locate=0; peer=<XXXXXXX:12012>; projection={}
Seems we have more room for improvement, but this helps!
Brian
> On Mar 13, 2017, at 10:14 AM, Brian Bockelman <[email protected]> wrote:
>
>>
>> On Mar 13, 2017, at 9:31 AM, Brian Bockelman <[email protected]> wrote:
>>
>> Hi Greg,
>>
>> I would be OK if the collector never handled a query inline - either doing
>> an EAGAIN or just queuing them up internally for some finite amount of time.
>>
>> In this case, the query was inlined because the child processes pushed the
>> collector into swap. Once swap was involved and memory operations took
>> seconds, well, the whole thing kind of went to hell until the master shot
>> the involved processes.
>>
>> Going in a different direction, I tried a "poor man's profiling" (pstack
>> repeatedly) and found lots of places to potentially squeeze out performance.
>>
>> - ConvertDefaultIPToSocketIP is called twice for every attribute in the ad.
>> This does a bunch of string manipulation, but _also_ does at least one
>> param_boolean call per invocation. Nothing like doing this a few million
>> times per query:
>>
>> #0 0x00007f85963e0f18 in next_config_macro () from
>> /usr/lib64/libcondor_utils_8_7_0.so
>> #1 0x00007f8596418803 in expand_macro(char const*, macro_set&,
>> macro_eval_context&) () from /usr/lib64/libcondor_utils_8_7_0.so
>> #2 0x00007f85964e3317 in param_ctx () from
>> /usr/lib64/libcondor_utils_8_7_0.so
>> #3 0x00007f85964e4749 in param () from /usr/lib64/libcondor_utils_8_7_0.so
>> #4 0x00007f85964e4a8e in param_boolean(char const*, bool, bool,
>> compat_classad::ClassAd*, compat_classad::ClassAd*, bool) () from
>> /usr/lib64/libcondor_utils_8_7_0.so
>> #5 0x00007f85963ea346 in ConvertDefaultIPToSocketIP(char const*,
>> std::basic_string<char, std::char_traits<char>, std::allocator<char> >&,
>> Stream&) () from /usr/lib64/libcondor_utils_8_7_0.so
>>
>> - GetInternalReferences looks surprisingly expensive. Since the negotiator
>> typically whitelists attributes for claimed ads, is there room for
>> improvement?
>>
>> - While it obviously showed up, I was surprised by how small of a percentage
>> of the time was spent in classad::ClassAdUnParser::Unparse. Maybe 30%? 50%?
>>
>> - prepare_crypto_for_secret_is_noop() involves a modest amount of string
>> parsing, is called once per attribute, yet is a property of the connection
>> itself. It only needs to be called once per invocation of putClassAd -
>> perhaps can be cached too?
>>
>> - The std::string buffered output is released / allocated between each
>> attribute. It might be reasonable to recycle the allocation on the heap and
>> reserve a modest amount of memory (1KB?)?
>>
>> - There's a bit of slop in the way we evaluate string values. Consider the
>> following traceback:
>>
>> #1 0x00007f8595ea6ca9 in classad::Value::CopyFrom(classad::Value const&) ()
>> from /usr/lib64/libclassad.so.9
>> #2 0x00007f8595e93580 in classad::Literal::_Evaluate(classad::EvalState&,
>> classad::Value&) const () from /usr/lib64/libclassad.so.9
>> #3 0x00007f8595e82501 in classad::ExprTree::Evaluate(classad::EvalState&,
>> classad::Value&) const () from /usr/lib64/libclassad.so.9
>> #4 0x00007f8595e66ea0 in
>> classad::ClassAd::EvaluateAttr(std::basic_string<char,
>> std::char_traits<char>, std::allocator<char> > const&, classad::Value&)
>> const () from /usr/lib64/libclassad.so.9
>> #5 0x00007f8595e66fe9 in
>> classad::ClassAd::EvaluateAttrString(std::basic_string<char,
>> std::char_traits<char>, std::allocator<char> > const&,
>> std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
>> const () from /usr/lib64/libclassad.so.9
>>
>> I think we copy the string twice time: once from the literal to the value,
>> then from the value to the user-provided string (which is reallocated for
>> each attribute). It would be reasonable to short-circuit this (check to see
>> if the attribute's expression / cached expression is a literal string type,
>> then make a copy directly to the user buffer); it might be possible to avoid
>> it in all cases. Given that we have const-ness all around, we could take a
>> step further and introduce a EvaluateLiteralString that does zero-copy for
>> strings.
>>
>
> There's another twist on this problem in Unparse:
>
> #0 0x00007f859513d258 in std::basic_string<char, std::char_traits<char>,
> std::allocator<char> >::basic_string(std::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const&) () from
> /usr/lib64/libstdc++.so.6
> #1 0x00007f8595ea6ca9 in classad::Value::CopyFrom(classad::Value const&) ()
> from /usr/lib64/libclassad.so.9
> #2 0x00007f8595e9d647 in
> classad::ClassAdUnParser::Unparse(std::basic_string<char,
> std::char_traits<char>, std::allocator<char> >&, classad::ExprTree const*) ()
> from /usr/lib64/libclassad.so.9
> #3 0x00007f85964da5d6 in _putClassAd(Stream*, classad::ClassAd&, int) ()
> from /usr/lib64/libcondor_utils_8_7_0.so
> #4 0x00007f85964daa78 in putClassAd(Stream*, classad::ClassAd&, int,
> std::set<std::basic_string<char, std::char_traits<char>, std::allocator<char>
> >, classad::CaseIgnLTStr, std::allocator<std::basic_string<char,
> std::char_traits<char>, std::allocator<char> > > > const*) () from
> /usr/lib64/libcondor_utils_8_7_0.so
>
> We make a copy of the classad::Value (memory allocation, copy the string) in
> `Unparse`, then make a copy of the string (memory allocation, copy the
> string), then copy the string to the user-provided buffer (memory allocation,
> copy the string). Oh - and finally, we make a copy in putClassAd to the
> CEDAR socket buffer.
>
> Seems TJ already discovered this when pulling the Factor out of the value,
> but left the fix #ifdef'd out?
>
> Brian
>
> _______________________________________________
> HTCondor-devel mailing list
> [email protected]
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel
_______________________________________________
HTCondor-devel mailing list
[email protected]
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel