Pádraig Brady wrote:
...
> I still get bad performance for the above with SUBTHREAD_LINES_HEURISTIC=128K
Sorry I haven't had time for this today. I'll investigate tomorrow.

> So, as you suggested, the large mem allocation when reading from a pipe
> is a problem, and in fact seems to be the main problem. Now, given that
> the memory isn't actually used, it shouldn't be such an issue, but if
> one has MALLOC_PERTURB_ set, then it is used, and it has a huge impact.
> Compare:
>
> $ for i in $(seq 33); do seq 88 | MALLOC_PERTURB_= timeout 2 sort --para=1 >/dev/null & done
> $ for i in $(seq 33); do seq 88 | MALLOC_PERTURB_=1 timeout 2 sort --para=1 >/dev/null & done

Good point!

> So we should be more conservative in memory allocation in sort,
> and be more aligned with CPU cache sizes than RAM sizes, I suspect.
> This will be an increasing problem as we tend to run more in parallel.
> It would be interesting, I think, to sort first in chunks sized to the
> L1 cache, then the L2, etc., but as a first pass, a more sensible
> default of 8MB or so seems appropriate.
>
> As a general note, MALLOC_PERTURB_ should be unset when benchmarking
> anything to do with `sort`.
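To rerun the comparison above without a stray MALLOC_PERTURB_ from the caller's environment leaking in, a wrapper can unset it and then set it only when explicitly asked. This is a minimal sketch, not anything in coreutils: the `bench` function and its arguments are hypothetical, and it assumes GNU coreutils (`sort --parallel`, `timeout`) plus GNU `date` for `%N`. The optional second argument passes `-S` so the "smaller default buffer" idea can be tried by hand.

```shell
#!/bin/sh
# Hypothetical benchmark wrapper (not part of coreutils).
#   $1: MALLOC_PERTURB_ value to use; empty means "leave it unset"
#   $2: optional buffer size for sort -S (e.g. 8M); empty means sort's default
# Runs the 33-way batch from the thread and prints elapsed wall time in ms.
# Assumes GNU date (for %N) and GNU coreutils sort/timeout.
bench() {
  perturb=$1
  size=$2
  start=$(date +%s%N)
  for i in $(seq 33); do
    (
      # Start from an environment without MALLOC_PERTURB_ and add it
      # back only if the caller asked for it.
      unset MALLOC_PERTURB_
      [ -n "$perturb" ] && export MALLOC_PERTURB_="$perturb"
      if [ -n "$size" ]; then
        seq 88 | timeout 2 sort --parallel=1 -S "$size" >/dev/null
      else
        seq 88 | timeout 2 sort --parallel=1 >/dev/null
      fi
    ) &
  done
  wait    # wait for every background pipeline in the batch
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

echo "unperturbed:     $(bench '') ms"
echo "perturb=1:       $(bench 1) ms"
echo "perturb=1, -S 8M: $(bench 1 8M) ms"
```

If the default allocation for pipe input is what hurts, the third line should look much like the first; that is the hypothesis to check, not a measured result.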
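The cache-sized default can also be approximated from the command line before touching sort.c. A hypothetical helper, assuming glibc's `getconf`, which exposes `LEVEL1_DCACHE_SIZE`, `LEVEL2_CACHE_SIZE` and `LEVEL3_CACHE_SIZE` (they may print 0 or be unsupported elsewhere). Taking the largest reported level is an arbitrary choice for illustration; when nothing usable is reported it falls back to the 8MB mentioned above. The result goes to `-S` with a `b` suffix, because a bare number is read as KiB.

```shell
#!/bin/sh
# Hypothetical helper: choose a sort buffer size (in bytes) from the largest
# CPU cache that getconf reports, else fall back to 8 MiB.
pick_sort_buffer() {
  best=0
  for var in LEVEL1_DCACHE_SIZE LEVEL2_CACHE_SIZE LEVEL3_CACHE_SIZE; do
    v=$(getconf "$var" 2>/dev/null) || v=0
    case $v in
      ''|*[!0-9]*) v=0 ;;   # empty, "undefined", etc.: treat as unknown
    esac
    [ "$v" -gt "$best" ] && best=$v
  done
  if [ "$best" -gt 0 ]; then
    echo "$best"
  else
    echo $((8 * 1024 * 1024))
  fi
}

size=$(pick_sort_buffer)
# The "b" suffix tells sort -S the number is in bytes.
seq 88 | sort --parallel=1 -S "${size}b" | tail -n 1
```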