On 6/11/26 08:51, bob prohaska wrote:
> On Tue, Jun 09, 2026 at 08:22:02AM -0700, Mark Millard wrote:
>> On 6/8/26 20:48, bob prohaska wrote:
>>> Lately a Pi2B running buildworld reported an
>>> exhaustion of swap, but buildworld kept running
>>> and seemingly finished successfully.
>>>
>>> The report came on the serial console, I didn't
>>> find anything in the buildworld log. 
>>>
>>> This seems a very great improvement. Swap exhaustion
>>> differs from other sorts of failure, in that one can 
>>> simply re-try the job with some hope of success when 
>>> the workload is lighter.
>>>
>>> Am I interpreting this correctly?
>>
>> [Because the actual messages are not reported, I'm making some
>> assumptions about the exact messages that you got.]
>>
>>
>> Remember vm.pageout_oom_seq ?
>>
> 
> Yes, /boot/loader.conf contains:
> vm.pageout_oom_seq="4096"
> vm.pfault_oom_attempts="3"
> #vm.pfault_oom_attempts="120"
> vm.pfault_oom_wait="20"
> 
> I'll admit to not remembering how 4096 was chosen....
> probably just a wild guess.
> 
>> The larger that value used, the longer the system operates with the
>> amount of free RAM below the target threshold: in other words, it makes
>> more tries at getting to the threshold before giving up and starting to
>> kill processes to get the free RAM.
>>
>> Running out of swap of itself just means that SWAP can not be used to
>> gain free RAM when such is not essential. RAM+SWAP can still be
>> (marginally) sufficient over such a time if no memory allocations
>> actually fail. If sufficient RAM/SWAP ends up being freed before
>> vm.pageout_oom_seq related kills happen, no overall failure happens.
>>
>>
>> As for the messages as I understand them:
>>
>> kernel: swap_pager: out of swap space
>>
>> does not report a failure, just a limiting condition.
>>
>> By contrast:
>>
>> kernel: swp_pager_getswapspace(2): failed
>>
>> reports a failure: the swap space allocation was necessary. It normally
>> leads to the likes of:
>>
>> kernel: pid ??? (???), jid ???, uid ???, was killed: failed to reclaim memory
> 
> A more recent incident reported in /var/log/messages:
> 
> Jun  4 12:34:39 www kernel: swap_pager: out of swap space
> Jun  4 12:34:39 www kernel: swp_pager_getswapspace(12): failed
> 
> but wasn't followed by a "...was killed..." message.

Interesting. I've not had that combination as far as I know. Now I know
it is possible. Thanks.
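
For anyone wanting to check their own logs for the same combination, here
is a minimal sketch. It assumes the three message texts quoted in this
thread are the only ones of interest; the labels and function names are
my own invention, not anything from the kernel.

```python
import re

# Patterns taken only from the message texts quoted in this thread.
# The labels ("limit", "alloc_failed", "oom_kill") are hypothetical names.
PATTERNS = [
    ("limit", re.compile(r"swap_pager: out of swap space")),
    ("alloc_failed", re.compile(r"swp_pager_getswapspace\(\d+\): failed")),
    ("oom_kill", re.compile(r"was killed: failed to reclaim memory")),
]


def classify(line):
    """Return the label of the first matching pattern, or None."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return None


def summarize(lines):
    """Count how many lines fall under each label."""
    counts = {label: 0 for label, _ in PATTERNS}
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


# The two lines from the Jun 4 incident quoted above.
SAMPLE = [
    "Jun  4 12:34:39 www kernel: swap_pager: out of swap space",
    "Jun  4 12:34:39 www kernel: swp_pager_getswapspace(12): failed",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A nonzero "alloc_failed" count with a zero "oom_kill" count would be the
combination reported here. To scan a real log, pass the open file object
for /var/log/messages to summarize() instead of SAMPLE.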

> 
> Eventually there appeared what look like repeated disk errors, ending with:
> 
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): Info: 0
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): Retrying command (per sense data)
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 04 63 34 50 00 00 18 00
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): SCSI status: Check Condition
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:10,0 (ID CRC or ECC error)

The above looks like a report of a drive problem. Is it getting to be
time for a replacement?
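
If it would help to see whether the same area of the drive fails each
time, the READ(10) CDB can be decoded. The sketch below assumes the
wrapped bytes in the quoted log reassemble to
"28 00 04 63 34 50 00 00 18 00", and it relies on the standard READ(10)
layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in
bytes 7-8). The function name is made up for illustration.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (starting LBA, number of blocks requested).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


if __name__ == "__main__":
    # CDB as reassembled from the wrapped log lines (an assumption).
    lba, blocks = decode_read10("28 00 04 63 34 50 00 00 18 00")
    print(f"LBA {lba}, {blocks} blocks")
```

For the CDB above this gives LBA 73610320 and a 24-block read. If later
errors decode to the same or nearby LBAs, that would fit a bad spot on
the media rather than a transient problem.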

> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): Info: 0
> Jun 11 02:04:59 www kernel: (da0:umass-sim0:0:0:0): Retrying command (per sense data)
> 
> which ended in a debugger prompt on the console. 
> 
> There was considerable network activity around the same time
> which resembled an ssh attack.

I'd guess that such activity would be unlikely to cause a false "MEDIUM
ERROR" with "(ID CRC or ECC error)".

> 
> The machine rebooted without incident, buildworld has been resumed with -j3.
> 
> If it happens again I'll save a backtrace if it'll be of interest.
> 



-- 
===
Mark Millard
marklmi at yahoo.com
