On Thu, 2018-04-05 at 21:19 -0700, Dan Williams wrote:
> ARS is an operation that can take 10s to 100s of seconds to find media
> errors that should rarely be present. If the platform crashes due to
> media errors in persistent memory, the expectation is that the BIOS will
> report those known errors in a 'short' ARS request.
> A 'short' ARS request asks platform firmware to return an ARS payload
> with all known errors, but without issuing a 'long' scrub. At driver
> init a short request is issued to all PMEM ranges before registering
> regions. Then, in the background, a long ARS is scheduled for each
> region.

I confirmed that this version addressed the WARN_ONCE issue.

> The ARS implementation is simplified to centralize ARS completion work
> in the ars_complete() helper called from ars_status_process_records().
> The timeout is removed since there is no facility to cancel ARS, and
> system init is never blocked waiting for a 'long' ARS. The ars_state
> flags are used to coordinate ARS requests from driver init, ARS requests
> from userspace, and ARS requests in response to media error
> notifications.

While I like the simplification of the code, I learned that we need to
handle both cases below:
 1) No FW ARS scan: issue an ARS short scan and enable pmem devices
without delay (new behavior in this patch)
 2) FW ARS scan: wait for the FW ARS scan to complete, and then enable
pmem devices

Case 2) is still necessary because:

 - After a system crash in certain error scenarios, FW may not be able to
obtain all error records and needs an ARS long scan to retrieve them.
 - Other OSes do not initiate an ARS long scan, and assume that FW starts
one at POST when necessary.
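
To make the two cases concrete, here is a rough sketch of the decision I
have in mind at driver init. All names below are hypothetical and are not
taken from the nfit driver; it also assumes the driver can tell, for
example from an ARS status query, whether FW already has a scan running.

```c
#include <stdbool.h>

/* Hypothetical view of what FW is doing when the driver initializes. */
enum fw_ars_state {
	FW_ARS_IDLE,        /* no FW-initiated scan is running */
	FW_ARS_IN_PROGRESS, /* FW started a long scan at POST */
};

/* Hypothetical actions the driver could take before enabling pmem. */
enum init_action {
	INIT_SHORT_ARS_THEN_ENABLE, /* case 1: short scan, enable without delay */
	INIT_WAIT_FOR_FW_ARS,       /* case 2: wait for the FW scan, then enable */
};

/* Pick case 1 or case 2 based on whether FW already has a scan running. */
static enum init_action pmem_init_action(enum fw_ars_state state)
{
	if (state == FW_ARS_IN_PROGRESS)
		return INIT_WAIT_FOR_FW_ARS;
	return INIT_SHORT_ARS_THEN_ENABLE;
}

/* True when region registration has to be deferred (case 2 only). */
static bool pmem_enable_is_deferred(enum fw_ars_state state)
{
	return pmem_init_action(state) == INIT_WAIT_FOR_FW_ARS;
}
```

With this split, the background long ARS from the patch would still apply
to case 1, while case 2 only defers region registration until the FW scan
has reported its results.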

Linux-nvdimm mailing list