On 10/9/2025 12:10 PM, Easwar Hariharan wrote:
> On 10/9/2025 11:53 AM, Nuno Das Neves wrote:
>> On 10/6/2025 4:08 PM, Easwar Hariharan wrote:
>>> Use a better mapping of hypervisor status codes to errno values and
>>> disambiguate the catch-all -EIO value. While here, remove the duplicate
>>> INVALID_LP_INDEX and INVALID_REGISTER_VALUES hypervisor status entries.
>>>
>>
>> To be honest, in retrospect the idea of 'translating' the hypercall error
>> codes is a bit pointless. hv_result_to_errno() allows the hypercall helper
>> functions to be a bit cleaner, but that's about it. When debugging you
>> almost always want to know the actual hypercall error code. Translating
>> it imperfectly is often useless, and at worst creates a red
>> herring/obfuscates the true source of the error.
> 
> I feel like you're thinking from the perspective of Microsoft engineers 
> working with
> the hypervisor. IMHO, the translation is useful for the rest of the kernel to 
> understand
> and possibly handle differently the possible errors. EBUSY for example is
> meant to indicate that the operation is already done, or that the target 
> (device/hypervisor)
> is busy with something else. Timeouts may be handled by a retry, perhaps with 
> a backoff period.

In your patch you've mapped EBUSY to HV_STATUS_NOT_ACKNOWLEDGED and
HV_STATUS_VTL_ALREADY_ENABLED. Neither of those can be handled by kernel code
that isn't aware of the hypervisor. They are logic bugs that just need to be
surfaced and fixed by someone familiar with the related code.

This is the case for most hypercall errors: they indicate a bug and we need to
surface the error and fix the code, not handle it at runtime.

The few examples (in the mshv_root driver) where a hypercall error is handled at
runtime are:
1. HV_STATUS_INSUFFICIENT_MEMORY where pages are deposited.
2. HV_STATUS_NO_RESOURCES which happens on withdrawing pages when there's none
   left. That is treated as a success.
3. HV_STATUS_CALL_PENDING means waiting for a completion on an async hypercall

All of these are context-specific - the errors can only be handled there.
>>
>> With that in mind, updating the errno mappings to be more accurate feels
>> like unnecessary churn. It might even be better to remove the errno mappings
>> altogether and just translate HV_STATUS_SUCCESS to 0 and any other error
>> to -EIO or some other 'signal' error code to make it more obvious that
>> a *hypercall* error occurred and not some other Linux error. We'd still
>> want to keep the table in some form because it's also used for the error
>> strings.
> 
> IMHO, an arbitrarily chosen 'signal' error code wouldn't tell support or the 
> rest of the community that
> a hypercall error occurred, without a print or trace closest to the hypercall 
> location.
> 

The other Linux error codes don't either. I agree we should almost always have a
debug or error print right after the hypercall fails.

>>
>> The cleanup removing the duplicates in the table is welcome.
>>
>> Nuno
>>
> <snip>


Reply via email to