GuoyuYin opened a new issue, #15178:
URL: https://github.com/apache/nuttx/issues/15178

   ### Description / Steps to reproduce the issue
   
   ## Expected Behavior
   
   The program should execute without any memory access violations or crashes. 
Specifically, the `nxsem_trywait` function should correctly attempt to wait on 
a semaphore without causing system instability.
   
   ## Actual Behavior
   
   Instead of executing normally, the program crashes due to a load access 
fault when trying to read a half-word from what seems to be an invalid or 
inaccessible memory location during the execution of the `nxsem_trywait` 
function. The crash happens at the address `0x800078da`.
   
   ## Description
   
   Instead of executing normally, the program crashes due to a load access 
fault when trying to read a half-word from what seems to be an invalid or 
inaccessible memory location during the execution of the `nxsem_trywait` 
function. The crash happens at the address `0x800078da`.
   
   - **MCAUSE:** 0x5 (Indicates a load access fault)
   - **EPC:** 0x800078da (Program counter at the time of exception, within 
`nxsem_trywait`)
     0x800078da (Program counter on exceptions in `nxsem_trywait`)
   - **MTVAL:** 0x2 (Value related to the exception, possibly the address 
offset)
     MTVAL: 0x2 (The value associated with the exception, possibly the address 
offset)
   
   Querying the nut-img file reveals that the exception happened during the 
execution of the `nxsem_trywait` function, specifically at offset +0x54 from 
its start. The error log indicates that the system was attempting to perform a 
load half-word unsigned (lhu) operation from an address held in register s0, 
which appears to be causing the issue.
   
   The assembly instruction at this location is `lhu a1,0(s0)`, which tries to 
load an unsigned half-word into register a1 from the address pointed to by s0. 
Given that MTVAL contains `0x2`, it suggests that there might be an alignment 
issue or that the address being accessed does not exist or there is no valid 
mapping for it in the page tables, leading to the access fault.
   
   ## Debug Logs
   
   The debug logs show that the system was executing various system calls 
before encountering the exception. These calls are executed sequentially by 
number (call_num) until finally a very large number #call_num = 
18446744073709551615, which is the maximum value of an unsigned 64-bit integer 
that could mean an overflow or invalid argument.Upon reaching the 
`nxsem_trywait` function, it attempted to load a half-word from an address 
stored in s0. However, this address appears to be invalid or out of bounds, 
resulting in the load access fault.
   
   ```
   syz_sem_timedwait(0x2, 0x0)
   
   #call_num = 36
   # in post call
   
   riscv_exception: BUG: EXCEPTION: Load access fault. MCAUSE: 
0000000000000005, EPC: 00000000800078da, MTVAL: 0000000000000002
   
   VM DIAGNOSIS:
   08:00:39  Registers:
   Failed readin g regs: dial tcp 127.0.0.1:16715: connect: connection refused
   Failed readin g regs: dial tcp 127.0.0.1:16715: connect: connection refused
   ```
   
   ## Steps to Reproduce
   
   To reproduce this issue, one can use Syzkaller to execute system calls 
against the NuttX kernel. The specific sequence leading up to the crash 
includes calls such as `syz_sem_timedwait`, `syz_putenv`, `syz_setenv`, and 
`syz_sem_timedwait`, culminating in the problematic call to `nxsem_trywait`.
   
   The corresponding syscall specific implementation code is as follows:
   
   ```
   //lab: H spec
   #include "nuttx/sched.h"
   // #include "sched/sched.h"
   #include <sys/types.h>
   #include <sched.h>
   #include <time.h>
   #include <signal.h>
   #include <errno.h>
   #include <sys/time.h>
   #include "nuttx/irq.h"
   #include "mqueue.h"
   #include <fcntl.h>
   #include <sys/stat.h>
   #include <semaphore.h>
   // #include <netinet/in.h>
   // #include <arpa/inet.h>
   
   #include <arpa/inet.h>
   #include <netinet/in.h>
   #include <netinet/tcp.h>
   
   static long syz_sem_timedwait(volatile long sem_ptr, volatile long 
abstime_ptr)
   {
       sem_t *sem = (sem_t *)sem_ptr;
       const struct timespec *abstime = (const struct timespec *)abstime_ptr;
       return (long)sem_timedwait(sem, abstime);
   }
   ```
   
   ## Suggested Fix
   
   
   
   The nxsem_trywait function is in sched/semaphore/sem_trywait.c with the 
following code:
   
   ```
   int nxsem_trywait(FAR sem_t *sem)
   {
     /* This API should not be called from the idleloop */
   
     DEBUGASSERT(sem != NULL);
     DEBUGASSERT(!OSINIT_IDLELOOP() || !sched_idletask() ||
                 up_interrupt_context());
   
     /* If this is a mutex, we can try to get the mutex in fast mode,
      * else try to get it in slow mode.
      */
   
   #if !defined(CONFIG_PRIORITY_INHERITANCE) && 
!defined(CONFIG_PRIORITY_PROTECT)
     if (sem->flags & SEM_TYPE_MUTEX)
       {
         int32_t old = 1;
         if (atomic_try_cmpxchg_acquire(NXSEM_COUNT(sem), &old, 0))
           {
             return OK;
           }
   
         return -EAGAIN;
       }
   #endif
   
     return nxsem_trywait_slow(sem);
   }
   ```
   
   After analysis, the following recommendations were given
   
   **Pointer Validation:** Ensure that the pointer passed to `nxsem_trywait` is 
properly initialized and aligned according to RISC-V's requirements. Check if 
the semaphore structure (`sem`) is corrupted or if there are issues with memory 
allocation or deallocation patterns that could lead to accessing invalid 
addresses.
   
   **Semaphore Initialization:** Verify that all semaphores are correctly 
initialized before being used. Uninitialized or improperly initialized 
semaphores can cause undefined behavior, including invalid memory accesses.
   
   **Memory Alignment:** Review the memory alignment of data structures used in 
conjunction with `nxsem_trywait`. Misaligned data can result in access faults 
on architectures like RISC-V, which have strict alignment requirements.
   
   **Runtime Checks:** We can see that the problem is a load access fault when 
trying to access sem->flags or NXSEM_COUNT(sem). This indicates that the 
incoming sem pointer may be invalid (e.g., NULL, uninitialized, or pointing to 
an illegal memory address). To fix this, we need to make sure that the sem 
pointer is properly initialized and points to a valid memory location before 
actually using it.
   
   ```
   int nxsem_trywait(FAR sem_t *sem)
   {
     /* This API should not be called from the idleloop */
   
     if (sem == NULL) {
       _err("ERROR: NULL semaphore pointer provided to nxsem_trywait\n");
       return -EINVAL;  // EINVAL indicates an invalid argument
     }
   
     DEBUGASSERT(sem != NULL);
     DEBUGASSERT(!OSINIT_IDLELOOP() || !sched_idletask() ||
                 up_interrupt_context());
   
     /* Ensure that the semaphore structure is valid before accessing it */
     if (!is_valid_semaphore(sem)) {
       _err("ERROR: Invalid semaphore structure\n");
       return -EFAULT;  // EFAULT indicates a bad address
     }
   
     /* If this is a mutex, we can try to get the mutex in fast mode,
      * else try to get it in slow mode.
      */
   
   #if !defined(CONFIG_PRIORITY_INHERITANCE) && 
!defined(CONFIG_PRIORITY_PROTECT)
     if (sem->flags & SEM_TYPE_MUTEX)
       {
         int32_t old = 1;
         if (atomic_try_cmpxchg_acquire(NXSEM_COUNT(sem), &old, 0))
           {
             return OK;
           }
   
         return -EAGAIN;
       }
   #endif
   
     return nxsem_trywait_slow(sem);
   }
   ```
   
   We introduce an auxiliary function is_valid_semaphore(), which is assumed to 
exist, to check if a semaphore structure is legal. This function can be defined 
on an implementation-specific basis, e.g., to check if the semaphore has been 
properly initialized, etc.
   
   ### On which OS does this issue occur?
   
   [OS: Linux]
   
   ### What is the version of your OS?
   
   Ubuntu 20.04
   
   ### NuttX Version
   
   2ff2b8243773df3a56d535635b898a06f3f8c0f6
   
   ### Issue Architecture
   
   [Arch: risc-v]
   
   ### Issue Area
   
   [Area: Kernel]
   
   ### Verification
   
   - [X] I have verified before submitting the report.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@nuttx.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to