Hi,

On 5/13/2025 5:05 PM, Jeff Hugo wrote:
> On 5/13/2025 3:53 AM, Jacek Lawrynowicz wrote:
>> Hi,
>>
>> On 5/12/2025 9:49 PM, Jeff Hugo wrote:
>>> diff --git a/drivers/accel/qaic/qaic_ras.c b/drivers/accel/qaic/qaic_ras.c
>>> new file mode 100644
>>> index 000000000000..2f8c1f08dbc0
>>> --- /dev/null
>>> +++ b/drivers/accel/qaic/qaic_ras.c
>>> @@ -0,0 +1,629 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +/* Copyright (c) 2020-2021, The Linux Foundation. All rights reserved. */
>>> +/* Copyright (c) 2022-2024 Qualcomm Innovation Center, Inc. All rights 
>>> reserved. */
>>
>> 2025?
> 
> No, we haven't made any changes to this file this year, so our policy would 
> be to omit 2025.  Historically we've experienced that the community wants the 
> file markings like this to track any copyright prior to the inclusion of the 
> code into upstream/mainline, and then to rely on git metadata to track 
> copyright after inclusion. Therefore that is the policy we follow.  However, 
> these markings might be changing based the rest of your feedback.
> 
>>> +struct ras_data {
>>> +    /* header start */
>>> +    /* Magic number to validate the message */
>>> +    u16 magic;
>>> +    /* RAS version number */
>>> +    u16 ver;
>>> +    u32 seq_num;
>>> +    /* RAS message type */
>>> +    u8  type;
>>> +    u8  id;
>>> +    /* Size of RAS message without the header in byte */
>>> +    u16 len;
>>> +    /* header end */
>>> +    s32 result;
>>> +    /*
>>> +     * Error source
>>> +     * 0 : SoC Memory
>>> +     * 1 : PCIE
>>> +     * 2 : DDR
>>> +     * 3 : System Bus source 1
>>> +     * 4 : System Bus source 2
>>> +     * 5 : NSP Memory
>>> +     * 6 : Temperature Sensors
>>> +     */
>>> +    u32 source;
>>> +    /*
>>> +     * Stores the error type, there are three types of error in RAS
>>> +     * 0 : correctable error (CE)
>>> +     * 1 : uncorrectable error (UE)
>>> +     * 2 : uncorrectable error that is non-fatal (UE_NF)
>>> +     */
>>> +    u32 err_type;
>>> +    u32 err_threshold;
>>
>> This is unused. Maybe it could be useful?
> 
> The device can be configured to only make a RAS report to the host after a 
> threshold of events has occured - say every 10 DDR ECC events, report to the 
> host (qaic driver). This field basically restates what that configured limit 
> is. I suppose we can include it in the logged reports to signify that this 
> report really represents N incidents on the device.
> 
>>> +    case PCIE:
>>> +        pci_printk(level, qdev->pdev, "RAS 
>>> event.\nClass:%s\nDescription:%s %s %s\n",
>>> +               err_class_str[msg->err_type],
>>> +               err_type_str[msg->err_type],
>>> +               "error from",
>>> +               err_src_str[msg->source]);
>>> +
>>> +        switch (msg->err_type) {
>>> +        case CE:
>>> +            printk(KERN_WARNING pr_fmt("Syndrome:\n    Bad TLP count %d\n  
>>>   Bad DLLP count %d\n    Replay Rollover count %d\n    Replay Timeout count 
>>> %d\n    Recv Error count %d\n    Internal CE count %d\n"),
>>> +                   pcie_syndrome->bad_tlp,
>>> +                   pcie_syndrome->bad_dllp,
>>> +                   pcie_syndrome->replay_rollover,
>>> +                   pcie_syndrome->replay_timeout,
>>> +                   pcie_syndrome->rx_err,
>>> +                   pcie_syndrome->internal_ce_count);
>>
>> Why not pci_printk() that would be conistent with the rest of logging?
>> It there is a reson I would prefer pr_warn/pr_err style logs.
> 
> This is a special case. This is a continuation of the pci_printk() a few 
> lines up. If we do pci_printk() here, then the entire message gets broken up 
> is a weird way. In the middle of the report, you'll have the "header" that 
> pci_printk() adds (PCI device, driver, etc) repeted.
> 
> The way to avoid that would be to restructure this bit of the code to have 
> all the switches/ifs resolved, and have a single pci_printk() for the entire 
> decoded report.  That means we'll have a lot of duplicated code since the 
> common "report header" for the different PCIe reports would need to be 
> duplicated for each report variant.
> 
> This felt like the cleaner solution, although it does have its quirks.
> 
> Would a comment help?

Sure, comment would make this more readable. I would still consider using a 
single pci_printk() per event anyway because you could get these messages 
broken up if there is a lot of dmesg output from other places.
printk() takes a global lock in case there is VGA or serial console connected, 
so the messages wouldn't be split but it is up to you if you prefer cleaner 
code or cleaner logs.

>>> +static void qaic_ras_mhi_dl_xfer_cb(struct mhi_device *mhi_dev, struct 
>>> mhi_result *mhi_result)
>>> +{
>>> +    struct qaic_device *qdev = dev_get_drvdata(&mhi_dev->dev);
>>> +    struct ras_data *msg = mhi_result->buf_addr;
>>> +    int ret;
>>> +
>>> +    if (mhi_result->transaction_status) {
>>> +        kfree(msg);
>>> +        return;
>>> +    }
>>> +
>>> +    ras_msg_to_cpu(msg);
>>> +    decode_ras_msg(qdev, msg);
>>> +
>>> +    ret = mhi_queue_buf(qdev->ras_ch, DMA_FROM_DEVICE, msg, sizeof(*msg), 
>>> MHI_EOT);
>>> +    if (ret) {
>>> +        dev_err(&mhi_dev->dev, "Cannot requeue RAS recv buf %d\n", ret);
>>> +        kfree(msg);
>>
>> Woudn't error here prevent any future messages from being received?
> 
> Sadly, yes. This should only happen if there is some issue with the 
> underlying PCIe link.
> 
>>> diff --git a/drivers/accel/qaic/qaic_ras.h b/drivers/accel/qaic/qaic_ras.h
>>> new file mode 100644
>>> index 000000000000..5df6cb9dae80
>>> --- /dev/null
>>> +++ b/drivers/accel/qaic/qaic_ras.h
>>> @@ -0,0 +1,11 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only
>>
>> Should be:
>> /* SPDX-License-Identifier: GPL-2.0-only */
>> or
>> // SPDX-License-Identifier: GPL-2.0-only
> 
> The "//" syntax is for C source files (foo.c) and this is a header file, so I 
> don't think that suggestion applies.
> 
> https://docs.kernel.org/process/license-rules.html
> 
> C style comment ( /* */ ) is the correct syntax for header files. It is 
> unclear to me that the marking needs to be its own comment, instead of 
> included in the body of another comment. I would say that it is typical to 
> have license markings and copyright markings in the same comment block.
> 
> Do you have a reference you can point me to that would clarify this? Perhaps 
> a different file in Documentation or another email thread?

This is a little bit pedantic but the docs you pointed to and every other .h 
file in the driver seem to close the comment on the same line.

>>> + *
>>> + * Copyright (c) 2020, The Linux Foundation. All rights reserved.
>>
>> 2025?
> 
> No, per the above reasons.
> 
>>> + */
>>> +
>>> +#ifndef __QAIC_RAS_H__
>>> +#define __QAIC_RAS_H__
>>> +
>>> +int qaic_ras_register(void);
>>> +void qaic_ras_unregister(void);
>>
>> new line?
> 
> Ok.
> 
>>> +#endif /* __QAIC_RAS_H__ */
>>
> 

Reply via email to