On 5/13/2025 10:13 AM, Jacek Lawrynowicz wrote:
Hi,

On 5/13/2025 5:05 PM, Jeff Hugo wrote:
On 5/13/2025 3:53 AM, Jacek Lawrynowicz wrote:
Hi,

On 5/12/2025 9:49 PM, Jeff Hugo wrote:
diff --git a/drivers/accel/qaic/qaic_ras.c b/drivers/accel/qaic/qaic_ras.c
new file mode 100644
index 000000000000..2f8c1f08dbc0
--- /dev/null
+++ b/drivers/accel/qaic/qaic_ras.c
@@ -0,0 +1,629 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2020-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2022-2024 Qualcomm Innovation Center, Inc. All rights 
reserved. */

2025?

No, we haven't made any changes to this file this year, so our policy would be 
to omit 2025.  Historically we've experienced that the community wants the file 
markings like this to track any copyright prior to the inclusion of the code 
into upstream/mainline, and then to rely on git metadata to track copyright 
after inclusion. Therefore that is the policy we follow.  However, these 
markings might be changing based the rest of your feedback.

+struct ras_data {
+    /* header start */
+    /* Magic number to validate the message */
+    u16 magic;
+    /* RAS version number */
+    u16 ver;
+    u32 seq_num;
+    /* RAS message type */
+    u8  type;
+    u8  id;
+    /* Size of RAS message without the header in byte */
+    u16 len;
+    /* header end */
+    s32 result;
+    /*
+     * Error source
+     * 0 : SoC Memory
+     * 1 : PCIE
+     * 2 : DDR
+     * 3 : System Bus source 1
+     * 4 : System Bus source 2
+     * 5 : NSP Memory
+     * 6 : Temperature Sensors
+     */
+    u32 source;
+    /*
+     * Stores the error type, there are three types of error in RAS
+     * 0 : correctable error (CE)
+     * 1 : uncorrectable error (UE)
+     * 2 : uncorrectable error that is non-fatal (UE_NF)
+     */
+    u32 err_type;
+    u32 err_threshold;

This is unused. Maybe it could be useful?

The device can be configured to only make a RAS report to the host after a 
threshold of events has occured - say every 10 DDR ECC events, report to the 
host (qaic driver). This field basically restates what that configured limit 
is. I suppose we can include it in the logged reports to signify that this 
report really represents N incidents on the device.

+    case PCIE:
+        pci_printk(level, qdev->pdev, "RAS event.\nClass:%s\nDescription:%s %s 
%s\n",
+               err_class_str[msg->err_type],
+               err_type_str[msg->err_type],
+               "error from",
+               err_src_str[msg->source]);
+
+        switch (msg->err_type) {
+        case CE:
+            printk(KERN_WARNING pr_fmt("Syndrome:\n    Bad TLP count %d\n    Bad 
DLLP count %d\n    Replay Rollover count %d\n    Replay Timeout count %d\n    Recv Error 
count %d\n    Internal CE count %d\n"),
+                   pcie_syndrome->bad_tlp,
+                   pcie_syndrome->bad_dllp,
+                   pcie_syndrome->replay_rollover,
+                   pcie_syndrome->replay_timeout,
+                   pcie_syndrome->rx_err,
+                   pcie_syndrome->internal_ce_count);

Why not pci_printk() that would be conistent with the rest of logging?
It there is a reson I would prefer pr_warn/pr_err style logs.

This is a special case. This is a continuation of the pci_printk() a few lines up. If we 
do pci_printk() here, then the entire message gets broken up is a weird way. In the 
middle of the report, you'll have the "header" that pci_printk() adds (PCI 
device, driver, etc) repeted.

The way to avoid that would be to restructure this bit of the code to have all the 
switches/ifs resolved, and have a single pci_printk() for the entire decoded report.  
That means we'll have a lot of duplicated code since the common "report header" 
for the different PCIe reports would need to be duplicated for each report variant.

This felt like the cleaner solution, although it does have its quirks.

Would a comment help?

Sure, comment would make this more readable. I would still consider using a 
single pci_printk() per event anyway because you could get these messages 
broken up if there is a lot of dmesg output from other places.
printk() takes a global lock in case there is VGA or serial console connected, 
so the messages wouldn't be split but it is up to you if you prefer cleaner 
code or cleaner logs.

Just for reference, these reports are similar to AER, and I recall seeing AER messages split in some instances. I don't recall difficulty in correlating such logs. I think I will go the cleaner code route. Luckily this rather internal to the driver so if we end up hating the logs, we can make adjustments later.

Will add a comment.


+static void qaic_ras_mhi_dl_xfer_cb(struct mhi_device *mhi_dev, struct 
mhi_result *mhi_result)
+{
+    struct qaic_device *qdev = dev_get_drvdata(&mhi_dev->dev);
+    struct ras_data *msg = mhi_result->buf_addr;
+    int ret;
+
+    if (mhi_result->transaction_status) {
+        kfree(msg);
+        return;
+    }
+
+    ras_msg_to_cpu(msg);
+    decode_ras_msg(qdev, msg);
+
+    ret = mhi_queue_buf(qdev->ras_ch, DMA_FROM_DEVICE, msg, sizeof(*msg), 
MHI_EOT);
+    if (ret) {
+        dev_err(&mhi_dev->dev, "Cannot requeue RAS recv buf %d\n", ret);
+        kfree(msg);

Woudn't error here prevent any future messages from being received?

Sadly, yes. This should only happen if there is some issue with the underlying 
PCIe link.

diff --git a/drivers/accel/qaic/qaic_ras.h b/drivers/accel/qaic/qaic_ras.h
new file mode 100644
index 000000000000..5df6cb9dae80
--- /dev/null
+++ b/drivers/accel/qaic/qaic_ras.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only

Should be:
/* SPDX-License-Identifier: GPL-2.0-only */
or
// SPDX-License-Identifier: GPL-2.0-only

The "//" syntax is for C source files (foo.c) and this is a header file, so I 
don't think that suggestion applies.

https://docs.kernel.org/process/license-rules.html

C style comment ( /* */ ) is the correct syntax for header files. It is unclear 
to me that the marking needs to be its own comment, instead of included in the 
body of another comment. I would say that it is typical to have license 
markings and copyright markings in the same comment block.

Do you have a reference you can point me to that would clarify this? Perhaps a 
different file in Documentation or another email thread?

This is a little bit pedantic but the docs you pointed to and every other .h 
file in the driver seem to close the comment on the same line.

I guess I interpret the doc is being specific to SPDX, therefore its outside of scope to bring copyright into the documentation.

Looking at the qaic driver, it appears we've made a bit of a mess. This patch would align with-
mhi_controller.h
qaic.h
qaic_timesync.h

However, sahara.h and qaic_debugfs.h appear to follow what you are recommending here.

I think we'll update this patch per your suggestion, and do a cleanup to the driver to pick a style.

+ *
+ * Copyright (c) 2020, The Linux Foundation. All rights reserved.

2025?

No, per the above reasons.

+ */
+
+#ifndef __QAIC_RAS_H__
+#define __QAIC_RAS_H__
+
+int qaic_ras_register(void);
+void qaic_ras_unregister(void);

new line?

Ok.

+#endif /* __QAIC_RAS_H__ */




Reply via email to