On 5/13/2025 10:13 AM, Jacek Lawrynowicz wrote:
Hi,
On 5/13/2025 5:05 PM, Jeff Hugo wrote:
On 5/13/2025 3:53 AM, Jacek Lawrynowicz wrote:
Hi,
On 5/12/2025 9:49 PM, Jeff Hugo wrote:
diff --git a/drivers/accel/qaic/qaic_ras.c b/drivers/accel/qaic/qaic_ras.c
new file mode 100644
index 000000000000..2f8c1f08dbc0
--- /dev/null
+++ b/drivers/accel/qaic/qaic_ras.c
@@ -0,0 +1,629 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2020-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2022-2024 Qualcomm Innovation Center, Inc. All rights
reserved. */
2025?
No, we haven't made any changes to this file this year, so our policy would be
to omit 2025. Historically we've experienced that the community wants the file
markings like this to track any copyright prior to the inclusion of the code
into upstream/mainline, and then to rely on git metadata to track copyright
after inclusion. Therefore that is the policy we follow. However, these
markings might be changing based the rest of your feedback.
+struct ras_data {
+ /* header start */
+ /* Magic number to validate the message */
+ u16 magic;
+ /* RAS version number */
+ u16 ver;
+ u32 seq_num;
+ /* RAS message type */
+ u8 type;
+ u8 id;
+ /* Size of RAS message without the header in byte */
+ u16 len;
+ /* header end */
+ s32 result;
+ /*
+ * Error source
+ * 0 : SoC Memory
+ * 1 : PCIE
+ * 2 : DDR
+ * 3 : System Bus source 1
+ * 4 : System Bus source 2
+ * 5 : NSP Memory
+ * 6 : Temperature Sensors
+ */
+ u32 source;
+ /*
+ * Stores the error type, there are three types of error in RAS
+ * 0 : correctable error (CE)
+ * 1 : uncorrectable error (UE)
+ * 2 : uncorrectable error that is non-fatal (UE_NF)
+ */
+ u32 err_type;
+ u32 err_threshold;
This is unused. Maybe it could be useful?
The device can be configured to only make a RAS report to the host after a
threshold of events has occured - say every 10 DDR ECC events, report to the
host (qaic driver). This field basically restates what that configured limit
is. I suppose we can include it in the logged reports to signify that this
report really represents N incidents on the device.
+ case PCIE:
+ pci_printk(level, qdev->pdev, "RAS event.\nClass:%s\nDescription:%s %s
%s\n",
+ err_class_str[msg->err_type],
+ err_type_str[msg->err_type],
+ "error from",
+ err_src_str[msg->source]);
+
+ switch (msg->err_type) {
+ case CE:
+ printk(KERN_WARNING pr_fmt("Syndrome:\n Bad TLP count %d\n Bad
DLLP count %d\n Replay Rollover count %d\n Replay Timeout count %d\n Recv Error
count %d\n Internal CE count %d\n"),
+ pcie_syndrome->bad_tlp,
+ pcie_syndrome->bad_dllp,
+ pcie_syndrome->replay_rollover,
+ pcie_syndrome->replay_timeout,
+ pcie_syndrome->rx_err,
+ pcie_syndrome->internal_ce_count);
Why not pci_printk() that would be conistent with the rest of logging?
It there is a reson I would prefer pr_warn/pr_err style logs.
This is a special case. This is a continuation of the pci_printk() a few lines up. If we
do pci_printk() here, then the entire message gets broken up is a weird way. In the
middle of the report, you'll have the "header" that pci_printk() adds (PCI
device, driver, etc) repeted.
The way to avoid that would be to restructure this bit of the code to have all the
switches/ifs resolved, and have a single pci_printk() for the entire decoded report.
That means we'll have a lot of duplicated code since the common "report header"
for the different PCIe reports would need to be duplicated for each report variant.
This felt like the cleaner solution, although it does have its quirks.
Would a comment help?
Sure, comment would make this more readable. I would still consider using a
single pci_printk() per event anyway because you could get these messages
broken up if there is a lot of dmesg output from other places.
printk() takes a global lock in case there is VGA or serial console connected,
so the messages wouldn't be split but it is up to you if you prefer cleaner
code or cleaner logs.
Just for reference, these reports are similar to AER, and I recall
seeing AER messages split in some instances. I don't recall difficulty
in correlating such logs. I think I will go the cleaner code route.
Luckily this rather internal to the driver so if we end up hating the
logs, we can make adjustments later.
Will add a comment.
+static void qaic_ras_mhi_dl_xfer_cb(struct mhi_device *mhi_dev, struct
mhi_result *mhi_result)
+{
+ struct qaic_device *qdev = dev_get_drvdata(&mhi_dev->dev);
+ struct ras_data *msg = mhi_result->buf_addr;
+ int ret;
+
+ if (mhi_result->transaction_status) {
+ kfree(msg);
+ return;
+ }
+
+ ras_msg_to_cpu(msg);
+ decode_ras_msg(qdev, msg);
+
+ ret = mhi_queue_buf(qdev->ras_ch, DMA_FROM_DEVICE, msg, sizeof(*msg),
MHI_EOT);
+ if (ret) {
+ dev_err(&mhi_dev->dev, "Cannot requeue RAS recv buf %d\n", ret);
+ kfree(msg);
Woudn't error here prevent any future messages from being received?
Sadly, yes. This should only happen if there is some issue with the underlying
PCIe link.
diff --git a/drivers/accel/qaic/qaic_ras.h b/drivers/accel/qaic/qaic_ras.h
new file mode 100644
index 000000000000..5df6cb9dae80
--- /dev/null
+++ b/drivers/accel/qaic/qaic_ras.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only
Should be:
/* SPDX-License-Identifier: GPL-2.0-only */
or
// SPDX-License-Identifier: GPL-2.0-only
The "//" syntax is for C source files (foo.c) and this is a header file, so I
don't think that suggestion applies.
https://docs.kernel.org/process/license-rules.html
C style comment ( /* */ ) is the correct syntax for header files. It is unclear
to me that the marking needs to be its own comment, instead of included in the
body of another comment. I would say that it is typical to have license
markings and copyright markings in the same comment block.
Do you have a reference you can point me to that would clarify this? Perhaps a
different file in Documentation or another email thread?
This is a little bit pedantic but the docs you pointed to and every other .h
file in the driver seem to close the comment on the same line.
I guess I interpret the doc is being specific to SPDX, therefore its
outside of scope to bring copyright into the documentation.
Looking at the qaic driver, it appears we've made a bit of a mess. This
patch would align with-
mhi_controller.h
qaic.h
qaic_timesync.h
However, sahara.h and qaic_debugfs.h appear to follow what you are
recommending here.
I think we'll update this patch per your suggestion, and do a cleanup to
the driver to pick a style.
+ *
+ * Copyright (c) 2020, The Linux Foundation. All rights reserved.
2025?
No, per the above reasons.
+ */
+
+#ifndef __QAIC_RAS_H__
+#define __QAIC_RAS_H__
+
+int qaic_ras_register(void);
+void qaic_ras_unregister(void);
new line?
Ok.
+#endif /* __QAIC_RAS_H__ */