Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Christian König

Am 20.04.21 um 09:46 schrieb Michal Hocko:

On Tue 20-04-21 09:32:14, Christian König wrote:

Am 20.04.21 um 09:04 schrieb Michal Hocko:

On Mon 19-04-21 18:37:13, Christian König wrote:

Am 19.04.21 um 18:11 schrieb Michal Hocko:

[...]

What I am trying to bring up with the NUMA side is that the same problem can
happen on a per-node basis. Let's say that some user consumes an unexpectedly
large amount of dma-buf on a certain node. This can lead to an observable
performance impact for anybody allocating from that node and, even
worse, cause an OOM for node-bound consumers. How do I find out that it
was dma-buf that caused the problem?

Yes, that is the direction my thinking goes as well, but also even further.

See, DMA-buf is also used to share device-local memory between processes,
in other words VRAM on graphics hardware.

On my test system here I have 32GB of system memory and 16GB of VRAM. I can
use DMA-buf to allocate that 16GB of VRAM quite easily which then shows up
under /proc/meminfo as used memory.

This is something that would be really interesting in the changelog. I
mean the expected and extreme memory consumption of this memory. Ideally
with some hints on what to do when the number is really high (e.g. mount
debugfs and have a look here and there to check whether this is just too
many users or an unexpected pattern to be reported).


But that isn't really system memory at all, it's just allocated device
memory.

OK, that was not really clear to me. So this is not really accounted to
MemTotal?


It depends. In a lot of embedded systems you only have system memory, and 
in that case the value here is indeed really useful.



If that is really the case then reporting it into the oom
report is completely pointless and I am not even sure /proc/meminfo is
the right interface either. It would just add more confusion I am
afraid.


I kind of agree. As I said a DMA-buf could be backed by system memory or 
device memory.


In the case when it is backed by system memory it does make sense to 
report this in an OOM dump.


But only the exporting driver knows what the DMA-buf handle represents, 
the framework just provides the common ground for inter driver 
communication.



See where I am heading?

Yeah, totally. Thanks for pointing this out.

Suggestions how to handle that?

As I've pointed out in a previous reply, we do have an API to account
per-node memory, but now that you have brought up that this is not something
we account as regular memory, it doesn't really fit into that
model. But maybe I am just confused.


Well, does that API also have a counter for memory used by device drivers?

If yes, then the device driver that exported the DMA-buf should probably 
use that API. If not, we might want to create one.


I mean, the author of this patch seems to have a use case where this is 
needed, and I also see that we have a hole in how we account memory.


Christian.


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-20 Thread Christian König

Am 20.04.21 um 09:04 schrieb Michal Hocko:

On Mon 19-04-21 18:37:13, Christian König wrote:

Am 19.04.21 um 18:11 schrieb Michal Hocko:

[...]

The question is not whether it is NUMA aware but whether it is useful to
know per-numa data for the purpose the counter is supposed to serve.

No, not at all. The pages of a single DMA-buf could even be from different
NUMA nodes if the exporting driver decides that this is somehow useful.

As the use of the counter hasn't been explained yet, I can only
speculate. One thing that I can imagine being useful is to fill gaps in
our accounting. Quite often the memory accounted in
/proc/meminfo (or the oom report) doesn't add up to the overall memory
usage, and in some workloads the gap can be huge! In many cases there
are other means to find out the additional memory via subsystem-specific
interfaces (e.g. networking buffers). I do assume that dma-buf is just
one of those, and the counter can fill the said gap at least partially
for some workloads. That is definitely useful.


Yes, completely agree. I'm just not 100% sure if the DMA-buf framework 
should account for that or the individual drivers exporting DMA-bufs.


See below for a further explanation.


What I am trying to bring up with the NUMA side is that the same problem can
happen on a per-node basis. Let's say that some user consumes an unexpectedly
large amount of dma-buf on a certain node. This can lead to an observable
performance impact for anybody allocating from that node and, even
worse, cause an OOM for node-bound consumers. How do I find out that it
was dma-buf that caused the problem?


Yes, that is the direction my thinking goes as well, but also even further.

See, DMA-buf is also used to share device-local memory between processes, 
in other words VRAM on graphics hardware.


On my test system here I have 32GB of system memory and 16GB of VRAM. I 
can use DMA-buf to allocate that 16GB of VRAM quite easily which then 
shows up under /proc/meminfo as used memory.


But that isn't really system memory at all, it's just allocated device 
memory.



See where I am heading?


Yeah, totally. Thanks for pointing this out.

Suggestions how to handle that?

Regards,
Christian.


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-19 Thread Christian König




Am 19.04.21 um 18:11 schrieb Michal Hocko:

On Mon 19-04-21 17:44:13, Christian König wrote:

Am 19.04.21 um 17:19 schrieb peter.enderb...@sony.com:

On 4/19/21 5:00 PM, Michal Hocko wrote:

On Mon 19-04-21 12:41:58, peter.enderb...@sony.com wrote:

On 4/19/21 2:16 PM, Michal Hocko wrote:

On Sat 17-04-21 12:40:32, Peter Enderborg wrote:

This adds a counter for the total memory used by dma-buf. Details
can be found in debugfs; however, that is not for everyone
and not always available. dma-bufs are indirectly allocated by
userspace, so with this value we can monitor and detect
userspace applications that have problems.

The changelog would benefit from more background on why this is needed,
and who is the primary consumer of that value.

I cannot really comment on the dma-buf internals but I have two remarks.
Documentation/filesystems/proc.rst needs an update with the counter
explanation and secondly is this information useful for OOM situations
analysis? If yes then show_mem should dump the value as well.

From the implementation point of view, is there any reason why this
hasn't used the existing global_node_page_state infrastructure?

I will fix the doc in the next version. I'm not sure what you expect the commit message to 
include.

As I've said, the usual justification covers answers to the following questions:
- Why do we need it?
- Why is the existing data insufficient?
- Who is supposed to use the data, and for what?

I can see an answer to the first two questions (because this can be a
lot of memory and the existing infrastructure, debugfs, is not suitable
for production). But the changelog doesn't really explain who is going to use
the new data. Is this monitoring meant to raise an early alarm when the
value grows? Is this for debugging misbehaving drivers? How is it
valuable for those?


The function of the meminfo is: (From Documentation/filesystems/proc.rst)

"Provides information about distribution and utilization of memory."

True. Yet we do not export any random counters, do we?


I'm not the designer of dma-buf; I think of global_node_page_state as a kernel
internal.

It provides node-specific and optimized counters. Is this a good fit
for your new counter? Or is NUMA locality of no importance?

Sounds good to me; if Christian König thinks it is good, I will use that.
Among drivers, only virtio uses global_node_page_state, if
that matters.

DMA-bufs are not NUMA-aware at all. On which node the pages are allocated
(and whether pages are used at all rather than internal device memory) is up to the
exporter and importer.

The question is not whether it is NUMA aware but whether it is useful to
know per-numa data for the purpose the counter is supposed to serve.


No, not at all. The pages of a single DMA-buf could even be from 
different NUMA nodes if the exporting driver decides that this is 
somehow useful.


Christian.


Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-19 Thread Christian König

Am 19.04.21 um 17:19 schrieb peter.enderb...@sony.com:

On 4/19/21 5:00 PM, Michal Hocko wrote:

On Mon 19-04-21 12:41:58, peter.enderb...@sony.com wrote:

On 4/19/21 2:16 PM, Michal Hocko wrote:

On Sat 17-04-21 12:40:32, Peter Enderborg wrote:

This adds a counter for the total memory used by dma-buf. Details
can be found in debugfs; however, that is not for everyone
and not always available. dma-bufs are indirectly allocated by
userspace, so with this value we can monitor and detect
userspace applications that have problems.

The changelog would benefit from more background on why this is needed,
and who is the primary consumer of that value.

I cannot really comment on the dma-buf internals but I have two remarks.
Documentation/filesystems/proc.rst needs an update with the counter
explanation and secondly is this information useful for OOM situations
analysis? If yes then show_mem should dump the value as well.

From the implementation point of view, is there any reason why this
hasn't used the existing global_node_page_state infrastructure?

I will fix the doc in the next version. I'm not sure what you expect the commit message to 
include.

As I've said, the usual justification covers answers to the following questions:
- Why do we need it?
- Why is the existing data insufficient?
- Who is supposed to use the data, and for what?

I can see an answer to the first two questions (because this can be a
lot of memory and the existing infrastructure, debugfs, is not suitable
for production). But the changelog doesn't really explain who is going to use
the new data. Is this monitoring meant to raise an early alarm when the
value grows? Is this for debugging misbehaving drivers? How is it
valuable for those?


The function of the meminfo is: (From Documentation/filesystems/proc.rst)

"Provides information about distribution and utilization of memory."

True. Yet we do not export any random counters, do we?


I'm not the designer of dma-buf; I think of global_node_page_state as a kernel
internal.

It provides node-specific and optimized counters. Is this a good fit
for your new counter? Or is NUMA locality of no importance?

Sounds good to me; if Christian König thinks it is good, I will use that.
Among drivers, only virtio uses global_node_page_state, if
that matters.


DMA-bufs are not NUMA-aware at all. On which node the pages are allocated 
(and whether pages are used at all rather than internal device memory) is up to the 
exporter and importer.


Christian.





dma-buf is a device driver that provides a function, so I might be
on the outside. However, I also see that it might be relevant for OOM.
It is memory that can be freed by killing userspace processes.

The show_mem thing. Should it be a separate patch?

This is up to you but if you want to expose the counter then send it in
one series.





Re: [External] [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-17 Thread Christian König

Am 17.04.21 um 16:21 schrieb Muchun Song:

On Sat, Apr 17, 2021 at 9:44 PM  wrote:

On 4/17/21 3:07 PM, Muchun Song wrote:

On Sat, Apr 17, 2021 at 6:41 PM Peter Enderborg
 wrote:

This adds a counter for the total memory used by dma-buf. Details
can be found in debugfs; however, that is not for everyone
and not always available. dma-bufs are indirectly allocated by
userspace, so with this value we can monitor and detect
userspace applications that have problems.

Signed-off-by: Peter Enderborg 
---
  drivers/dma-buf/dma-buf.c | 13 +
  fs/proc/meminfo.c |  5 -
  include/linux/dma-buf.h   |  1 +
  3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f264b70c383e..197e5c45dd26 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,7 @@ struct dma_buf_list {
  };

  static struct dma_buf_list db_list;
+static atomic_long_t dma_buf_global_allocated;

  static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
  {
@@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
 if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
 dma_resv_fini(dmabuf->resv);

+   atomic_long_sub(dmabuf->size, &dma_buf_global_allocated);
 module_put(dmabuf->owner);
 kfree(dmabuf->name);
 kfree(dmabuf);
@@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
 mutex_lock(&db_list.lock);
 list_add(&dmabuf->list_node, &db_list.head);
 mutex_unlock(&db_list.lock);
+   atomic_long_add(dmabuf->size, &dma_buf_global_allocated);

 return dmabuf;

@@ -1346,6 +1349,16 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
dma_buf_map *map)
  }
  EXPORT_SYMBOL_GPL(dma_buf_vunmap);

+/**
+ * dma_buf_allocated_pages - Return the used nr of pages
+ * allocated for dma-buf
+ */
+long dma_buf_allocated_pages(void)
+{
+   return atomic_long_read(&dma_buf_global_allocated) >> PAGE_SHIFT;
+}
+EXPORT_SYMBOL_GPL(dma_buf_allocated_pages);

dma_buf_allocated_pages is only called from fs/proc/meminfo.c.
I am confused why it should be exported. If it won't be called
from the driver module, we should not export it.

Ah, I thought you did not want the GPL restriction. I don't have a real
opinion about it. It's written to follow the rest of the module.
It is not needed for the usage of dma-buf in a kernel module, but I
don't see any reason for hiding it either.

The modules do not need dma_buf_allocated_pages, and hiding it
prevents modules from calling it. So I think that
EXPORT_SYMBOL_GPL is unnecessary. If one day someone
wants to call it from a module, it won't be too late to export
it at that time.


Yeah, that is a rather good point. Only symbols which should be used by 
modules/drivers should be exported.


Christian.






Thanks.


+
  #ifdef CONFIG_DEBUG_FS
  static int dma_buf_debug_show(struct seq_file *s, void *unused)
  {
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..ccc7c40c8db7 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -16,6 +16,7 @@
  #ifdef CONFIG_CMA
  #include <linux/cma.h>
  #endif
+#include <linux/dma-buf.h>
  #include <asm/page.h>
  #include "internal.h"

@@ -145,7 +146,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 show_val_kb(m, "CmaFree:",
 global_zone_page_state(NR_FREE_CMA_PAGES));
  #endif
-
+#ifdef CONFIG_DMA_SHARED_BUFFER
+   show_val_kb(m, "DmaBufTotal:", dma_buf_allocated_pages());
+#endif
 hugetlb_report_meminfo(m);

 arch_report_meminfo(m);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index efdc56b9d95f..5b05816bd2cd 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -507,4 +507,5 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
  unsigned long);
  int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
  void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
+long dma_buf_allocated_pages(void);
  #endif /* __DMA_BUF_H__ */
--
2.17.1





Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-17 Thread Christian König

Am 17.04.21 um 13:20 schrieb peter.enderb...@sony.com:

On 4/17/21 12:59 PM, Christian König wrote:

Am 17.04.21 um 12:40 schrieb Peter Enderborg:

This adds a counter for the total memory used by dma-buf. Details
can be found in debugfs; however, that is not for everyone
and not always available. dma-bufs are indirectly allocated by
userspace, so with this value we can monitor and detect
userspace applications that have problems.

Signed-off-by: Peter Enderborg 

Reviewed-by: Christian König 

How do you want to upstream this?

I don't understand that question. The patch applies to Torvalds' 5.12-rc7,
but I guess 5.13 is what we are working on right now.


Yeah, but how do you want to get it into Linus tree?

I can push it together with other DMA-buf patches through drm-misc-next 
and then Dave will send it to Linus for inclusion in 5.13.


But it could be that you are pushing multiple changes towards Linus through 
some other branch. In that case I'm fine if you pick that route instead, if 
you want to keep your patches together for some reason.


Christian.




---
   drivers/dma-buf/dma-buf.c | 13 +
   fs/proc/meminfo.c |  5 -
   include/linux/dma-buf.h   |  1 +
   3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f264b70c383e..197e5c45dd26 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,7 @@ struct dma_buf_list {
   };
     static struct dma_buf_list db_list;
+static atomic_long_t dma_buf_global_allocated;
     static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int 
buflen)
   {
@@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
    if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
   dma_resv_fini(dmabuf->resv);
   +    atomic_long_sub(dmabuf->size, &dma_buf_global_allocated);
   module_put(dmabuf->owner);
   kfree(dmabuf->name);
   kfree(dmabuf);
@@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
    mutex_lock(&db_list.lock);
    list_add(&dmabuf->list_node, &db_list.head);
    mutex_unlock(&db_list.lock);
+    atomic_long_add(dmabuf->size, &dma_buf_global_allocated);
     return dmabuf;
   @@ -1346,6 +1349,16 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
dma_buf_map *map)
   }
   EXPORT_SYMBOL_GPL(dma_buf_vunmap);
   +/**
+ * dma_buf_allocated_pages - Return the used nr of pages
+ * allocated for dma-buf
+ */
+long dma_buf_allocated_pages(void)
+{
+    return atomic_long_read(&dma_buf_global_allocated) >> PAGE_SHIFT;
+}
+EXPORT_SYMBOL_GPL(dma_buf_allocated_pages);
+
   #ifdef CONFIG_DEBUG_FS
   static int dma_buf_debug_show(struct seq_file *s, void *unused)
   {
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..ccc7c40c8db7 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -16,6 +16,7 @@
   #ifdef CONFIG_CMA
    #include <linux/cma.h>
   #endif
+#include <linux/dma-buf.h>
    #include <asm/page.h>
   #include "internal.h"
   @@ -145,7 +146,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
   show_val_kb(m, "CmaFree:    ",
   global_zone_page_state(NR_FREE_CMA_PAGES));
   #endif
-
+#ifdef CONFIG_DMA_SHARED_BUFFER
+    show_val_kb(m, "DmaBufTotal:    ", dma_buf_allocated_pages());
+#endif
   hugetlb_report_meminfo(m);
     arch_report_meminfo(m);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index efdc56b9d95f..5b05816bd2cd 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -507,4 +507,5 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
    unsigned long);
   int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
   void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
+long dma_buf_allocated_pages(void);
   #endif /* __DMA_BUF_H__ */




Re: [PATCH v4] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-17 Thread Christian König

Am 17.04.21 um 12:40 schrieb Peter Enderborg:

This adds a counter for the total memory used by dma-buf. Details
can be found in debugfs; however, that is not for everyone
and not always available. dma-bufs are indirectly allocated by
userspace, so with this value we can monitor and detect
userspace applications that have problems.

Signed-off-by: Peter Enderborg 


Reviewed-by: Christian König 

How do you want to upstream this?


---
  drivers/dma-buf/dma-buf.c | 13 +
  fs/proc/meminfo.c |  5 -
  include/linux/dma-buf.h   |  1 +
  3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f264b70c383e..197e5c45dd26 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,7 @@ struct dma_buf_list {
  };
  
  static struct dma_buf_list db_list;

+static atomic_long_t dma_buf_global_allocated;
  
  static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)

  {
@@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
	if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
dma_resv_fini(dmabuf->resv);
  
+	atomic_long_sub(dmabuf->size, &dma_buf_global_allocated);

module_put(dmabuf->owner);
kfree(dmabuf->name);
kfree(dmabuf);
@@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
	mutex_lock(&db_list.lock);
	list_add(&dmabuf->list_node, &db_list.head);
	mutex_unlock(&db_list.lock);
+   atomic_long_add(dmabuf->size, &dma_buf_global_allocated);
  
  	return dmabuf;
  
@@ -1346,6 +1349,16 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map)

  }
  EXPORT_SYMBOL_GPL(dma_buf_vunmap);
  
+/**

+ * dma_buf_allocated_pages - Return the used nr of pages
+ * allocated for dma-buf
+ */
+long dma_buf_allocated_pages(void)
+{
+   return atomic_long_read(&dma_buf_global_allocated) >> PAGE_SHIFT;
+}
+EXPORT_SYMBOL_GPL(dma_buf_allocated_pages);
+
  #ifdef CONFIG_DEBUG_FS
  static int dma_buf_debug_show(struct seq_file *s, void *unused)
  {
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..ccc7c40c8db7 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -16,6 +16,7 @@
  #ifdef CONFIG_CMA
  #include <linux/cma.h>
  #endif
+#include <linux/dma-buf.h>
  #include <asm/page.h>
  #include "internal.h"
  
@@ -145,7 +146,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)

show_val_kb(m, "CmaFree:",
global_zone_page_state(NR_FREE_CMA_PAGES));
  #endif
-
+#ifdef CONFIG_DMA_SHARED_BUFFER
+   show_val_kb(m, "DmaBufTotal:", dma_buf_allocated_pages());
+#endif
hugetlb_report_meminfo(m);
  
  	arch_report_meminfo(m);

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index efdc56b9d95f..5b05816bd2cd 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -507,4 +507,5 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 unsigned long);
  int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
  void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
+long dma_buf_allocated_pages(void);
  #endif /* __DMA_BUF_H__ */




Re: [PATCH 35/40] drm/amd/amdgpu/amdgpu_cs: Repair some function naming disparity

2021-04-16 Thread Christian König

Am 16.04.21 um 16:37 schrieb Lee Jones:

Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:685: warning: expecting prototype for 
cs_parser_fini(). Prototype was for amdgpu_cs_parser_fini() instead
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1502: warning: expecting prototype for 
amdgpu_cs_wait_all_fence(). Prototype was for amdgpu_cs_wait_all_fences() 
instead
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1656: warning: expecting prototype for 
amdgpu_cs_find_bo_va(). Prototype was for amdgpu_cs_find_mapping() instead

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Jerome Glisse 
Cc: amd-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b5c7669980458..90136f9dedd65 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -672,7 +672,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
  }
  
  /**

- * cs_parser_fini() - clean parser states
+ * amdgpu_cs_parser_fini() - clean parser states
   * @parser:   parser structure holding parsing context.
 * @error: error number
   * @backoff:  indicator to backoff the reservation
@@ -1488,7 +1488,7 @@ int amdgpu_cs_fence_to_handle_ioctl(struct drm_device 
*dev, void *data,
  }
  
  /**

- * amdgpu_cs_wait_all_fence - wait on all fences to signal
+ * amdgpu_cs_wait_all_fences - wait on all fences to signal
   *
   * @adev: amdgpu device
   * @filp: file private
@@ -1639,7 +1639,7 @@ int amdgpu_cs_wait_fences_ioctl(struct drm_device *dev, 
void *data,
  }
  
  /**

- * amdgpu_cs_find_bo_va - find bo_va for VM address
+ * amdgpu_cs_find_mapping - find bo_va for VM address
   *
   * @parser: command submission parser context
   * @addr: VM address




Re: [PATCH 33/40] drm/amd/amdgpu/amdgpu_ring: Provide description for 'sched_score'

2021-04-16 Thread Christian König

Am 16.04.21 um 16:37 schrieb Lee Jones:

Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:169: warning: Function parameter or 
member 'sched_score' not described in 'amdgpu_ring_init'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 688624ebe4211..7b634a1517f9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -158,6 +158,7 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
   * @irq_src: interrupt source to use for this ring
   * @irq_type: interrupt type to use for this ring
   * @hw_prio: ring priority (NORMAL/HIGH)
+ * @sched_score: optional score atomic shared with other schedulers
   *
   * Initialize the driver information for the selected ring (all asics).
   * Returns 0 on success, error on failure.




Re: [PATCH 32/40] drm/amd/amdgpu/amdgpu_ttm: Fix incorrectly documented function 'amdgpu_ttm_copy_mem_to_mem()'

2021-04-16 Thread Christian König

Am 16.04.21 um 16:37 schrieb Lee Jones:

Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:311: warning: expecting prototype for 
amdgpu_copy_ttm_mem_to_mem(). Prototype was for amdgpu_ttm_copy_mem_to_mem() 
instead

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Jerome Glisse 
Cc: amd-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 3bef0432cac2f..859314c0d6a39 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -288,7 +288,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object 
*bo,
  }
  
  /**

- * amdgpu_copy_ttm_mem_to_mem - Helper function for copy
+ * amdgpu_ttm_copy_mem_to_mem - Helper function for copy
   * @adev: amdgpu device
   * @src: buffer/address where to read from
   * @dst: buffer/address where to write to




Re: [PATCH 31/40] drm/amd/amdgpu/amdgpu_gart: Correct a couple of function names in the docs

2021-04-16 Thread Christian König

Am 16.04.21 um 16:37 schrieb Lee Jones:

Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:73: warning: expecting prototype for 
amdgpu_dummy_page_init(). Prototype was for amdgpu_gart_dummy_page_init() 
instead
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:96: warning: expecting prototype for 
amdgpu_dummy_page_fini(). Prototype was for amdgpu_gart_dummy_page_fini() 
instead

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Nirmoy Das 
Cc: amd-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index c5a9a4fb10d2b..5562b5c90c032 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -60,7 +60,7 @@
   */
  
  /**

- * amdgpu_dummy_page_init - init dummy page used by the driver
+ * amdgpu_gart_dummy_page_init - init dummy page used by the driver
   *
   * @adev: amdgpu_device pointer
   *
@@ -86,7 +86,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device 
*adev)
  }
  
  /**

- * amdgpu_dummy_page_fini - free dummy page used by the driver
+ * amdgpu_gart_dummy_page_fini - free dummy page used by the driver
   *
   * @adev: amdgpu_device pointer
   *




Re: [PATCH 29/40] drm/amd/amdgpu/amdgpu_fence: Provide description for 'sched_score'

2021-04-16 Thread Christian König

Am 16.04.21 um 16:37 schrieb Lee Jones:

Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:444: warning: Function parameter or 
member 'sched_score' not described in 'amdgpu_fence_driver_init_ring'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Jerome Glisse 
Cc: amd-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 47ea468596184..30772608eac6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -434,6 +434,7 @@ int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
   *
   * @ring: ring to init the fence driver on
   * @num_hw_submission: number of entries on the hardware queue
+ * @sched_score: optional score atomic shared with other schedulers
   *
   * Init the fence driver for the requested ring (all asics).
   * Helper function for amdgpu_fence_driver_init().




Re: [PATCH 25/40] drm/radeon/radeon_device: Provide function name in kernel-doc header

2021-04-16 Thread Christian König

Am 16.04.21 um 16:37 schrieb Lee Jones:

Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/radeon/radeon_device.c:1101: warning: This comment starts 
with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/radeon/radeon_device.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index cc445c4cba2e3..46eea01950cb1 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1098,7 +1098,8 @@ static bool radeon_check_pot_argument(int arg)
  }
  
  /**

- * Determine a sensible default GART size according to ASIC family.
+ * radeon_gart_size_auto - Determine a sensible default GART size
+ * according to ASIC family.
   *
   * @family: ASIC family name
   */




Re: [PATCH 22/40] drm/ttm/ttm_tt: Demote non-conformant kernel-doc header

2021-04-16 Thread Christian König




Am 16.04.21 um 16:37 schrieb Lee Jones:

Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/ttm/ttm_tt.c:398: warning: Function parameter or member 
'num_pages' not described in 'ttm_tt_mgr_init'
  drivers/gpu/drm/ttm/ttm_tt.c:398: warning: Function parameter or member 
'num_dma32_pages' not described in 'ttm_tt_mgr_init'

Cc: Christian Koenig 
Cc: Huang Rui 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 


For that one I would rather prefer to just document the two parameters.

Christian.


---
  drivers/gpu/drm/ttm/ttm_tt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 7dcd3fb694956..d939c3bde2fcf 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -389,7 +389,7 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct 
ttm_tt *ttm)
ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED;
  }
  
-/**

+/*
   * ttm_tt_mgr_init - register with the MM shrinker
   *
   * Register with the MM shrinker for swapping out BOs.




Re: [PATCH 27/40] drm/ttm/ttm_device: Demote kernel-doc abuses

2021-04-16 Thread Christian König

On 16.04.21 at 16:37, Lee Jones wrote:

Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/ttm/ttm_device.c:42: warning: Function parameter or member 
'ttm_global_mutex' not described in 'DEFINE_MUTEX'
  drivers/gpu/drm/ttm/ttm_device.c:42: warning: expecting prototype for 
ttm_global_mutex(). Prototype was for DEFINE_MUTEX() instead
  drivers/gpu/drm/ttm/ttm_device.c:112: warning: Function parameter or member 
'ctx' not described in 'ttm_global_swapout'
  drivers/gpu/drm/ttm/ttm_device.c:112: warning: Function parameter or member 
'gfp_flags' not described in 'ttm_global_swapout'
  drivers/gpu/drm/ttm/ttm_device.c:112: warning: expecting prototype for A 
buffer object shrink method that tries to swap out the first(). Prototype was 
for ttm_global_swapout() instead

Cc: Christian Koenig 
Cc: Huang Rui 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/ttm/ttm_device.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 9b787b3caeb50..a8bec8358350d 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -36,7 +36,7 @@
  
  #include "ttm_module.h"
  
-/**

+/*
   * ttm_global_mutex - protecting the global state
   */
  DEFINE_MUTEX(ttm_global_mutex);
@@ -104,7 +104,7 @@ static int ttm_global_init(void)
return ret;
  }
  
-/**

+/*
   * A buffer object shrink method that tries to swap out the first
   * buffer object on the global::swap_lru list.
   */




Re: [PATCH 23/40] drm/ttm/ttm_bo: Fix incorrectly documented function 'ttm_bo_cleanup_refs'

2021-04-16 Thread Christian König

On 16.04.21 at 16:37, Lee Jones wrote:

Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/ttm/ttm_bo.c:293: warning: expecting prototype for function 
ttm_bo_cleanup_refs(). Prototype was for ttm_bo_cleanup_refs() instead

Cc: Christian Koenig 
Cc: Huang Rui 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index cfd0b92923973..defec9487e1de 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -274,7 +274,7 @@ static void ttm_bo_flush_all_fences(struct 
ttm_buffer_object *bo)
  }
  
  /**

- * function ttm_bo_cleanup_refs
+ * ttm_bo_cleanup_refs
   * If bo idle, remove from lru lists, and unref.
   * If not idle, block if possible.
   *




Re: [PATCH v2] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-16 Thread Christian König




On 16.04.21 at 14:33, Peter Enderborg wrote:

This adds a total used dma-buf memory. Details
can be found in debugfs, however it is not for everyone
and not always available. dma-bufs are indirectly allocated by
userspace, so with this value we can monitor and detect
userspace applications that have problems.

Signed-off-by: Peter Enderborg 
---
  drivers/dma-buf/dma-buf.c | 12 
  fs/proc/meminfo.c |  5 -
  include/linux/dma-buf.h   |  1 +
  3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f264b70c383e..9f88171b394c 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,7 @@ struct dma_buf_list {
  };
  
  static struct dma_buf_list db_list;

+static atomic_long_t dma_buf_size;


Probably better to call this and the get function something like 
global_allocated.


Christian.

  
  static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)

  {
@@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
	if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
		dma_resv_fini(dmabuf->resv);
  
+	atomic_long_sub(dmabuf->size, &dma_buf_size);

	module_put(dmabuf->owner);
	kfree(dmabuf->name);
	kfree(dmabuf);
@@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
	mutex_lock(&db_list.lock);
	list_add(&dmabuf->list_node, &db_list.head);
	mutex_unlock(&db_list.lock);
+   atomic_long_add(dmabuf->size, &dma_buf_size);
  
  	return dmabuf;
  
@@ -1346,6 +1349,15 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map)

  }
  EXPORT_SYMBOL_GPL(dma_buf_vunmap);
  
+/**

+ * dma_buf_get_size - Return the number of pages used by dma-buf
+ */
+long dma_buf_get_size(void)
+{
+   return atomic_long_read(&dma_buf_size) >> PAGE_SHIFT;
+}
+EXPORT_SYMBOL_GPL(dma_buf_get_size);
+
  #ifdef CONFIG_DEBUG_FS
  static int dma_buf_debug_show(struct seq_file *s, void *unused)
  {
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..178f6ffb1618 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -16,6 +16,7 @@
  #ifdef CONFIG_CMA
  #include 
  #endif
+#include 
  #include 
  #include "internal.h"
  
@@ -145,7 +146,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)

show_val_kb(m, "CmaFree:",
global_zone_page_state(NR_FREE_CMA_PAGES));
  #endif
-
+#ifdef CONFIG_DMA_SHARED_BUFFER
+   show_val_kb(m, "DmaBufTotal:", dma_buf_get_size());
+#endif
hugetlb_report_meminfo(m);
  
  	arch_report_meminfo(m);

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index efdc56b9d95f..f6481315a377 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -507,4 +507,5 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 unsigned long);
  int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
  void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
+long dma_buf_get_size(void);
  #endif /* __DMA_BUF_H__ */




Re: [PATCH] dma-buf: Add DmaBufTotal counter in meminfo

2021-04-16 Thread Christian König

On 16.04.21 at 11:37, Peter Enderborg wrote:

This adds a total used dma-buf memory. Details
can be found in debugfs, however it is not for everyone
and not always available.


Well you are kind of missing the intention here.

I mean knowing this is certainly useful in some cases, but you need to 
describe which cases those are.


Christian.



Signed-off-by: Peter Enderborg 
---
  drivers/dma-buf/dma-buf.c | 12 
  fs/proc/meminfo.c |  2 ++
  include/linux/dma-buf.h   |  1 +
  3 files changed, 15 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f264b70c383e..9f88171b394c 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,7 @@ struct dma_buf_list {
  };
  
  static struct dma_buf_list db_list;

+static atomic_long_t dma_buf_size;
  
  static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)

  {
@@ -79,6 +80,7 @@ static void dma_buf_release(struct dentry *dentry)
	if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
		dma_resv_fini(dmabuf->resv);
  
+	atomic_long_sub(dmabuf->size, &dma_buf_size);

	module_put(dmabuf->owner);
	kfree(dmabuf->name);
	kfree(dmabuf);
@@ -586,6 +588,7 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
	mutex_lock(&db_list.lock);
	list_add(&dmabuf->list_node, &db_list.head);
	mutex_unlock(&db_list.lock);
+   atomic_long_add(dmabuf->size, &dma_buf_size);
  
  	return dmabuf;
  
@@ -1346,6 +1349,15 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map)

  }
  EXPORT_SYMBOL_GPL(dma_buf_vunmap);
  
+/**

+ * dma_buf_get_size - Return the number of pages used by dma-buf
+ */
+long dma_buf_get_size(void)
+{
+   return atomic_long_read(&dma_buf_size) >> PAGE_SHIFT;
+}
+EXPORT_SYMBOL_GPL(dma_buf_get_size);
+
  #ifdef CONFIG_DEBUG_FS
  static int dma_buf_debug_show(struct seq_file *s, void *unused)
  {
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..3c1a82b51a6f 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -16,6 +16,7 @@
  #ifdef CONFIG_CMA
  #include 
  #endif
+#include 
  #include 
  #include "internal.h"
  
@@ -145,6 +146,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)

show_val_kb(m, "CmaFree:",
global_zone_page_state(NR_FREE_CMA_PAGES));
  #endif
+   show_val_kb(m, "DmaBufTotal:", dma_buf_get_size());
  
  	hugetlb_report_meminfo(m);
  
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h

index efdc56b9d95f..f6481315a377 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -507,4 +507,5 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 unsigned long);
  int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
  void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map);
+long dma_buf_get_size(void);
  #endif /* __DMA_BUF_H__ */




Re: [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2

2021-04-16 Thread Christian König

On 15.04.21 at 22:33, Andrew Morton wrote:

On Thu, 15 Apr 2021 13:56:24 +0200 "Christian König" 
 wrote:


@@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
for (j = 0; j < MAX_ORDER; ++j)
ttm_pool_type_fini(&pool->caching[i].orders[j]);
}
+
+   /* We removed the pool types from the LRU, but we need to also make sure
+* that no shrinker is concurrently freeing pages from the pool.
+*/
+   sync_shrinkers();

It isn't immediately clear to me how this works.  ttm_pool_fini() has
already freed all the pages hasn't it?  So why would it care if some
shrinkers are still playing with the pages?


Yes, ttm_pool_fini() has freed up all pages that were in the pool 
when the function was called.


But the problem is that a concurrently running shrinker may have 
taken a page from the pool and be in the process of freeing it up.


When I return here, the pool structure and especially the device 
structure are freed while the concurrently running shrinker is still using them.


I could go for a design where we have one shrinker per device instead, 
but that would put a bit too much pressure on the pool in my opinion.



Or is it the case that ttm_pool_fini() is assuming that there will be
some further action against these pages, which requires that shrinkers
no longer be accessing the pages and which further assumes that future
shrinker invocations will not be able to look up these pages?

IOW, a bit more explanation about the dynamics here would help!


Sorry, I'm not a native speaker of English and sometimes still have a 
hard time explaining things.


Regards,
Christian.


[PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2

2021-04-15 Thread Christian König
Switch back to using a spinlock again by moving the IOMMU unmap outside
of the locked region.

v2: Add a comment explaining why we need sync_shrinkers().

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_pool.c | 44 +-
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cb38b1a17b09..955836d569cc 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -70,7 +70,7 @@ static struct ttm_pool_type global_uncached[MAX_ORDER];
 static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
 static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
 
-static struct mutex shrinker_lock;
+static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
@@ -263,9 +263,9 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, 
struct ttm_pool *pool,
	spin_lock_init(&pt->lock);
	INIT_LIST_HEAD(&pt->pages);
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
	list_add_tail(&pt->shrinker_list, &shrinker_list);
-   mutex_unlock(&shrinker_lock);
+   spin_unlock(&shrinker_lock);
 }
 
 /* Remove a pool_type from the global shrinker list and free all pages */
@@ -273,9 +273,9 @@ static void ttm_pool_type_fini(struct ttm_pool_type *pt)
 {
struct page *p;
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
	list_del(&pt->shrinker_list);
-   mutex_unlock(&shrinker_lock);
+   spin_unlock(&shrinker_lock);
 
while ((p = ttm_pool_type_take(pt)))
ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
@@ -313,24 +313,19 @@ static struct ttm_pool_type *ttm_pool_select_type(struct 
ttm_pool *pool,
 static unsigned int ttm_pool_shrink(void)
 {
struct ttm_pool_type *pt;
-   unsigned int num_freed;
struct page *p;
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
+   list_move_tail(&pt->shrinker_list, &shrinker_list);
+   spin_unlock(&shrinker_lock);
 
p = ttm_pool_type_take(pt);
-   if (p) {
-   ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
-   num_freed = 1 << pt->order;
-   } else {
-   num_freed = 0;
-   }
-
-   list_move_tail(&pt->shrinker_list, &shrinker_list);
-   mutex_unlock(&shrinker_lock);
+   if (!p)
+   return 0;
 
-   return num_freed;
+   ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
+   return 1 << pt->order;
 }
 
 /* Return the allocation order based for a page */
@@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
for (j = 0; j < MAX_ORDER; ++j)
ttm_pool_type_fini(&pool->caching[i].orders[j]);
}
+
+   /* We removed the pool types from the LRU, but we need to also make sure
+* that no shrinker is concurrently freeing pages from the pool.
+*/
+   sync_shrinkers();
 }
 
 /* As long as pages are available make sure to release at least one */
@@ -604,7 +604,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file 
*m, void *data)
 {
ttm_pool_debugfs_header(m);
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
seq_puts(m, "wc\t:");
ttm_pool_debugfs_orders(global_write_combined, m);
seq_puts(m, "uc\t:");
@@ -613,7 +613,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file 
*m, void *data)
ttm_pool_debugfs_orders(global_dma32_write_combined, m);
seq_puts(m, "uc 32\t:");
ttm_pool_debugfs_orders(global_dma32_uncached, m);
-   mutex_unlock(&shrinker_lock);
+   spin_unlock(&shrinker_lock);
 
ttm_pool_debugfs_footer(m);
 
@@ -640,7 +640,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file 
*m)
 
ttm_pool_debugfs_header(m);
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
seq_puts(m, "DMA ");
switch (i) {
@@ -656,7 +656,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file 
*m)
}
ttm_pool_debugfs_orders(pool->caching[i].orders, m);
}
-   mutex_unlock(&shrinker_lock);
+   spin_unlock(&shrinker_lock);
 
ttm_pool_debugfs_footer(m);
return 0;
@@ -693,7 +693,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
if (!page_pool_size)
page_pool_size = num_pages;
 
-   mutex_init(&shrinker_lock);
+   spin_lock_init(&shrinker_lock);
INIT_LIST_HEAD(_list);
 
for (i = 0; i < MAX_ORDER; ++i) {
-- 
2.25.1



[PATCH 1/2] mm/vmscan: add sync_shrinkers function

2021-04-15 Thread Christian König
To be able to switch to a spinlock and reduce lock contention in the TTM
shrinker we don't want to hold a mutex while unmapping and freeing pages
from the pool.

But then we somehow need to prevent a race between (for example) the shrinker
trying to free pages and hotplug trying to remove the device which those pages
belong to.

Taking and releasing the shrinker semaphore on the write side after
unmapping and freeing all pages should make sure that no shrinker is running in
parallel any more.

Signed-off-by: Christian König 
---
 include/linux/shrinker.h |  1 +
 mm/vmscan.c  | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 0f80123650e2..6b75dc372fce 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -92,4 +92,5 @@ extern void register_shrinker_prepared(struct shrinker 
*shrinker);
 extern int register_shrinker(struct shrinker *shrinker);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
+extern void sync_shrinkers(void);
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 562e87cbd7a1..46cd9c215d73 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -408,6 +408,16 @@ void unregister_shrinker(struct shrinker *shrinker)
 }
 EXPORT_SYMBOL(unregister_shrinker);
 
+/**
+ * sync_shrinkers - Wait for all running shrinkers to complete.
+ */
+void sync_shrinkers(void)
+{
+   down_write(&shrinker_rwsem);
+   up_write(&shrinker_rwsem);
+}
+EXPORT_SYMBOL(sync_shrinkers);
+
 #define SHRINK_BATCH 128
 
 static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
-- 
2.25.1



Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-14 Thread Christian König




On 13.04.21 at 23:19, Mikhail Gavrilov wrote:

On Tue, 13 Apr 2021 at 12:29, Christian König  wrote:

Hi Mikhail,

the crash is a known issue and should be fixed by:

commit f63da9ae7584280582cbc834b20cc18bfb203b14
Author: Philip Yang 
Date:   Thu Apr 1 00:22:23 2021 -0400

  drm/amdgpu: reserve fence slot to update page table


Unfortunately, this commit couldn't fix the initial problem.
1. Result video is jerky if it grabbed and encoded with ffmpeg
(h264_vaapi codec).
2. OBS still crashed if I try to record or stream video.
3. In the kernel log still appears the message "amdgpu: [mmhub] page
fault (src_id:0 ring:0 vmid:4 pasid:32770, for process obs" if I tried
to record or stream video by OBS.


That is expected behavior, the application is just buggy and causing a 
page fault on the GPU.


The kernel should just not crash with a backtrace.

Regards,
Christian.


Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K

2021-04-13 Thread Christian König

Hi Mikhail,

the crash is a known issue and should be fixed by:

commit f63da9ae7584280582cbc834b20cc18bfb203b14
Author: Philip Yang 
Date:   Thu Apr 1 00:22:23 2021 -0400

    drm/amdgpu: reserve fence slot to update page table

But it is perfectly possible for a userspace application to cause a 
page fault. See here for example https://en.wikipedia.org/wiki/Halting_problem


What we do with misbehaving applications is to log the incident and 
prevent the queue which does nasty things from doing even more submissions.


Regards,
Christian.

On 13.04.21 at 00:05, Mikhail Gavrilov wrote:

Video demonstration: https://youtu.be/3nkvUeB0GSw

How looks kernel traces.

1.
[ 7315.156460] amdgpu :0b:00.0: amdgpu: [mmhub] page fault
(src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 thread
obs:cs0 pid 23977)
[ 7315.156490] amdgpu :0b:00.0: amdgpu:   in page starting at
address 0x80011fdf5000 from client 18
[ 7315.156495] amdgpu :0b:00.0: amdgpu:
MMVM_L2_PROTECTION_FAULT_STATUS:0x00641A51
[ 7315.156500] amdgpu :0b:00.0: amdgpu: Faulty UTCL2 client ID: VCN1 (0xd)
[ 7315.156503] amdgpu :0b:00.0: amdgpu: MORE_FAULTS: 0x1
[ 7315.156505] amdgpu :0b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 7315.156509] amdgpu :0b:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 7315.156510] amdgpu :0b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 7315.156513] amdgpu :0b:00.0: amdgpu: RW: 0x1
[ 7315.156545] amdgpu :0b:00.0: amdgpu: [mmhub] page fault
(src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 thread
obs:cs0 pid 23977)
[ 7315.156549] amdgpu :0b:00.0: amdgpu:   in page starting at
address 0x80011fdf6000 from client 18
[ 7315.156551] amdgpu :0b:00.0: amdgpu:
MMVM_L2_PROTECTION_FAULT_STATUS:0x00641A51
[ 7315.156554] amdgpu :0b:00.0: amdgpu: Faulty UTCL2 client ID: VCN1 (0xd)
[ 7315.156556] amdgpu :0b:00.0: amdgpu: MORE_FAULTS: 0x1
[ 7315.156559] amdgpu :0b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 7315.156561] amdgpu :0b:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 7315.156564] amdgpu :0b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 7315.156566] amdgpu :0b:00.0: amdgpu: RW: 0x1

This is a harmless panic, but nevertheless VAAPI does not work and the
application that tried to use the encoder crashed.

2.
If we tries again and again encode 4K stream through VAAPI we can
encounter the next trace:
[12341.860944] [ cut here ]
[12341.860961] kernel BUG at drivers/dma-buf/dma-resv.c:287!
[12341.860968] invalid opcode:  [#1] SMP NOPTI
[12341.860972] CPU: 28 PID: 18261 Comm: kworker/28:0 Tainted: G
W- ---  5.12.0-0.rc5.180.fc35.x86_64+debug #1
[12341.860977] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021
[12341.860981] Workqueue: events amdgpu_irq_handle_ih_soft [amdgpu]
[12341.861102] RIP: 0010:dma_resv_add_shared_fence+0x2ab/0x2c0
[12341.861108] Code: fd ff ff be 01 00 00 00 e8 e2 74 dc ff e9 ac fd
ff ff 48 83 c4 18 be 03 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f e9 c5
74 dc ff <0f> 0b 31 ed e9 73 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
90 0f
[12341.861112] RSP: 0018:b2f084c87bb0 EFLAGS: 00010246
[12341.861115] RAX: 0002 RBX: 9f9551184998 RCX: 
[12341.861119] RDX: 0002 RSI:  RDI: 9f9551184a50
[12341.861122] RBP: 0002 R08:  R09: 
[12341.861124] R10:  R11:  R12: 9f91b9a18140
[12341.861127] R13: 9f91c9020740 R14: 9f91c9020768 R15: 
[12341.861130] FS:  () GS:9f984a20()
knlGS:
[12341.861133] CS:  0010 DS:  ES:  CR0: 80050033
[12341.861136] CR2: 144e080d8000 CR3: 00010e98c000 CR4: 00350ee0
[12341.861139] Call Trace:
[12341.861143]  amdgpu_vm_sdma_commit+0x182/0x220 [amdgpu]
[12341.861251]  amdgpu_vm_bo_update_mapping.constprop.0+0x278/0x3c0 [amdgpu]
[12341.861356]  amdgpu_vm_handle_fault+0x145/0x290 [amdgpu]
[12341.861461]  gmc_v10_0_process_interrupt+0xb3/0x250 [amdgpu]
[12341.861571]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[12341.861577]  ? lock_acquire+0x179/0x3a0
[12341.861583]  ? lock_acquire+0x179/0x3a0
[12341.861587]  ? amdgpu_irq_dispatch+0xc6/0x240 [amdgpu]
[12341.861692]  amdgpu_irq_dispatch+0xc6/0x240 [amdgpu]
[12341.861796]  amdgpu_ih_process+0x90/0x110 [amdgpu]
[12341.861900]  process_one_work+0x2b0/0x5e0
[12341.861906]  worker_thread+0x55/0x3c0
[12341.861910]  ? process_one_work+0x5e0/0x5e0
[12341.861915]  kthread+0x13a/0x150
[12341.861918]  ? __kthread_bind_mask+0x60/0x60
[12341.861922]  

Re: Unexpected multihop in swaput - likely driver bug.

2021-04-12 Thread Christian König

Hi Mikhail,

thanks a lot for pointing this out.

Turned out that this is a known issue, but I've forgot to push the fix 
to drm-misc-fixes and just queued it up for the next release.


Please re-test drm-misc-fixes and let's hope there is another -rc before 
the final 5.12 kernel.


Thanks,
Christian.

On 07.04.21 at 20:06, Mikhail Gavrilov wrote:

On Wed, 7 Apr 2021 at 15:46, Christian König
 wrote:

What hardware are you using

$ inxi -bM
System:Host: fedora Kernel: 5.12.0-0.rc6.184.fc35.x86_64+debug
x86_64 bits: 64 Desktop: GNOME 40.0
Distro: Fedora release 35 (Rawhide)
Machine:   Type: Desktop Mobo: ASUSTeK model: ROG STRIX X570-I GAMING
v: Rev X.0x serial: 
UEFI: American Megatrends v: 3603 date: 03/20/2021
Battery:   ID-1: hidpp_battery_0 charge: N/A condition: N/A
CPU:   Info: 16-Core (2-Die) AMD Ryzen 9 3950X [MT MCP MCM] speed:
2365 MHz min/max: 2200/3500 MHz
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Navi 21 [Radeon
RX 6800/6800 XT / 6900 XT] driver: amdgpu v: kernel
Device-2: AVerMedia Live Streamer CAM 513 type: USB driver:
hid-generic,usbhid,uvcvideo
Device-3: AVerMedia Live Gamer Ultra-Video type: USB
driver: hid-generic,snd-usb-audio,usbhid,uvcvideo
Display: wayland server: X.Org 1.21.1 driver: loaded:
amdgpu,ati unloaded: fbdev,modesetting,radeon,vesa
resolution: 3840x2160~60Hz
OpenGL: renderer: AMD SIENNA_CICHLID (DRM 3.40.0
5.12.0-0.rc6.184.fc35.x86_64+debug LLVM 12.0.0)
v: 4.6 Mesa 21.1.0-devel
Network:   Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi
Device-2: Intel I211 Gigabit Network driver: igb
Drives:Local Storage: total: 11.35 TiB used: 10.82 TiB (95.3%)
Info:  Processes: 805 Uptime: 12h 56m Memory: 31.18 GiB used:
21.88 GiB (70.2%) Shell: Bash inxi: 3.3.02



and how do you exactly trigger this?

I am running heavy games like "Zombie Army 4: Dead War" and switching
to Gnome Activities and other applications while the game is running.






Re: [PATCH] drm/ttm: cleanup coding style a bit

2021-04-09 Thread Christian König

On 01.04.21 at 03:59, Bernard wrote:


From: "Christian König" 
Date: 2021-03-31 21:15:22
To:  Bernard Zhao ,Huang Rui ,David Airlie 
,Daniel Vetter 
,dri-de...@lists.freedesktop.org,linux-kernel@vger.kernel.org
Cc:  opensource.ker...@vivo.com
Subject: Re: [PATCH] drm/ttm: cleanup coding style a bit

On 31.03.21 at 15:12, Bernard Zhao wrote:

Fix sparse warning:
drivers/gpu/drm/ttm/ttm_bo.c:52:1: warning: symbol 'ttm_global_mutex' was not 
declared. Should it be static?
drivers/gpu/drm/ttm/ttm_bo.c:53:10: warning: symbol 'ttm_bo_glob_use_count' was 
not declared. Should it be static?

Signed-off-by: Bernard Zhao 

You are based on an outdated branch, please rebase on top of drm-misc-next.


Hi

Sure, thanks for your review!
I will fix this and resubmit this patch.


Found some time today to do it myself. Please review the patch I've just 
send to you.


Thanks,
Christian.



BR//Bernard


Regards,
Christian.


---
   drivers/gpu/drm/ttm/ttm_bo.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 101a68dc615b..eab21643edfb 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -49,8 +49,8 @@ static void ttm_bo_global_kobj_release(struct kobject *kobj);
   /*
* ttm_global_mutex - protecting the global BO state
*/
-DEFINE_MUTEX(ttm_global_mutex);
-unsigned ttm_bo_glob_use_count;
+static DEFINE_MUTEX(ttm_global_mutex);
+static unsigned int ttm_bo_glob_use_count;
   struct ttm_bo_global ttm_bo_glob;
   EXPORT_SYMBOL(ttm_bo_glob);
   






Re: [PATCH 1/2] mm/vmscan: add sync_shrinkers function

2021-04-09 Thread Christian König

On 09.04.21 at 13:00, Vlastimil Babka wrote:

On 4/9/21 9:17 AM, Christian König wrote:

To be able to switch to a spinlock and reduce lock contention in the TTM
shrinker we don't want to hold a mutex while unmapping and freeing pages
from the pool.

Does using spinlock instead of mutex really reduce lock contention?


Well using the spinlock instead of the mutex is only the cherry on the cake.

The real improvement for the contention is the fact that we just grab 
the next pool and drop the lock again, instead of doing the whole IOMMU 
unmap and CPU TLB flush dance while holding the lock.



But then we somehow need to prevent a race between (for example) the shrinker
trying to free pages and hotplug trying to remove the device which those pages
belong to.

Taking and releasing the shrinker semaphore on the write side after
unmapping and freeing all pages should make sure that no shrinker is running in
parallel any more.

So you explain this in this commit log for adding the function, but then the
next patch just adds a sync_shrinkers() call without any comment. I would expect
there a comment explaining why it's done there - what it protects against, as
it's not an obvious pattern IMHO.


Good point, going to add a comment.

Thanks,
Christian.




Signed-off-by: Christian König 
---
  include/linux/shrinker.h |  1 +
  mm/vmscan.c  | 10 ++
  2 files changed, 11 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 0f80123650e2..6b75dc372fce 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -92,4 +92,5 @@ extern void register_shrinker_prepared(struct shrinker 
*shrinker);
  extern int register_shrinker(struct shrinker *shrinker);
  extern void unregister_shrinker(struct shrinker *shrinker);
  extern void free_prealloced_shrinker(struct shrinker *shrinker);
+extern void sync_shrinkers(void);
  #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 562e87cbd7a1..46cd9c215d73 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -408,6 +408,16 @@ void unregister_shrinker(struct shrinker *shrinker)
  }
  EXPORT_SYMBOL(unregister_shrinker);
  
+/**

+ * sync_shrinkers - Wait for all running shrinkers to complete.
+ */
+void sync_shrinkers(void)
+{
+   down_write(&shrinker_rwsem);
+   up_write(&shrinker_rwsem);
+}
+EXPORT_SYMBOL(sync_shrinkers);
+
  #define SHRINK_BATCH 128
  
  static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,






[PATCH 1/2] mm/vmscan: add sync_shrinkers function

2021-04-09 Thread Christian König
To be able to switch to a spinlock and reduce lock contention in the TTM
shrinker we don't want to hold a mutex while unmapping and freeing pages
from the pool.

But then we somehow need to prevent a race between (for example) the shrinker
trying to free pages and hotplug trying to remove the device which those pages
belong to.

Taking and releasing the shrinker semaphore on the write side after
unmapping and freeing all pages should make sure that no shrinker is running in
parallel any more.

Signed-off-by: Christian König 
---
 include/linux/shrinker.h |  1 +
 mm/vmscan.c  | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 0f80123650e2..6b75dc372fce 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -92,4 +92,5 @@ extern void register_shrinker_prepared(struct shrinker 
*shrinker);
 extern int register_shrinker(struct shrinker *shrinker);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
+extern void sync_shrinkers(void);
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 562e87cbd7a1..46cd9c215d73 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -408,6 +408,16 @@ void unregister_shrinker(struct shrinker *shrinker)
 }
 EXPORT_SYMBOL(unregister_shrinker);
 
+/**
+ * sync_shrinkers - Wait for all running shrinkers to complete.
+ */
+void sync_shrinkers(void)
+{
+   down_write(&shrinker_rwsem);
+   up_write(&shrinker_rwsem);
+}
+EXPORT_SYMBOL(sync_shrinkers);
+
 #define SHRINK_BATCH 128
 
 static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
-- 
2.25.1



[PATCH 2/2] drm/ttm: optimize the pool shrinker a bit

2021-04-09 Thread Christian König
Switch back to using a spinlock again by moving the IOMMU unmap outside
of the locked region.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_pool.c | 40 +++---
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cb38b1a17b09..a8b4abe687ce 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -70,7 +70,7 @@ static struct ttm_pool_type global_uncached[MAX_ORDER];
 static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
 static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
 
-static struct mutex shrinker_lock;
+static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
@@ -263,9 +263,9 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, 
struct ttm_pool *pool,
	spin_lock_init(&pt->lock);
	INIT_LIST_HEAD(&pt->pages);
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
	list_add_tail(&pt->shrinker_list, &shrinker_list);
-   mutex_unlock(&shrinker_lock);
+   spin_unlock(&shrinker_lock);
 }
 
 /* Remove a pool_type from the global shrinker list and free all pages */
@@ -273,9 +273,9 @@ static void ttm_pool_type_fini(struct ttm_pool_type *pt)
 {
struct page *p;
 
-   mutex_lock(&shrinker_lock);
+   spin_lock(&shrinker_lock);
	list_del(&pt->shrinker_list);
-   mutex_unlock(&shrinker_lock);
+   spin_unlock(&shrinker_lock);
 
while ((p = ttm_pool_type_take(pt)))
ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
@@ -313,24 +313,19 @@ static struct ttm_pool_type *ttm_pool_select_type(struct 
ttm_pool *pool,
 static unsigned int ttm_pool_shrink(void)
 {
struct ttm_pool_type *pt;
-   unsigned int num_freed;
struct page *p;
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
+	list_move_tail(&pt->shrinker_list, &shrinker_list);
+	spin_unlock(&shrinker_lock);
 
p = ttm_pool_type_take(pt);
-   if (p) {
-   ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
-   num_freed = 1 << pt->order;
-   } else {
-   num_freed = 0;
-   }
-
-	list_move_tail(&pt->shrinker_list, &shrinker_list);
-	mutex_unlock(&shrinker_lock);
+   if (!p)
+   return 0;
 
-   return num_freed;
+   ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
+   return 1 << pt->order;
 }
 
 /* Return the allocation order based for a page */
@@ -530,6 +525,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
for (j = 0; j < MAX_ORDER; ++j)
	ttm_pool_type_fini(&pool->caching[i].orders[j]);
}
+   sync_shrinkers();
 }
 
 /* As long as pages are available make sure to release at least one */
@@ -604,7 +600,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file 
*m, void *data)
 {
ttm_pool_debugfs_header(m);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
seq_puts(m, "wc\t:");
ttm_pool_debugfs_orders(global_write_combined, m);
seq_puts(m, "uc\t:");
@@ -613,7 +609,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file 
*m, void *data)
ttm_pool_debugfs_orders(global_dma32_write_combined, m);
seq_puts(m, "uc 32\t:");
ttm_pool_debugfs_orders(global_dma32_uncached, m);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
ttm_pool_debugfs_footer(m);
 
@@ -640,7 +636,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file 
*m)
 
ttm_pool_debugfs_header(m);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
seq_puts(m, "DMA ");
switch (i) {
@@ -656,7 +652,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file 
*m)
}
ttm_pool_debugfs_orders(pool->caching[i].orders, m);
}
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
ttm_pool_debugfs_footer(m);
return 0;
@@ -693,7 +689,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
if (!page_pool_size)
page_pool_size = num_pages;
 
-	mutex_init(&shrinker_lock);
+	spin_lock_init(&shrinker_lock);
	INIT_LIST_HEAD(&shrinker_list);
 
for (i = 0; i < MAX_ORDER; ++i) {
-- 
2.25.1



Re: [PATCH v3] drm/syncobj: use newly allocated stub fences

2021-04-08 Thread Christian König

Am 08.04.21 um 11:54 schrieb David Stevens:

From: David Stevens 

Allocate a new private stub fence in drm_syncobj_assign_null_handle,
instead of using a static stub fence.

When userspace creates a fence with DRM_SYNCOBJ_CREATE_SIGNALED or when
userspace signals a fence via DRM_IOCTL_SYNCOBJ_SIGNAL, the timestamp
obtained when the fence is exported and queried with SYNC_IOC_FILE_INFO
should match when the fence's status was changed from the perspective of
userspace, which is during the respective ioctl.

When a static stub fence started being used by these ioctls, this
behavior changed. Instead, the timestamp returned by SYNC_IOC_FILE_INFO
became the first time anything used the static stub fence, which has no
meaning to userspace.

Signed-off-by: David Stevens 


Reviewed-by: Christian König 

Should I push this to drm-misc-next or how do you want to upstream it?

Thanks,
Christian.


---
v2 -> v3:
  - reuse the static stub spinlock
v1 -> v2:
  - checkpatch style fixes

  drivers/dma-buf/dma-fence.c   | 27 ++-
  drivers/gpu/drm/drm_syncobj.c | 25 +++--
  include/linux/dma-fence.h |  1 +
  3 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index d64fc03929be..ce0f5eff575d 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -123,7 +123,9 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
  /**
   * dma_fence_get_stub - return a signaled fence
   *
- * Return a stub fence which is already signaled.
+ * Return a stub fence which is already signaled. The fence's
+ * timestamp corresponds to the first time after boot this
+ * function is called.
   */
  struct dma_fence *dma_fence_get_stub(void)
  {
@@ -141,6 +143,29 @@ struct dma_fence *dma_fence_get_stub(void)
  }
  EXPORT_SYMBOL(dma_fence_get_stub);
  
+/**

+ * dma_fence_allocate_private_stub - return a private, signaled fence
+ *
+ * Return a newly allocated and signaled stub fence.
+ */
+struct dma_fence *dma_fence_allocate_private_stub(void)
+{
+   struct dma_fence *fence;
+
+   fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+   if (fence == NULL)
+   return ERR_PTR(-ENOMEM);
+
+   dma_fence_init(fence,
+  &dma_fence_stub_ops,
+  &dma_fence_stub_lock,
+  0, 0);
+   dma_fence_signal(fence);
+
+   return fence;
+}
+EXPORT_SYMBOL(dma_fence_allocate_private_stub);
+
  /**
   * dma_fence_context_alloc - allocate an array of fence contexts
   * @num: amount of contexts to allocate
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 349146049849..a54aa850d143 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -350,12 +350,16 @@ EXPORT_SYMBOL(drm_syncobj_replace_fence);
   *
   * Assign an already signaled stub fence to the sync object.
   */
-static void drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
+static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
  {
-   struct dma_fence *fence = dma_fence_get_stub();
+   struct dma_fence *fence = dma_fence_allocate_private_stub();
+
+   if (IS_ERR(fence))
+   return PTR_ERR(fence);
  
  	drm_syncobj_replace_fence(syncobj, fence);

dma_fence_put(fence);
+   return 0;
  }
  
  /* 5s default for wait submission */

@@ -469,6 +473,7 @@ EXPORT_SYMBOL(drm_syncobj_free);
  int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
   struct dma_fence *fence)
  {
+   int ret;
struct drm_syncobj *syncobj;
  
  	syncobj = kzalloc(sizeof(struct drm_syncobj), GFP_KERNEL);

@@ -479,8 +484,13 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, 
uint32_t flags,
	INIT_LIST_HEAD(&syncobj->cb_list);
	spin_lock_init(&syncobj->lock);
  
-	if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)

-   drm_syncobj_assign_null_handle(syncobj);
+   if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {
+   ret = drm_syncobj_assign_null_handle(syncobj);
+   if (ret < 0) {
+   drm_syncobj_put(syncobj);
+   return ret;
+   }
+   }
  
  	if (fence)

drm_syncobj_replace_fence(syncobj, fence);
@@ -1322,8 +1332,11 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
if (ret < 0)
return ret;
  
-	for (i = 0; i < args->count_handles; i++)

-   drm_syncobj_assign_null_handle(syncobjs[i]);
+   for (i = 0; i < args->count_handles; i++) {
+   ret = drm_syncobj_assign_null_handle(syncobjs[i]);
+   if (ret < 0)
+   break;
+   }
  
  	drm_syncobj_array_free(syncobjs, args->count_handles);
  
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h

index 9f12efaaa93a..6ffb4b2c6371 100644
--- a/include/linux

Re: [PATCH] drm/syncobj: use newly allocated stub fences

2021-04-08 Thread Christian König

Am 08.04.21 um 11:34 schrieb David Stevens:

From: David Stevens 

Allocate a new private stub fence in drm_syncobj_assign_null_handle,
instead of using a static stub fence.

When userspace creates a fence with DRM_SYNCOBJ_CREATE_SIGNALED or when
userspace signals a fence via DRM_IOCTL_SYNCOBJ_SIGNAL, the timestamp
obtained when the fence is exported and queried with SYNC_IOC_FILE_INFO
should match when the fence's status was changed from the perspective of
userspace, which is during the respective ioctl.

When a static stub fence started being used by these ioctls, this
behavior changed. Instead, the timestamp returned by SYNC_IOC_FILE_INFO
became the first time anything used the static stub fence, which has no
meaning to userspace.

Signed-off-by: David Stevens 
---
  drivers/dma-buf/dma-fence.c   | 33 -
  drivers/gpu/drm/drm_syncobj.c | 28 
  include/linux/dma-fence.h |  1 +
  3 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index d64fc03929be..6081eb962490 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -26,6 +26,11 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
  static DEFINE_SPINLOCK(dma_fence_stub_lock);
  static struct dma_fence dma_fence_stub;
  
+struct drm_fence_private_stub {

+   struct dma_fence base;
+   spinlock_t lock;
+};
+


You can drop this. The spinlock is only used when the fence is signaled 
to avoid races between signaling and adding a callback.


And for this the global spinlock should be perfectly sufficient. Apart 
from that looks good to me.


Regards,
Christian.


  /*
   * fence context counter: each execution context should have its own
   * fence context, this allows checking if fences belong to the same
@@ -123,7 +128,9 @@ static const struct dma_fence_ops dma_fence_stub_ops = {
  /**
   * dma_fence_get_stub - return a signaled fence
   *
- * Return a stub fence which is already signaled.
+ * Return a stub fence which is already signaled. The fence's
+ * timestamp corresponds to the first time after boot this
+ * function is called.
   */
  struct dma_fence *dma_fence_get_stub(void)
  {
@@ -141,6 +148,30 @@ struct dma_fence *dma_fence_get_stub(void)
  }
  EXPORT_SYMBOL(dma_fence_get_stub);
  
+/**

+ * dma_fence_allocate_private_stub - return a private, signaled fence
+ *
+ * Return a newly allocated and signaled stub fence.
+ */
+struct dma_fence *dma_fence_allocate_private_stub(void)
+{
+   struct drm_fence_private_stub *fence;
+
+   fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+   if (fence == NULL)
+   return ERR_PTR(-ENOMEM);
+
+   spin_lock_init(&fence->lock);
+   dma_fence_init(&fence->base,
+  &dma_fence_stub_ops,
+  &fence->lock,
+  0, 0);
+   dma_fence_signal(&fence->base);
+
+   return &fence->base;
+}
+EXPORT_SYMBOL(dma_fence_allocate_private_stub);
+
  /**
   * dma_fence_context_alloc - allocate an array of fence contexts
   * @num: amount of contexts to allocate
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 349146049849..c6125e57ae37 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -350,12 +350,15 @@ EXPORT_SYMBOL(drm_syncobj_replace_fence);
   *
   * Assign an already signaled stub fence to the sync object.
   */
-static void drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
+static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
  {
-   struct dma_fence *fence = dma_fence_get_stub();
+   struct dma_fence *fence = dma_fence_allocate_private_stub();
+   if (IS_ERR(fence))
+  return PTR_ERR(fence);
  
-	drm_syncobj_replace_fence(syncobj, fence);

-   dma_fence_put(fence);
+   drm_syncobj_replace_fence(syncobj, fence);
+   dma_fence_put(fence);
+   return 0;
  }
  
  /* 5s default for wait submission */

@@ -469,6 +472,7 @@ EXPORT_SYMBOL(drm_syncobj_free);
  int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
   struct dma_fence *fence)
  {
+   int ret;
struct drm_syncobj *syncobj;
  
  	syncobj = kzalloc(sizeof(struct drm_syncobj), GFP_KERNEL);

@@ -479,8 +483,13 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, 
uint32_t flags,
	INIT_LIST_HEAD(&syncobj->cb_list);
	spin_lock_init(&syncobj->lock);
  
-	if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)

-   drm_syncobj_assign_null_handle(syncobj);
+   if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {
+   ret = drm_syncobj_assign_null_handle(syncobj);
+   if (ret < 0) {
+   drm_syncobj_put(syncobj);
+   return ret;
+   }
+   }
  
  	if (fence)

drm_syncobj_replace_fence(syncobj, fence);
@@ -1322,8 +1331,11 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
if (ret < 0)
 

Re: [PATCH] Revert "drm/syncobj: use dma_fence_get_stub"

2021-04-08 Thread Christian König

Am 08.04.21 um 11:30 schrieb David Stevens:

On Thu, Apr 8, 2021 at 4:03 PM Christian König  wrote:

Am 08.04.21 um 06:59 schrieb David Stevens:

From: David Stevens 

This reverts commit 86bbd89d5da66fe760049ad3f04adc407ec0c4d6.

Using the singleton stub fence in drm_syncobj_assign_null_handle means
that all syncobjs created in an already signaled state or any syncobjs
signaled by userspace will reference the singleton fence when exported
to a sync_file. If those sync_files are queried with SYNC_IOC_FILE_INFO,
then the timestamp_ns value returned will correspond to whenever the
singleton stub fence was first initialized. This can break the ability
of userspace to use timestamps of these fences, as the singleton stub
fence's timestamp bears no relationship to any meaningful event.

And why exactly is having the timestamp of the call to
drm_syncobj_assign_null_handle() better?

The timestamp returned by SYNC_IOC_FILE_INFO is the "timestamp of
status change in nanoseconds". If userspace signals the fence with
DRM_IOCTL_SYNCOBJ_SIGNAL, then a timestamp from
drm_syncobj_assign_null_handle corresponds to the status change. If
userspace sets DRM_SYNCOBJ_CREATE_SIGNALED when creating a fence, then
the status change happens immediately upon creation, which again
corresponds to when drm_syncobj_assign_null_handle gets called.


Ok, that makes sense.




Additionally, if you really need that, please don't revert the patch.
Instead provide a function which returns a newly initialized stub fence
in the dma_fence.c code.

Ack.


Just add something like dma_fence_get_new_stub() with kmalloc(), 
dma_fence_init() and dma_fence_signal().


Shouldn't be more than a six liner.

Thanks,
Christian.



-David


Regards,
Christian.


Signed-off-by: David Stevens 
---
   drivers/gpu/drm/drm_syncobj.c | 58 ++-
   1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 349146049849..7cc11f1a83f4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -211,6 +211,21 @@ struct syncobj_wait_entry {
   static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
 struct syncobj_wait_entry *wait);

+struct drm_syncobj_stub_fence {
+ struct dma_fence base;
+ spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+ return "syncobjstub";
+}
+
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+ .get_driver_name = drm_syncobj_stub_fence_get_name,
+ .get_timeline_name = drm_syncobj_stub_fence_get_name,
+};
+
   /**
* drm_syncobj_find - lookup and reference a sync object.
* @file_private: drm file private pointer
@@ -344,18 +359,24 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
   }
   EXPORT_SYMBOL(drm_syncobj_replace_fence);

-/**
- * drm_syncobj_assign_null_handle - assign a stub fence to the sync object
- * @syncobj: sync object to assign the fence on
- *
- * Assign an already signaled stub fence to the sync object.
- */
-static void drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
+static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
   {
- struct dma_fence *fence = dma_fence_get_stub();
+ struct drm_syncobj_stub_fence *fence;

- drm_syncobj_replace_fence(syncobj, fence);
- dma_fence_put(fence);
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+ if (fence == NULL)
+ return -ENOMEM;
+
+ spin_lock_init(&fence->lock);
+ dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops,
+&fence->lock, 0, 0);
+ dma_fence_signal(&fence->base);
+
+ drm_syncobj_replace_fence(syncobj, &fence->base);
+
+ dma_fence_put(&fence->base);
+
+ return 0;
   }

   /* 5s default for wait submission */
@@ -469,6 +490,7 @@ EXPORT_SYMBOL(drm_syncobj_free);
   int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
  struct dma_fence *fence)
   {
+ int ret;
   struct drm_syncobj *syncobj;

   syncobj = kzalloc(sizeof(struct drm_syncobj), GFP_KERNEL);
@@ -479,8 +501,13 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, 
uint32_t flags,
   INIT_LIST_HEAD(&syncobj->cb_list);
   spin_lock_init(&syncobj->lock);

- if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)
- drm_syncobj_assign_null_handle(syncobj);
+ if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {
+ ret = drm_syncobj_assign_null_handle(syncobj);
+ if (ret < 0) {
+ drm_syncobj_put(syncobj);
+ return ret;
+ }
+ }

   if (fence)
   drm_syncobj_replace_fence(syncobj, fence);
@@ -1322,8 +1349,11 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
   if (ret < 0)
   return ret;

- for (i = 0; i < args->count_handles; i++)
- drm_syn

Re: [PATCH] Revert "drm/syncobj: use dma_fence_get_stub"

2021-04-08 Thread Christian König

Am 08.04.21 um 06:59 schrieb David Stevens:

From: David Stevens 

This reverts commit 86bbd89d5da66fe760049ad3f04adc407ec0c4d6.

Using the singleton stub fence in drm_syncobj_assign_null_handle means
that all syncobjs created in an already signaled state or any syncobjs
signaled by userspace will reference the singleton fence when exported
to a sync_file. If those sync_files are queried with SYNC_IOC_FILE_INFO,
then the timestamp_ns value returned will correspond to whenever the
singleton stub fence was first initialized. This can break the ability
of userspace to use timestamps of these fences, as the singleton stub
fence's timestamp bears no relationship to any meaningful event.


And why exactly is having the timestamp of the call to 
drm_syncobj_assign_null_handle() better?


Additionally, if you really need that, please don't revert the patch. 
Instead provide a function which returns a newly initialized stub fence 
in the dma_fence.c code.


Regards,
Christian.



Signed-off-by: David Stevens 
---
  drivers/gpu/drm/drm_syncobj.c | 58 ++-
  1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 349146049849..7cc11f1a83f4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -211,6 +211,21 @@ struct syncobj_wait_entry {
  static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
  struct syncobj_wait_entry *wait);
  
+struct drm_syncobj_stub_fence {

+   struct dma_fence base;
+   spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+   return "syncobjstub";
+}
+
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+   .get_driver_name = drm_syncobj_stub_fence_get_name,
+   .get_timeline_name = drm_syncobj_stub_fence_get_name,
+};
+
  /**
   * drm_syncobj_find - lookup and reference a sync object.
   * @file_private: drm file private pointer
@@ -344,18 +359,24 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
  }
  EXPORT_SYMBOL(drm_syncobj_replace_fence);
  
-/**

- * drm_syncobj_assign_null_handle - assign a stub fence to the sync object
- * @syncobj: sync object to assign the fence on
- *
- * Assign an already signaled stub fence to the sync object
- */
-static void drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
+static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
  {
-   struct dma_fence *fence = dma_fence_get_stub();
+   struct drm_syncobj_stub_fence *fence;
  
-	drm_syncobj_replace_fence(syncobj, fence);

-   dma_fence_put(fence);
+   fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+   if (fence == NULL)
+   return -ENOMEM;
+
+   spin_lock_init(&fence->lock);
+   dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops,
+  &fence->lock, 0, 0);
+   dma_fence_signal(&fence->base);
+
+   drm_syncobj_replace_fence(syncobj, &fence->base);
+
+   dma_fence_put(&fence->base);
+
+   return 0;
  }
  
  /* 5s default for wait submission */

@@ -469,6 +490,7 @@ EXPORT_SYMBOL(drm_syncobj_free);
  int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
   struct dma_fence *fence)
  {
+   int ret;
struct drm_syncobj *syncobj;
  
  	syncobj = kzalloc(sizeof(struct drm_syncobj), GFP_KERNEL);

@@ -479,8 +501,13 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, 
uint32_t flags,
	INIT_LIST_HEAD(&syncobj->cb_list);
	spin_lock_init(&syncobj->lock);
  
-	if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)

-   drm_syncobj_assign_null_handle(syncobj);
+   if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {
+   ret = drm_syncobj_assign_null_handle(syncobj);
+   if (ret < 0) {
+   drm_syncobj_put(syncobj);
+   return ret;
+   }
+   }
  
  	if (fence)

drm_syncobj_replace_fence(syncobj, fence);
@@ -1322,8 +1349,11 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
if (ret < 0)
return ret;
  
-	for (i = 0; i < args->count_handles; i++)

-   drm_syncobj_assign_null_handle(syncobjs[i]);
+   for (i = 0; i < args->count_handles; i++) {
+   ret = drm_syncobj_assign_null_handle(syncobjs[i]);
+   if (ret < 0)
+   break;
+   }
  
  	drm_syncobj_array_free(syncobjs, args->count_handles);
  




Re: [PATCH] drm/amd/pm: convert sysfs snprintf to sysfs_emit

2021-04-07 Thread Christian König

Am 06.04.21 um 16:11 schrieb Carlis:

From: Xuezhi Zhang 

Fix the following coccicheck warning:
drivers/gpu/drm/amd/pm//amdgpu_pm.c:1940:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:1978:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2022:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:294:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:154:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:496:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:512:9-17:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:1740:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:1667:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2074:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2047:9-17:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2768:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2738:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2442:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:3246:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:3253:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2458:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:3047:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:3133:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:3209:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:3216:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2410:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2496:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2470:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2426:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2965:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:2972:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:3006:8-16:
WARNING: use scnprintf or sprintf
drivers/gpu/drm/amd/pm//amdgpu_pm.c:3013:8-16:
WARNING: use scnprintf or sprintf

Signed-off-by: Xuezhi Zhang 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/pm/amdgpu_pm.c | 58 +++---
  1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 5fa65f191a37..2777966ec1ca 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -151,7 +151,7 @@ static ssize_t amdgpu_get_power_dpm_state(struct device 
*dev,
pm_runtime_mark_last_busy(ddev->dev);
pm_runtime_put_autosuspend(ddev->dev);
  
-	return snprintf(buf, PAGE_SIZE, "%s\n",

+   return sysfs_emit(buf, "%s\n",
(pm == POWER_STATE_TYPE_BATTERY) ? "battery" :
(pm == POWER_STATE_TYPE_BALANCED) ? "balanced" : 
"performance");
  }
@@ -291,7 +291,7 @@ static ssize_t 
amdgpu_get_power_dpm_force_performance_level(struct device *dev,
pm_runtime_mark_last_busy(ddev->dev);
pm_runtime_put_autosuspend(ddev->dev);
  
-	return snprintf(buf, PAGE_SIZE, "%s\n",

+   return sysfs_emit(buf, "%s\n",
(level == AMD_DPM_FORCED_LEVEL_AUTO) ? "auto" :
(level == AMD_DPM_FORCED_LEVEL_LOW) ? "low" :
(level == AMD_DPM_FORCED_LEVEL_HIGH) ? "high" :
@@ -493,7 +493,7 @@ static ssize_t amdgpu_get_pp_cur_state(struct device *dev,
if (i == data.nums)
i = -EINVAL;
  
-	return snprintf(buf, PAGE_SIZE, "%d\n", i);

+   return sysfs_emit(buf, "%d\n", i);
  }
  
  static ssize_t amdgpu_get_pp_force_state(struct device *dev,

@@ -509,7 +509,7 @@ static ssize_t amdgpu_get_pp_force_state(struct device *dev,
if (adev->pp_force_state_enabled)
return amdgpu_get_pp_cur_state(dev, attr, buf);
else
-   return snprintf(buf, PAGE_SIZE, "\n");
+   return sysfs_emit(buf, "\n");
  }
  
  static ssize_t amdgpu_set_pp_force_state(struct device *dev,

@@ -1664,7 +1664,7 @@ static ssize_t amdgpu_get_pp_sclk_od(struct device *dev,
pm_runtime_mark_last_busy(ddev->dev);
pm_runtime_put_autosuspend(ddev->dev);
  
-	return snprintf(buf, PAGE_SIZE, "%d\n", value);

+   return sysfs_emit(buf, "%d\n", value);
  }
  
  static ssize_t amdgpu_set_pp_sclk_od(struct device *dev,

@@ -1737,7 +1737,7 @@ static ssize_t amdg

Re: Unexpected multihop in swaput - likely driver bug.

2021-04-07 Thread Christian König

What hardware are you using and how do you exactly trigger this?

Thanks,
Christian.

Am 07.04.21 um 11:30 schrieb Mikhail Gavrilov:

Hi!
During the 5.12 testing cycle I observed the repeatable bug when
launching heavy graphic applications.
The kernel log is flooded with the message "Unexpected multihop in
swaput - likely driver bug.".

Trace:
[ 8707.814899] [ cut here ]
[ 8707.814920] Unexpected multihop in swaput - likely driver bug.
[ 8707.814998] WARNING: CPU: 19 PID: 28231 at
drivers/gpu/drm/ttm/ttm_bo.c:1484 ttm_bo_swapout+0x40b/0x420 [ttm]
[ 8707.815011] Modules linked in: tun uinput snd_seq_dummy rfcomm
snd_hrtimer netconsole nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
nf_tables nfnetlink cmac bnep sunrpc vfat fat hid_logitech_hidpp
hid_logitech_dj intel_rapl_msr snd_hda_codec_realtek intel_rapl_common
mt76x2u snd_hda_codec_generic mt76x2_common mt76x02_usb iwlmvm
ledtrig_audio snd_hda_codec_hdmi mt76_usb mt76x02_lib snd_hda_intel
mt76 snd_intel_dspcfg snd_intel_sdw_acpi mac80211 joydev snd_usb_audio
snd_hda_codec uvcvideo edac_mce_amd videobuf2_vmalloc snd_hda_core
snd_usbmidi_lib videobuf2_memops snd_hwdep iwlwifi snd_rawmidi btusb
videobuf2_v4l2 kvm_amd snd_seq videobuf2_common btrtl btbcm videodev
btintel snd_seq_device kvm mc cfg80211 bluetooth snd_pcm libarc4
eeepc_wmi snd_timer asus_wmi irqbypass xpad sp5100_tco
[ 8707.815065]  sparse_keymap ecdh_generic ff_memless video ecc
wmi_bmof i2c_piix4 snd rapl k10temp soundcore rfkill acpi_cpufreq
ip_tables amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper
crct10dif_pclmul crc32_pclmul crc32c_intel cec drm ghash_clmulni_intel
igb ccp nvme dca nvme_core i2c_algo_bit wmi pinctrl_amd fuse
[ 8707.815096] CPU: 19 PID: 28231 Comm: kworker/u64:1 Tainted: G
  W- ---  5.12.0-0.rc6.184.fc35.x86_64+debug #1
[ 8707.815101] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 3603 03/20/2021
[ 8707.815106] Workqueue: ttm_swap ttm_shrink_work [ttm]
[ 8707.815114] RIP: 0010:ttm_bo_swapout+0x40b/0x420 [ttm]
[ 8707.815122] Code: 10 00 00 48 c1 e2 0c 48 c1 e6 0c e8 3f 37 fa c8
e9 71 fe ff ff 83 f8 b8 0f 85 a9 fe ff ff 48 c7 c7 28 32 37 c0 e8 02
2b 98 c9 <0f> 0b e9 96 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
00 0f
[ 8707.815126] RSP: 0018:a306d20e7d58 EFLAGS: 00010292
[ 8707.815130] RAX: 0032 RBX: c0379260 RCX: 0027
[ 8707.815133] RDX: 918c091daae8 RSI: 0001 RDI: 918c091daae0
[ 8707.815136] RBP: 918602210058 R08:  R09: 
[ 8707.815138] R10: a306d20e7b90 R11: 918c2e2fffe8 R12: ffb8
[ 8707.815141] R13: c03792a0 R14: 9186022102c0 R15: 0001
[ 8707.815145] FS:  () GS:918c0900()
knlGS:
[ 8707.815148] CS:  0010 DS:  ES:  CR0: 80050033
[ 8707.815151] CR2: 325c84d12000 CR3: 000776c28000 CR4: 00350ee0
[ 8707.815154] Call Trace:
[ 8707.815164]  ttm_shrink+0xa6/0xe0 [ttm]
[ 8707.815171]  ttm_shrink_work+0x36/0x40 [ttm]
[ 8707.815177]  process_one_work+0x2b0/0x5e0
[ 8707.815185]  worker_thread+0x55/0x3c0
[ 8707.815188]  ? process_one_work+0x5e0/0x5e0
[ 8707.815192]  kthread+0x13a/0x150
[ 8707.815196]  ? __kthread_bind_mask+0x60/0x60
[ 8707.815199]  ret_from_fork+0x22/0x30
[ 8707.815207] irq event stamp: 0
[ 8707.815209] hardirqs last  enabled at (0): [<>] 0x0
[ 8707.815213] hardirqs last disabled at (0): []
copy_process+0x91b/0x1e10
[ 8707.815218] softirqs last  enabled at (0): []
copy_process+0x91b/0x1e10
[ 8707.815222] softirqs last disabled at (0): [<>] 0x0
[ 8707.815224] ---[ end trace 29252aa87289bbaa ]---

Full kernel log: https://pastebin.com/mmAxwBYc

$ /usr/src/kernels/`uname -r`/scripts/faddr2line
/lib/debug/lib/modules/`uname
-r`/kernel/drivers/gpu/drm/ttm/ttm.ko.debug ttm_bo_swapout+0x40b
ttm_bo_swapout+0x40b/0x420:
ttm_bo_swapout at
/usr/src/debug/kernel-5.12-rc6/linux-5.12.0-0.rc6.184.fc35.x86_64/drivers/gpu/drm/ttm/ttm_bo.c:1484
(discriminator 1)


$ git blame drivers/gpu/drm/ttm/ttm_bo.c -L 1475,1494
Blaming lines:   1% (20/1530), done.
ebdf565169af0 (Dave Airlie  2020-10-29 13:58:52 +1000 1475)
  memset(, 0, sizeof(hop));
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1476)
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1477)
  evict_mem = bo->mem;
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1478)
  evict_mem.mm_node = NULL;
ce65b874001d7 (Christian König  2020-09-30 16:44:16 +0200 1479)
  evict_mem.placement = 0;
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1480)
  evict_mem.mem_type = TTM_PL_SYSTEM;
ba4e7d973dd09 (Th

Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

2021-04-07 Thread Christian König

Am 07.04.21 um 09:47 schrieb Daniel Gomez:

On Tue, 6 Apr 2021 at 22:56, Alex Deucher  wrote:

On Mon, Mar 22, 2021 at 6:34 AM Christian König
 wrote:

Hi Daniel,

Am 22.03.21 um 10:38 schrieb Daniel Gomez:

On Fri, 19 Mar 2021 at 21:29, Felix Kuehling  wrote:

This caused a regression in kfdtest in a large-buffer stress test after
memory allocation for user pages fails:

I'm sorry to hear that. BTW, I guess you meant amdgpu leak patch and
not this one.
Just some background for the mem leak patch if helps to understand this:
The leak was introduce here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b988ca1c7c4c73983b4ea96ef7c2af2263c87eb
where the bound status was introduced for all drm drivers including
radeon and amdgpu. So this patch just reverts the logic to the
original code while keeping the bound status. In my case, the binding
code allocates the user pages memory and returns without binding (at
amdgpu_gtt_mgr_has_gart_addr). So,
when the unbinding happens, the memory needs to be cleared to prevent the leak.

Ah, now I understand what's happening here. Daniel your patch is not
really correct.

The problem is rather that we don't set the tt object to bound if it
doesn't have a GTT address.

Going to provide a patch for this.

Did this patch ever land?

I don't think so but I might send a v2 following Christian's comment
if you guys agree.


Somebody else already provided a patch which I reviewed, but I'm not 
sure if that landed either.



Also, the patch here is for radeon but the pagefault issue reported by
Felix is affected by the amdgpu one:

radeon patch: drm/radeon/ttm: Fix memory leak userptr pages
https://patchwork.kernel.org/project/dri-devel/patch/20210318083236.43578-1-daniel@qtec.com/

amdgpu patch: drm/amdgpu/ttm: Fix memory leak userptr pages
https://patchwork.kernel.org/project/dri-devel/patch/20210317160840.36019-1-daniel@qtec.com/

I assume both need to be fixed with the same approach.


Yes correct. Let me double check where that fix went.

Thanks,
Christian.



Daniel

Alex


Regards,
Christian.


[17359.536303] amdgpu: init_user_pages: Failed to get user pages: -16
[17359.543746] BUG: kernel NULL pointer dereference, address: 
[17359.551494] #PF: supervisor read access in kernel mode
[17359.557375] #PF: error_code(0x) - not-present page
[17359.563247] PGD 0 P4D 0
[17359.566514] Oops:  [#1] SMP PTI
[17359.570728] CPU: 8 PID: 5944 Comm: kfdtest Not tainted 5.11.0-kfd-fkuehlin 
#193
[17359.578760] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3201 
06/17/2016
[17359.586971] RIP: 0010:amdgpu_ttm_backend_unbind+0x52/0x110 [amdgpu]
[17359.594075] Code: 48 39 c6 74 1b 8b 53 0c 48 8d bd 80 a1 ff ff e8 24 62 00 00 85 
c0 0f 85 ab 00 00 00 c6 43 54 00 5b 5d c3 48 8b 46 10 8b 4e 50 <48> 8b 30 48 85 
f6 74 ba 8b 50 0c 48 8b bf 80 a1 ff ff 83 e1 01 45
[17359.614340] RSP: 0018:a4764971fc98 EFLAGS: 00010206
[17359.620315] RAX:  RBX: 950e8d4edf00 RCX: 
[17359.628204] RDX:  RSI: 950e8d4edf00 RDI: 950eadec5e80
[17359.636084] RBP: 950eadec5e80 R08:  R09: 
[17359.643958] R10: 0246 R11: 0001 R12: 950c03377800
[17359.651833] R13: 950eadec5e80 R14: 950c03377858 R15: 
[17359.659701] FS:  7febb20cb740() GS:950ebfc0() 
knlGS:
[17359.668528] CS:  0010 DS:  ES:  CR0: 80050033
[17359.675012] CR2:  CR3: 0006d700e005 CR4: 001706e0
[17359.682883] Call Trace:
[17359.686063]  amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
[17359.692349]  ttm_bo_cleanup_memtype_use+0x37/0x60 [ttm]
[17359.698307]  ttm_bo_release+0x278/0x5e0 [ttm]
[17359.703385]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[17359.

Re: your mail

2021-04-07 Thread Christian König
Thanks Ray for pointing this out. Looks like the mail had ended up in my 
spam folder.


Apart from that this patch is a really really big NAK. I can't count how 
often I had to reject stuff like this!


Using the page reference for TTM pages is illegal and can lead to struct 
page corruption.


Can you please describe why you need that?

Regards,
Christian.

Am 07.04.21 um 10:25 schrieb Huang Rui:

On Wed, Apr 07, 2021 at 09:27:46AM +0800, songqiang wrote:

Please add the description in the commit message and subject.

Thanks,
Ray


Signed-off-by: songqiang 
---
  drivers/gpu/drm/ttm/ttm_page_alloc.c | 18 ++
  1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 14660f723f71..f3698f0ad4d7 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -736,8 +736,16 @@ static void ttm_put_pages(struct page **pages, unsigned 
npages, int flags,
if (++p != pages[i + j])
break;
  
-			if (j == HPAGE_PMD_NR)
+			if (j == HPAGE_PMD_NR) {
 				order = HPAGE_PMD_ORDER;
+				for (j = 1; j < HPAGE_PMD_NR; ++j)
+					page_ref_dec(pages[i+j]);
+			}
}
  #endif
  
@@ -868,10 +876,12 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags,

p = alloc_pages(huge_flags, HPAGE_PMD_ORDER);
if (!p)
break;
-
-			for (j = 0; j < HPAGE_PMD_NR; ++j)
+			for (j = 0; j < HPAGE_PMD_NR; ++j) {
 				pages[i++] = p++;
-
+				if (j > 0)
+					page_ref_inc(pages[i-1]);
+			}
 			npages -= HPAGE_PMD_NR;
}
}



___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: [PATCH] drm/amdgpu: Fix a potential sdma invalid access

2021-04-06 Thread Christian König

Hi Qu,

Am 06.04.21 um 08:04 schrieb Qu Huang:

Hi Christian,

On 2021/4/3 16:49, Christian König wrote:

Hi Qu,

Am 03.04.21 um 07:08 schrieb Qu Huang:

Hi Christian,

On 2021/4/3 0:25, Christian König wrote:

Hi Qu,

Am 02.04.21 um 05:18 schrieb Qu Huang:

Before dma_resv_lock(bo->base.resv, NULL) in
amdgpu_bo_release_notify(),
the bo->base.resv lock may be held by ttm_mem_evict_first(),


That can't happen since when bo_release_notify is called the BO has no
more references and is therefore deleted.

And we never evict a deleted BO, we just wait for it to become idle.


Yes, when the bo reference counter returns to zero we enter
ttm_bo_release(), but the bo release notification (the call to
amdgpu_bo_release_notify()) happens first; only then do we test whether
the reservation object's fences have been signaled, mark the bo as
deleted, and remove it from the LRU list.

When ttm_bo_release() and ttm_mem_evict_first() are concurrent,
the bo has not yet been removed from the LRU list and is not marked as
deleted, so this can happen.


Not sure which code base you are on, but I don't see how this can 
happen.


ttm_mem_evict_first() calls ttm_bo_get_unless_zero() and
ttm_bo_release() is only called when the BO reference count becomes 
zero.


So ttm_mem_evict_first() will see that this BO is about to be destroyed
and skips it.



Yes, you are right. My version of TTM is from ROCm 3.3, so
ttm_mem_evict_first() did not call ttm_bo_get_unless_zero(); I checked that
the ROCm 4.0 TTM doesn't have this issue. This was an oversight on my part.



As a test, when we use CPU memset instead of SDMA fill in
amdgpu_bo_release_notify(), the result is page fault:

PID: 5490   TASK: 8e8136e04100  CPU: 4   COMMAND: "gemmPerf"
  #0 [8e79eaa17970] machine_kexec at b2863784
  #1 [8e79eaa179d0] __crash_kexec at b291ce92
  #2 [8e79eaa17aa0] crash_kexec at b291cf80
  #3 [8e79eaa17ab8] oops_end at b2f6c768
  #4 [8e79eaa17ae0] no_context at b2f5aaa6
  #5 [8e79eaa17b30] __bad_area_nosemaphore at b2f5ab3d
  #6 [8e79eaa17b80] bad_area_nosemaphore at b2f5acae
  #7 [8e79eaa17b90] __do_page_fault at b2f6f6c0
  #8 [8e79eaa17c00] do_page_fault at b2f6f925
  #9 [8e79eaa17c30] page_fault at b2f6b758
 [exception RIP: memset+31]
 RIP: b2b8668f  RSP: 8e79eaa17ce8  RFLAGS: 00010a17
 RAX: bebebebebebebebe  RBX: 8e747bff10c0  RCX: 
060b0020
 RDX:   RSI: 00be  RDI: 
ab807f00
 RBP: 8e79eaa17d10   R8: 8e79eaa14000   R9: 
ab7c8000
 R10: bcba  R11: 01ba  R12: 
8e79ebaa4050
 R13: ab7c8000  R14: 00022600  R15: 
8e8136e04100

 ORIG_RAX:   CS: 0010  SS: 0018
#10 [8e79eaa17ce8] amdgpu_bo_release_notify at c092f2d1
[amdgpu]
#11 [8e79eaa17d18] ttm_bo_release at c08f39dd [amdttm]
#12 [8e79eaa17d58] amdttm_bo_put at c08f3c8c [amdttm]
#13 [8e79eaa17d68] amdttm_bo_vm_close at c08f7ac9 [amdttm]
#14 [8e79eaa17d80] remove_vma at b29ef115
#15 [8e79eaa17da0] exit_mmap at b29f2c64
#16 [8e79eaa17e58] mmput at b28940c7
#17 [8e79eaa17e78] do_exit at b289dc95
#18 [8e79eaa17f10] do_group_exit at b289e4cf
#19 [8e79eaa17f40] sys_exit_group at b289e544
#20 [8e79eaa17f50] system_call_fastpath at b2f74ddb


Well that might be perfectly expected. VRAM is not necessarily CPU
accessible.


As a test,use CPU memset instead of SDMA fill, This is my code:
void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
{
struct amdgpu_bo *abo;
uint64_t num_pages;
struct drm_mm_node *mm_node;
struct amdgpu_device *adev;
void __iomem *kaddr;

if (!amdgpu_bo_is_amdgpu_bo(bo))
    return;

abo = ttm_to_amdgpu_bo(bo);
num_pages = abo->tbo.num_pages;
mm_node = abo->tbo.mem.mm_node;
adev = amdgpu_ttm_adev(abo->tbo.bdev);
kaddr = adev->mman.aper_base_kaddr;

if (abo->kfd_bo)
    amdgpu_amdkfd_unreserve_memory_limit(abo);

if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node ||
    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
    return;

dma_resv_lock(amdkcl_ttm_resvp(bo), NULL);
while (num_pages && mm_node) {
    void *ptr = kaddr + (mm_node->start << PAGE_SHIFT);


That might not work as expected.

aper_base_kaddr can only point to a 256MiB window into VRAM, but VRAM 
itself is usually much larger.


So your memset_io() might end up in nirvana if the BO is allocated 
outside of the window.



    memset_io(ptr, AMDGPU_POISON & 0xff, mm_node->size << PAGE_SHIFT);
    num_pages -= mm_node->size;
    ++mm_node;
}
dma_resv_unlock(amdkcl_ttm_resvp(bo));
}





I used the old version through oversight, so I am sorry for the
trouble.


No problem. I was just wondering if I was missin

Re: [PATCH] drm/amdgpu: Fix a potential sdma invalid access

2021-04-03 Thread Christian König

Hi Qu,

Am 03.04.21 um 07:08 schrieb Qu Huang:

Hi Christian,

On 2021/4/3 0:25, Christian König wrote:

Hi Qu,

Am 02.04.21 um 05:18 schrieb Qu Huang:
Before dma_resv_lock(bo->base.resv, NULL) in 
amdgpu_bo_release_notify(),

the bo->base.resv lock may be held by ttm_mem_evict_first(),


That can't happen since when bo_release_notify is called the BO has no
more references and is therefore deleted.

And we never evict a deleted BO, we just wait for it to become idle.


Yes, when the bo reference counter returns to zero we enter
ttm_bo_release(), but the bo release notification (the call to
amdgpu_bo_release_notify()) happens first; only then do we test whether
the reservation object's fences have been signaled, mark the bo as
deleted, and remove it from the LRU list.

When ttm_bo_release() and ttm_mem_evict_first() are concurrent,
the bo has not yet been removed from the LRU list and is not marked as
deleted, so this can happen.


Not sure which code base you are on, but I don't see how this can happen.

ttm_mem_evict_first() calls ttm_bo_get_unless_zero() and 
ttm_bo_release() is only called when the BO reference count becomes zero.


So ttm_mem_evict_first() will see that this BO is about to be destroyed 
and skips it.




As a test, when we use CPU memset instead of SDMA fill in
amdgpu_bo_release_notify(), the result is page fault:

PID: 5490   TASK: 8e8136e04100  CPU: 4   COMMAND: "gemmPerf"
  #0 [8e79eaa17970] machine_kexec at b2863784
  #1 [8e79eaa179d0] __crash_kexec at b291ce92
  #2 [8e79eaa17aa0] crash_kexec at b291cf80
  #3 [8e79eaa17ab8] oops_end at b2f6c768
  #4 [8e79eaa17ae0] no_context at b2f5aaa6
  #5 [8e79eaa17b30] __bad_area_nosemaphore at b2f5ab3d
  #6 [8e79eaa17b80] bad_area_nosemaphore at b2f5acae
  #7 [8e79eaa17b90] __do_page_fault at b2f6f6c0
  #8 [8e79eaa17c00] do_page_fault at b2f6f925
  #9 [8e79eaa17c30] page_fault at b2f6b758
 [exception RIP: memset+31]
 RIP: b2b8668f  RSP: 8e79eaa17ce8  RFLAGS: 00010a17
 RAX: bebebebebebebebe  RBX: 8e747bff10c0  RCX: 060b0020
 RDX:   RSI: 00be  RDI: ab807f00
 RBP: 8e79eaa17d10   R8: 8e79eaa14000   R9: ab7c8000
 R10: bcba  R11: 01ba  R12: 8e79ebaa4050
 R13: ab7c8000  R14: 00022600  R15: 8e8136e04100
 ORIG_RAX:   CS: 0010  SS: 0018
#10 [8e79eaa17ce8] amdgpu_bo_release_notify at c092f2d1 
[amdgpu]

#11 [8e79eaa17d18] ttm_bo_release at c08f39dd [amdttm]
#12 [8e79eaa17d58] amdttm_bo_put at c08f3c8c [amdttm]
#13 [8e79eaa17d68] amdttm_bo_vm_close at c08f7ac9 [amdttm]
#14 [8e79eaa17d80] remove_vma at b29ef115
#15 [8e79eaa17da0] exit_mmap at b29f2c64
#16 [8e79eaa17e58] mmput at b28940c7
#17 [8e79eaa17e78] do_exit at b289dc95
#18 [8e79eaa17f10] do_group_exit at b289e4cf
#19 [8e79eaa17f40] sys_exit_group at b289e544
#20 [8e79eaa17f50] system_call_fastpath at b2f74ddb


Well that might be perfectly expected. VRAM is not necessarily CPU 
accessible.


Regards,
Christian.



Regards,
Qu.



Regards,
Christian.


and the VRAM mem may be evicted, with the mem region replaced
by a GTT mem region. amdgpu_bo_release_notify() will then
hold the bo->base.resv lock, and SDMA will get an invalid
address in amdgpu_fill_buffer(), resulting in a VM fault
or memory corruption.

To avoid it, we have to hold bo->base.resv lock first, and
check whether the mem.mem_type is TTM_PL_VRAM.

Signed-off-by: Qu Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 4b29b82..8018574 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1300,12 +1300,16 @@ void amdgpu_bo_release_notify(struct
ttm_buffer_object *bo)
  if (bo->base.resv == &abo->base._resv)
  amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);

-    if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node ||
-    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
+    if (!(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
  return;

  dma_resv_lock(bo->base.resv, NULL);

+    if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node) {
+    dma_resv_unlock(bo->base.resv);
+    return;
+    }
+
  r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence);

  if (!WARN_ON(r)) {
  amdgpu_bo_fence(abo, fence, false);
--
1.8.3.1







Re: [PATCH] drm/amdgpu: Fix a potential sdma invalid access

2021-04-02 Thread Christian König

Hi Qu,

Am 02.04.21 um 05:18 schrieb Qu Huang:

Before dma_resv_lock(bo->base.resv, NULL) in amdgpu_bo_release_notify(),
the bo->base.resv lock may be held by ttm_mem_evict_first(),


That can't happen since when bo_release_notify is called the BO has no 
more references and is therefore deleted.


And we never evict a deleted BO, we just wait for it to become idle.

Regards,
Christian.


and the VRAM mem may be evicted, with the mem region replaced
by a GTT mem region. amdgpu_bo_release_notify() will then
hold the bo->base.resv lock, and SDMA will get an invalid
address in amdgpu_fill_buffer(), resulting in a VM fault
or memory corruption.

To avoid it, we have to hold bo->base.resv lock first, and
check whether the mem.mem_type is TTM_PL_VRAM.

Signed-off-by: Qu Huang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 4b29b82..8018574 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1300,12 +1300,16 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object 
*bo)
	if (bo->base.resv == &abo->base._resv)
amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);

-   if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node ||
-   !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
+   if (!(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
return;

dma_resv_lock(bo->base.resv, NULL);

+   if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node) {
+   dma_resv_unlock(bo->base.resv);
+   return;
+   }
+
	r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence);
if (!WARN_ON(r)) {
amdgpu_bo_fence(abo, fence, false);
--
1.8.3.1





Re: [PATCH] drm/ttm: add __user annotation in radeon_ttm_vram_read

2021-04-01 Thread Christian König

Am 24.10.20 um 02:47 schrieb Rasmus Villemoes:

Keep sparse happy by preserving the __user annotation when casting.

Reported-by: kernel test robot 
Signed-off-by: Rasmus Villemoes 


Reviewed-by: Christian König 

Going over old patches and stumbled over that once.

Alex did you missed to pick it up?

Regards,
Christian.


---

kernel test robot has already started spamming me due to 9c5743dff. If
I don't fix those warnings I'll keep getting those emails for
months, so let me do the easy ones.


  drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 36150b7f31a90aa1eece..ecfe88b0a35d8f317712 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -1005,7 +1005,7 @@ static ssize_t radeon_ttm_vram_read(struct file *f, char 
__user *buf,
value = RREG32(RADEON_MM_DATA);
spin_unlock_irqrestore(>mmio_idx_lock, flags);
  
-		r = put_user(value, (uint32_t *)buf);

+   r = put_user(value, (uint32_t __user *)buf);
if (r)
return r;
  




Re: [PATCH] drm/ttm: cleanup coding style a bit

2021-03-31 Thread Christian König

Am 31.03.21 um 15:12 schrieb Bernard Zhao:

Fix sparse warning:
drivers/gpu/drm/ttm/ttm_bo.c:52:1: warning: symbol 'ttm_global_mutex' was not 
declared. Should it be static?
drivers/gpu/drm/ttm/ttm_bo.c:53:10: warning: symbol 'ttm_bo_glob_use_count' was 
not declared. Should it be static?

Signed-off-by: Bernard Zhao 


You are based on an outdated branch, please rebase on top of drm-misc-next.

Regards,
Christian.


---
  drivers/gpu/drm/ttm/ttm_bo.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 101a68dc615b..eab21643edfb 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -49,8 +49,8 @@ static void ttm_bo_global_kobj_release(struct kobject *kobj);
  /*
   * ttm_global_mutex - protecting the global BO state
   */
-DEFINE_MUTEX(ttm_global_mutex);
-unsigned ttm_bo_glob_use_count;
+static DEFINE_MUTEX(ttm_global_mutex);
+static unsigned int ttm_bo_glob_use_count;
  struct ttm_bo_global ttm_bo_glob;
  EXPORT_SYMBOL(ttm_bo_glob);
  




Re: [PATCH 0/2] ensure alignment on CPU page for bo mapping

2021-03-30 Thread Christian König
Reviewed-by: Christian König  for the entire 
series.


Alex will probably pick them up for the next feature pull request.

Regards,
Christian.

Am 30.03.21 um 17:33 schrieb Xi Ruoyao:

In AMDGPU driver, the bo mapping should always align to CPU page or
the page table is corrupted.

The first patch is cherry-picked from Loongson community, which sets a
suitable dev_info.gart_page_size so Mesa will handle the alignment
correctly.

The second patch is added to ensure an ioctl with unaligned parameter to
be rejected -EINVAL, instead of causing page table corruption.

The patches should be applied for drm-next.

Huacai Chen (1):
   drm/amdgpu: Set a suitable dev_info.gart_page_size

Xi Ruoyao (1):
   drm/amdgpu: check alignment on CPU page for bo map

  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 8 
  2 files changed, 6 insertions(+), 6 deletions(-)


base-commit: a0c8b193bfe81cc8e9c7c162bb8d777ba12596f0




Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems

2021-03-30 Thread Christian König




Am 30.03.21 um 15:23 schrieb Dan Horák:

On Tue, 30 Mar 2021 21:09:12 +0800
Xi Ruoyao  wrote:


On 2021-03-30 21:02 +0800, Xi Ruoyao wrote:

On 2021-03-30 14:55 +0200, Christian König wrote:

I rather see this as a kernel bug. Can you test if this code fragment
fixes your issue:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 64beb3399604..e1260b517e1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void
*data, struct drm_file *filp)
  }
  dev_info->virtual_address_alignment =
max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
  dev_info->pte_fragment_size = (1 <<
adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE;
-   dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE;
+   dev_info->gart_page_size =
dev_info->virtual_address_alignment;
  dev_info->cu_active_number = adev->gfx.cu_info.number;
  dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask;
  dev_info->ce_ram_size = adev->gfx.ce_ram_size;

It works.  I've seen it at
https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191
before (with a common sub-expression, though :).

Some comment: on an old version of Fedora ported by Loongson, Xorg just hangs
without this commit.  But on the system I built from source, I didn't see any
issue before Linux 5.11.  So I mistakenly believed that it was something already fixed.

Dan: you can try it on your PPC 64 with non-4K page as well.

yup, looks good here as well, ppc64le (Power9) system with 64KB pages


Mhm, as far as I can say this patch never made it to us.

Can you please send it to the mailing list with me on CC?

Thanks,
Christian.




Dan




Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems

2021-03-30 Thread Christian König

Am 30.03.21 um 15:00 schrieb Dan Horák:

On Tue, 30 Mar 2021 14:55:01 +0200
Christian König  wrote:


Am 30.03.21 um 14:04 schrieb Xi Ruoyao:

On 2021-03-30 03:40 +0800, Xi Ruoyao wrote:

On 2021-03-29 21:36 +0200, Christian König wrote:

Am 29.03.21 um 21:27 schrieb Xi Ruoyao:

Hi Christian,

I don't think there is any constraint implemented to ensure `num_entries %
AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`.  For example, in `amdgpu_vm_bo_map()`:

   /* validate the parameters */
   if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK ||
       size == 0 || size & AMDGPU_GPU_PAGE_MASK)
   return -EINVAL;

/* snip */

   saddr /= AMDGPU_GPU_PAGE_SIZE;
   eaddr /= AMDGPU_GPU_PAGE_SIZE;

/* snip */

   mapping->start = saddr;
   mapping->last = eaddr;


If we really want to ensure (mapping->last - mapping->start + 1) %
AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace
"AMDGPU_GPU_PAGE_MASK"
in "validate the parameters" with "PAGE_MASK".

It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of
"AMDGPU_GPU_PAGE_MASK" :(.


Yeah, good point.


I tried it and it broke userspace: Xorg startup fails with EINVAL with
this
change.

Well in theory it is possible that we always fill the GPUVM on a 4k
basis while the native page size of the CPU is larger. Let me double
check the code.

On my platform, there are two issues both causing the VM corruption.  One is
fixed by agd5f/linux@fe001e7.

What is fe001e7? A commit id? If yes then that is too short and I can't
find it.

it's a gitlab shortcut for
https://gitlab.freedesktop.org/agd5f/linux/-/commit/fe001e70a55d0378328612be1fdc3abfc68b9ccc


Ah! Yes I would expect that this patch is fixing something in this use case.

Thanks,
Christian.




Dan

Another is in Mesa from userspace:  it uses
`dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver
expects it to use `dev_info->virtual_address_alignment`.

Mhm, looking at the kernel code I would rather say Mesa is correct and
the kernel driver is broken.

The gart_page_size is limited by the CPU page size, but the
virtual_address_alignment isn't.


If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue
and make virtual_address_alignment = 4K.  Otherwise, we should fortify the
parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK".  Then the
userspace will just get an EINVAL, instead of a silent VM corruption.  And
someone should tell Mesa developers to fix the code in this case.

I rather see this as a kernel bug. Can you test if this code fragment
fixes your issue:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 64beb3399604..e1260b517e1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void
*data, struct drm_file *filp)
      }
      dev_info->virtual_address_alignment =
max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
      dev_info->pte_fragment_size = (1 <<
adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE;
-   dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE;
+   dev_info->gart_page_size =
dev_info->virtual_address_alignment;
      dev_info->cu_active_number = adev->gfx.cu_info.number;
      dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask;
      dev_info->ce_ram_size = adev->gfx.ce_ram_size;


Thanks,
Christian.


--
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University





Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems

2021-03-30 Thread Christian König

Am 30.03.21 um 14:04 schrieb Xi Ruoyao:

On 2021-03-30 03:40 +0800, Xi Ruoyao wrote:

On 2021-03-29 21:36 +0200, Christian König wrote:

Am 29.03.21 um 21:27 schrieb Xi Ruoyao:

Hi Christian,

I don't think there is any constraint implemented to ensure `num_entries %
AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`.  For example, in `amdgpu_vm_bo_map()`:

  /* validate the parameters */
  if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK ||
      size == 0 || size & AMDGPU_GPU_PAGE_MASK)
  return -EINVAL;

/* snip */

  saddr /= AMDGPU_GPU_PAGE_SIZE;
  eaddr /= AMDGPU_GPU_PAGE_SIZE;

/* snip */

  mapping->start = saddr;
  mapping->last = eaddr;


If we really want to ensure (mapping->last - mapping->start + 1) %
AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace
"AMDGPU_GPU_PAGE_MASK"
in "validate the parameters" with "PAGE_MASK".

It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of
"AMDGPU_GPU_PAGE_MASK" :(.


Yeah, good point.


I tried it and it broke userspace: Xorg startup fails with EINVAL with
this
change.

Well in theory it is possible that we always fill the GPUVM on a 4k
basis while the native page size of the CPU is larger. Let me double
check the code.

On my platform, there are two issues both causing the VM corruption.  One is
fixed by agd5f/linux@fe001e7.


What is fe001e7? A commit id? If yes then that is too short and I can't 
find it.



   Another is in Mesa from userspace:  it uses
`dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver
expects it to use `dev_info->virtual_address_alignment`.


Mhm, looking at the kernel code I would rather say Mesa is correct and 
the kernel driver is broken.


The gart_page_size is limited by the CPU page size, but the 
virtual_address_alignment isn't.



If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue
and make virtual_address_alignment = 4K.  Otherwise, we should fortify the
parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK".  Then the
userspace will just get an EINVAL, instead of a silent VM corruption.  And
someone should tell Mesa developers to fix the code in this case.


I rather see this as a kernel bug. Can you test if this code fragment 
fixes your issue:


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

index 64beb3399604..e1260b517e1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file *filp)

    }
    dev_info->virtual_address_alignment = 
max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
    dev_info->pte_fragment_size = (1 << 
adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE;

-   dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE;
+   dev_info->gart_page_size = 
dev_info->virtual_address_alignment;

    dev_info->cu_active_number = adev->gfx.cu_info.number;
    dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask;
    dev_info->ce_ram_size = adev->gfx.ce_ram_size;


Thanks,
Christian.


--
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University





Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems

2021-03-29 Thread Christian König

Am 29.03.21 um 21:27 schrieb Xi Ruoyao:

Hi Christian,

I don't think there is any constraint implemented to ensure `num_entries %
AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`.  For example, in `amdgpu_vm_bo_map()`:

 /* validate the parameters */
 if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK ||
 size == 0 || size & AMDGPU_GPU_PAGE_MASK)
 return -EINVAL;

/* snip */

 saddr /= AMDGPU_GPU_PAGE_SIZE;
 eaddr /= AMDGPU_GPU_PAGE_SIZE;

/* snip */

 mapping->start = saddr;
 mapping->last = eaddr;


If we really want to ensure (mapping->last - mapping->start + 1) %
AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace "AMDGPU_GPU_PAGE_MASK"
in "validate the parameters" with "PAGE_MASK".


Yeah, good point.


I tried it and it broke userspace: Xorg startup fails with EINVAL with this
change.


Well in theory it is possible that we always fill the GPUVM on a 4k 
basis while the native page size of the CPU is larger. Let me double 
check the code.


BTW: What code base are you based on? The code your post here is quite 
outdated.


Christian.



On 2021-03-30 02:30 +0800, Xi Ruoyao wrote:

On 2021-03-30 02:21 +0800, Xi Ruoyao wrote:

On 2021-03-29 20:10 +0200, Christian König wrote:

You need to identify the root cause of this, most likely start or last
are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE.

I printk'ed the value of start & last, they are all a multiple of 4
(AMDGPU_GPU_PAGES_IN_CPU_PAGE).

However... `num_entries = last - start + 1` so it became some irrational
thing...  Either this line is wrong, or someone called
amdgpu_vm_bo_update_mapping with [start, last) instead of [start, last], which
is unexpected.

I added BUG_ON(num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE != 0), get:


Mar 30 02:28:27 xry111-A1901 kernel: []
amdgpu_vm_bo_update_mapping.constprop.0+0x218/0xae8
Mar 30 02:28:27 xry111-A1901 kernel: []
amdgpu_vm_bo_update+0x270/0x4c0
Mar 30 02:28:27 xry111-A1901 kernel: []
amdgpu_gem_va_ioctl+0x40c/0x430
Mar 30 02:28:27 xry111-A1901 kernel: []
drm_ioctl_kernel+0xcc/0x120
Mar 30 02:28:27 xry111-A1901 kernel: []
drm_ioctl+0x220/0x408
Mar 30 02:28:27 xry111-A1901 kernel: []
amdgpu_drm_ioctl+0x58/0x98
Mar 30 02:28:27 xry111-A1901 kernel: [] sys_ioctl+0xcc/0xe8
Mar 30 02:28:27 xry111-A1901 kernel: []
syscall_common+0x34/0x58


BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549
Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2")
Reported-by: Xi Ruoyao 
Reported-by: Dan Horák 
Cc: sta...@vger.kernel.org
Signed-off-by: Xi Ruoyao 
---
    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ad91c0c3c423..cee0cc9c8085 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct
amdgpu_device *adev,
  }
  start = tmp;

-   } while (unlikely(start != last + 1));

+   } while (unlikely(start < last + 1));

  r = vm->update_funcs->commit(, fence);



base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3




Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems

2021-03-29 Thread Christian König

Am 29.03.21 um 20:08 schrieb Xi Ruoyao:

On 2021-03-29 20:04 +0200, Christian König wrote:

Am 29.03.21 um 19:53 schrieb Xi Ruoyao:

If the initial value of `num_entries` (calculated at line 1654) is not
an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a
value greater than the initial value will be assigned to it.  That causes
`start > last + 1` after line 1708.  Then in the next iteration an
underflow happens at line 1654.  It causes message

  *ERROR* Couldn't update BO_VA (-12)

printed in kernel log, and GPU hanging.

Fortify the criteria of the loop to fix this issue.

NAK, the value of num_entries must always be a multiple of
AMDGPU_GPU_PAGES_IN_CPU_PAGE, or otherwise we corrupt the page tables.

How do you trigger that?

Simply run "OpenGL area" from gtk3-demo (which just renders a triangle with GL)
under Xorg, on MIPS64.  See the BugLink.


You need to identify the root cause of this, most likely start or last 
are not a multiple of AMDGPU_GPU_PAGES_IN_CPU_PAGE.


Christian.




Christian.


BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549
Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2")
Reported-by: Xi Ruoyao 
Reported-by: Dan Horák 
Cc: sta...@vger.kernel.org
Signed-off-by: Xi Ruoyao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ad91c0c3c423..cee0cc9c8085 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct
amdgpu_device *adev,
 }
 start = tmp;
   
-   } while (unlikely(start != last + 1));
+   } while (unlikely(start < last + 1));
   
  r = vm->update_funcs->commit(&params, fence);
   


base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3




Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems

2021-03-29 Thread Christian König

On 29.03.21 at 19:53, Xi Ruoyao wrote:

If the initial value of `num_entries` (calculated at line 1654) is not
an integral multiple of `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, in line 1681 a
value greater than the initial value will be assigned to it.  That causes
`start > last + 1` after line 1708.  Then in the next iteration an
underflow happens at line 1654.  It causes the message

 *ERROR* Couldn't update BO_VA (-12)

to be printed in the kernel log, and the GPU to hang.

Tighten the loop condition to fix this issue.


NAK, the value of num_entries must always be a multiple of
AMDGPU_GPU_PAGES_IN_CPU_PAGE, otherwise we corrupt the page tables.


How do you trigger that?

Christian.



BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549
Fixes: a39f2a8d7066 ("drm/amdgpu: nuke amdgpu_vm_bo_split_mapping v2")
Reported-by: Xi Ruoyao 
Reported-by: Dan Horák 
Cc: sta...@vger.kernel.org
Signed-off-by: Xi Ruoyao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ad91c0c3c423..cee0cc9c8085 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1707,7 +1707,7 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
}
start = tmp;
  
-   } while (unlikely(start != last + 1));
+   } while (unlikely(start < last + 1));
  
  r = vm->update_funcs->commit(&params, fence);
  


base-commit: a5e13c6df0e41702d2b2c77c8ad41677ebb065b3




Re: drm/ttm: switch to per device LRU lock

2021-03-25 Thread Christian König

Thanks! Just a copy issue.

Patch to fix this is on the mailing list.

Christian.

On 25.03.21 at 16:00, Colin Ian King wrote:

Hi,

Static analysis with Coverity in linux-next has detected an issue in
drivers/gpu/drm/ttm/ttm_bo.c with the follow commit:

commit a1f091f8ef2b680a5184db065527612247cb4cae
Author: Christian König 
Date:   Tue Oct 6 17:26:42 2020 +0200

 drm/ttm: switch to per device LRU lock

 Instead of having a global lock for potentially less contention.


The analysis is as follows:

617 int ttm_mem_evict_first(struct ttm_device *bdev,
618                         struct ttm_resource_manager *man,
619                         const struct ttm_place *place,
620                         struct ttm_operation_ctx *ctx,
621                         struct ww_acquire_ctx *ticket)
622 {
1. assign_zero: Assigning: bo = NULL.

623         struct ttm_buffer_object *bo = NULL, *busy_bo = NULL;
624         bool locked = false;
625         unsigned i;
626         int ret;
627

Explicit null dereferenced (FORWARD_NULL) 2. var_deref_op:
Dereferencing null pointer bo.

628         spin_lock(&bo->bdev->lru_lock);
629         for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {

The spin_lock on bo is dereferencing a null bo pointer.

Colin




Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages

2021-03-25 Thread Christian König

On 25.03.21 at 14:33, Jason Gunthorpe wrote:

On Thu, Mar 25, 2021 at 02:26:50PM +0100, Christian König wrote:

On 25.03.21 at 14:17, Jason Gunthorpe wrote:

On Thu, Mar 25, 2021 at 02:05:14PM +0100, Christian König wrote:

On 25.03.21 at 13:42, Jason Gunthorpe wrote:

On Thu, Mar 25, 2021 at 01:09:14PM +0100, Christian König wrote:

On 25.03.21 at 13:01, Jason Gunthorpe wrote:

On Thu, Mar 25, 2021 at 12:53:15PM +0100, Thomas Hellström (Intel) wrote:


Nope. The point here was that in this case, to make sure mmap uses the
correct VA to give us a reasonable chance of alignement, the driver might
need to be aware of and do trickery with the huge page-table-entry sizes
anyway, although I think in most cases a standard helper for this can be
supplied.

Of course the driver needs some way to influence the VA mmap uses,
generally it should align to the natural page size of the device

Well a mmap() needs to be aligned to the page size of the CPU, but not
necessarily to the one of the device.

So I'm pretty sure the device driver should not be involved in any way the
choosing of the VA for the CPU mapping.

No, if the device wants to use huge pages it must influence the mmap
VA or it can't form huge pages.

No, that's the job of the core MM and not of the individual driver.

The core mm doesn't know the page size of the device, only which of
several page levels the arch supports. The device must be involved
here.

Why? See, you can have a device which has for example 256KiB pages, but it
should work perfectly well for the CPU mapping to be aligned to only 4KiB.

The goal is to optimize large page size usage in the page tables.

There are three criteria that impact this:
  1) The possible CPU page table sizes
  2) The useful contiguity the device can create in its iomemory
  3) The VA's alignment, as this sets an upper bound on 1 and 2

If a device has 256k pages and the arch supports 2M and 4k then the VA
should align to somewhere between 4k and 256k. The ideal alignment
would be to optimize PTE usage when stuffing 256k blocks by fully
populating PTEs and depends on the arch's # of PTEs per page.


Ah! So you want to also avoid that we only half-populate PTEs as 
well! That's rather nifty.


But you don't need the device page size for this. Just looking at the 
size of the mapping should be enough.


In other words we would align the VA so that it tries to avoid crossing 
page table boundaries.


But to be honest I'm really wondering why the heck we don't already do 
this in vm_unmapped_area(). That should be beneficial for basically 
every slightly larger mapping.


Christian.



If a device has 256k pages and the arch supports 256k pages then the
VA should align to 256k.

The device should never be touching any of this, it should simply
inform what its operating page size is and the MM should use that to
align the VA.

Jason




Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages

2021-03-25 Thread Christian König

On 25.03.21 at 14:17, Jason Gunthorpe wrote:

On Thu, Mar 25, 2021 at 02:05:14PM +0100, Christian König wrote:


On 25.03.21 at 13:42, Jason Gunthorpe wrote:

On Thu, Mar 25, 2021 at 01:09:14PM +0100, Christian König wrote:

On 25.03.21 at 13:01, Jason Gunthorpe wrote:

On Thu, Mar 25, 2021 at 12:53:15PM +0100, Thomas Hellström (Intel) wrote:


Nope. The point here was that in this case, to make sure mmap uses the
correct VA to give us a reasonable chance of alignement, the driver might
need to be aware of and do trickery with the huge page-table-entry sizes
anyway, although I think in most cases a standard helper for this can be
supplied.

Of course the driver needs some way to influence the VA mmap uses,
generally it should align to the natural page size of the device

Well a mmap() needs to be aligned to the page size of the CPU, but not
necessarily to the one of the device.

So I'm pretty sure the device driver should not be involved in any way the
choosing of the VA for the CPU mapping.

No, if the device wants to use huge pages it must influence the mmap
VA or it can't form huge pages.

No, that's the job of the core MM and not of the individual driver.

The core mm doesn't know the page size of the device, only which of
several page levels the arch supports. The device must be involved
here.


Why? See, you can have a device which has for example 256KiB pages, but 
it should work perfectly well for the CPU mapping to be aligned to only 4KiB.


As long as you don't do things like shared virtual memory between device 
and CPU the VA addresses used on the CPU should be completely irrelevant 
for the device.


Regards,
Christian.



Jason




Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages

2021-03-25 Thread Christian König




On 25.03.21 at 13:42, Jason Gunthorpe wrote:

On Thu, Mar 25, 2021 at 01:09:14PM +0100, Christian König wrote:

On 25.03.21 at 13:01, Jason Gunthorpe wrote:

On Thu, Mar 25, 2021 at 12:53:15PM +0100, Thomas Hellström (Intel) wrote:


Nope. The point here was that in this case, to make sure mmap uses the
correct VA to give us a reasonable chance of alignement, the driver might
need to be aware of and do trickery with the huge page-table-entry sizes
anyway, although I think in most cases a standard helper for this can be
supplied.

Of course the driver needs some way to influence the VA mmap uses,
generally it should align to the natural page size of the device

Well a mmap() needs to be aligned to the page size of the CPU, but not
necessarily to the one of the device.

So I'm pretty sure the device driver should not be involved in any way the
choosing of the VA for the CPU mapping.

No, if the device wants to use huge pages it must influence the mmap
VA or it can't form huge pages.


No, that's the job of the core MM and not of the individual driver.

In other words current->mm->get_unmapped_area should already return a 
properly aligned VA.


Messing with that inside file->f_op->get_unmapped_area is utterly 
nonsense as far as I can see.


It happens to be this way currently, but that is not even remotely good 
design.


Christian.



It is the same reason why mmap returns 2M stuff these days to make THP
possible

Jason




Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages

2021-03-25 Thread Christian König




On 25.03.21 at 13:36, Thomas Hellström (Intel) wrote:


On 3/25/21 1:09 PM, Christian König wrote:

On 25.03.21 at 13:01, Jason Gunthorpe wrote:
On Thu, Mar 25, 2021 at 12:53:15PM +0100, Thomas Hellström (Intel) 
wrote:



Nope. The point here was that in this case, to make sure mmap uses the
correct VA to give us a reasonable chance of alignement, the driver 
might
need to be aware of and do trickery with the huge page-table-entry 
sizes
anyway, although I think in most cases a standard helper for this 
can be

supplied.

Of course the driver needs some way to influence the VA mmap uses,
generally it should align to the natural page size of the device


Well a mmap() needs to be aligned to the page size of the CPU, but 
not necessarily to the one of the device.


So I'm pretty sure the device driver should not be involved in any 
way the choosing of the VA for the CPU mapping.


Christian.

We've had this discussion before and at that time I managed to 
convince you by pointing to the shmem helper for this, 
shmem_get_unmapped_area().


No, you didn't convince me. I was just surprised that this is something 
under driver control.




Basically there are two ways to do this. Either use a standard helper 
similar to shmem's, and then the driver needs to align physical 
(device) huge page boundaries to address space offset huge page 
boundaries. If you don't do that you can just as well use a custom 
function that adjusts for you not doing that 
(drm_get_unmapped_area()). Both require driver knowledge of the size 
of huge pages.


And once more, at least for GPU drivers that looks like the totally 
wrong approach to me.


Aligning the VMA so that huge page allocations become possible is the 
job of the MM subsystem and not that of the drivers.




Without a function to adjust, mmap will use its default (16 byte?) 
alignment and the chance of alignment becomes very small.


Well it's 4KiB at least.

Regards,
Christian.



/Thomas




Jason




Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages

2021-03-25 Thread Christian König

On 25.03.21 at 13:01, Jason Gunthorpe wrote:

On Thu, Mar 25, 2021 at 12:53:15PM +0100, Thomas Hellström (Intel) wrote:


Nope. The point here was that in this case, to make sure mmap uses the
correct VA to give us a reasonable chance of alignement, the driver might
need to be aware of and do trickery with the huge page-table-entry sizes
anyway, although I think in most cases a standard helper for this can be
supplied.

Of course the driver needs some way to influence the VA mmap uses,
generally it should align to the natural page size of the device


Well a mmap() needs to be aligned to the page size of the CPU, but not 
necessarily to the one of the device.


So I'm pretty sure the device driver should not be involved in any way 
the choosing of the VA for the CPU mapping.


Christian.



Jason




Re: [PATCH 5.11 073/120] drm/ttm: Warn on pinning without holding a reference

2021-03-25 Thread Christian König

On 25.03.21 at 09:50, Greg Kroah-Hartman wrote:

On Thu, Mar 25, 2021 at 09:14:59AM +0100, Christian König wrote:

Hi Greg,

sorry, I just realized this after users started to complain. This patch
shouldn't have been backported to 5.11 in the first place.

The warning itself is a good idea, but we also have patches for drivers and
TTM in the pipeline for 5.12 so that the warning isn't triggered any more.

Without backporting all of that we now get a rain of warnings in 5.11.9.

My suggestion is to revert this patch from the 5.11 branch.

Thanks, will go do so right now and push out a new 5.11 release with
that in it to keep the noise down for you.


Thanks a lot for that! I got a bit swamped this morning because of mails 
and bug reports.


Christian.



greg k-h




Re: WARNING: AMDGPU DRM warning in 5.11.9

2021-03-25 Thread Christian König

Hi,

On 25.03.21 at 09:17, Oleksandr Natalenko wrote:

Hello.

On Thu, Mar 25, 2021 at 07:57:33AM +0200, Ilkka Prusi wrote:

On 24.3.2021 16.16, Chris Rankin wrote:

Hi,

These warnings are not present in my dmesg log from 5.11.8:

[   43.390159] [ cut here ]
[   43.393574] WARNING: CPU: 2 PID: 1268 at
drivers/gpu/drm/ttm/ttm_bo.c:517 ttm_bo_release+0x172/0x282 [ttm]
[   43.401940] Modules linked in: nf_nat_ftp nf_conntrack_ftp cfg80211

Changing WARN_ON to WARN_ON_ONCE in drivers/gpu/drm/ttm/ttm_bo.c
ttm_bo_release() reduces the flood of messages into a single splat.

This warning appears to come from 57fcd550eb15bce ("drm/ttm: Warn on pinning
without holding a reference") and reverting it might be one choice.



There are others, but I am assuming there is a common cause here.

Cheers,
Chris


diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index a76eb2c14e8c..50b53355b265 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -514,7 +514,7 @@ static void ttm_bo_release(struct kref *kref)
  * shrinkers, now that they are queued for
  * destruction.
  */
-   if (WARN_ON(bo->pin_count)) {
+   if (WARN_ON_ONCE(bo->pin_count)) {
 bo->pin_count = 0;
 ttm_bo_del_from_lru(bo);
 ttm_bo_add_mem_to_lru(bo, &bo->mem);



--
  - Ilkka


WARN_ON_ONCE() will just hide the underlying problem. Do we know why
this happens at all?


The patch was incorrectly backported to 5.11 without also backporting the 
driver changes that avoid triggering this warning.


We are probably going to revert it for 5.11.10.

Regards,
Christian.



Same for me, BTW, with v5.11.9:

```
[~]> lspci | grep VGA
0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa 
PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7)

[ 3676.033140] [ cut here ]
[ 3676.033153] WARNING: CPU: 7 PID: 1318 at drivers/gpu/drm/ttm/ttm_bo.c:517 
ttm_bo_release+0x375/0x500 [ttm]
…
[ 3676.033340] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 
3302 03/05/2021
…
[ 3676.033469] Call Trace:
[ 3676.033473]  ttm_bo_move_accel_cleanup+0x1ab/0x3a0 [ttm]
[ 3676.033478]  amdgpu_bo_move+0x334/0x860 [amdgpu]
[ 3676.033580]  ttm_bo_validate+0x1f1/0x2d0 [ttm]
[ 3676.033585]  amdgpu_cs_bo_validate+0x9b/0x1c0 [amdgpu]
[ 3676.033665]  amdgpu_cs_list_validate+0x115/0x150 [amdgpu]
[ 3676.033743]  amdgpu_cs_ioctl+0x873/0x20a0 [amdgpu]
[ 3676.033960]  drm_ioctl_kernel+0xb8/0x140 [drm]
[ 3676.033977]  drm_ioctl+0x222/0x3c0 [drm]
[ 3676.034071]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 3676.034145]  __x64_sys_ioctl+0x83/0xb0
[ 3676.034149]  do_syscall_64+0x33/0x40
…
[ 3676.034171] ---[ end trace 66e9865b027112f3 ]---
```

Thanks.





Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages

2021-03-25 Thread Christian König

On 25.03.21 at 08:48, Thomas Hellström (Intel) wrote:


On 3/25/21 12:14 AM, Jason Gunthorpe wrote:
On Wed, Mar 24, 2021 at 09:07:53PM +0100, Thomas Hellström (Intel) 
wrote:

On 3/24/21 7:31 PM, Christian König wrote:


On 24.03.21 at 17:38, Jason Gunthorpe wrote:

On Wed, Mar 24, 2021 at 04:50:14PM +0100, Thomas Hellström (Intel)
wrote:

On 3/24/21 2:48 PM, Jason Gunthorpe wrote:

On Wed, Mar 24, 2021 at 02:35:38PM +0100, Thomas Hellström
(Intel) wrote:


In an ideal world the creation/destruction of page
table levels would be dynamic at this point, like THP.

Hmm, but I'm not sure what problem we're trying to solve
by changing the
interface in this way?
We are trying to make a sensible driver API to deal with huge 
pages.

Currently if the core vm requests a huge pud, we give it
one, and if we
can't or don't want to (because of dirty-tracking, for
example, which is
always done on 4K page-level) we just return
VM_FAULT_FALLBACK, and the
fault is retried at a lower level.

Well, my thought would be to move the pte related stuff into
vmf_insert_range instead of recursing back via VM_FAULT_FALLBACK.

I don't know if the locking works out, but it feels cleaner that the
driver tells the vmf how big a page it can stuff in, not the vm
telling the driver to stuff in a certain size page which it might not
want to do.

Some devices want to work on an in-between page size like 64k so they
can't form 2M pages but they can stuff 64k of 4K pages in a batch on
every fault.
Hmm, yes, but we would in that case be limited anyway to insert ranges
smaller than and equal to the fault size to avoid extensive and possibly
unnecessary checks for contiguous memory.

Why? The insert function is walking the page tables, it just updates
things as they are. It learns the arrangement for free while doing the
walk.

The device has to always provide consistent data, if it overlaps into
pages that are already populated that is fine so long as it isn't
changing their addresses.


And then if we can't support the full fault size, we'd need to
either presume a size and alignment of the next level or search for
contiguous memory in both directions around the fault address,
perhaps unnecessarily as well.

You don't really need to care about levels, the device should be
faulting in the largest memory regions it can within its efficiency.

If it works on 4M pages then it should be faulting 4M pages. The page
size of the underlying CPU doesn't really matter much other than some
tuning to impact how the device's allocator works.
Yes, but then we'd be adding a lot of complexity into this function that is
already provided by the current interface for DAX, for little or no gain, at
least in the drm/ttm setting. Please think of the following situation: You
get a fault, you do an extensive time-consuming scan of your VRAM buffer
object into which the fault goes and determine you can fault 1GB. Now you
hand it to vmf_insert_range() and because the user-space address is
misaligned, or already partly populated because of a previous eviction, you
can only fault single pages, and you end up faulting a full GB of single
pages perhaps for a one-time small update.

Why would "you can only fault single pages" ever be true? If you have
1GB of pages then the vmf_insert_range should allocate enough page
table entries to consume it, regardless of alignment.


Ah yes, what I meant was you can only insert PTE-size entries, either 
because of misalignment or because the page table is already 
pre-populated with PMD-size page directories, which you can't remove 
with only the read side of the mmap lock held.


Please explain that further. Why do we need the mmap lock to insert PMDs 
but not when inserting PTEs?



And why shouldn't DAX switch to this kind of interface anyhow? It is
basically exactly the same problem. The underlying filesystem block
size is *not* necessarily aligned to the CPU page table sizes and DAX
would benefit from better handling of this mismatch.


First, I think we must sort out what "better handling" means. This is 
my takeaway from the discussion so far:


Claimed Pros: of vmf_insert_range()
* We get an interface that doesn't require knowledge of CPU page table 
entry level sizes.
* We get the best efficiency when we look at what the GPU driver 
provides. (I disagree on this one).


Claimed Cons:
* A new implementation that may get complicated particularly if it 
involves modifying all of the DAX code
* The driver would have to know about those sizes anyway to get 
alignment right (Applies to DRM, because we mmap buffer objects, not 
physical address ranges. But not to DAX AFAICT),


I don't think so. We could just align all buffers to their next power of 
two in size. Since we have plenty of offset space that shouldn't matter 
much.


Apart from that I still don't fully get why we need this in the first place.

* We lose efficiency, because we are prepared to spend an extra 
effort for alignment- and conti

Re: [PATCH 5.11 073/120] drm/ttm: Warn on pinning without holding a reference

2021-03-25 Thread Christian König

Hi Greg,

sorry, I just realized this after users started to complain. This patch 
shouldn't have been backported to 5.11 in the first place.


The warning itself is a good idea, but we also have patches for drivers 
and TTM in the pipeline for 5.12 so that the warning isn't triggered any 
more.


Without backporting all of that we now get a rain of warnings in 5.11.9.

My suggestion is to revert this patch from the 5.11 branch.

Thanks,
Christian.

On 22.03.21 at 13:27, Greg Kroah-Hartman wrote:

From: Daniel Vetter 

[ Upstream commit 57fcd550eb15bce14a7154736379dfd4ed60ae81 ]

Not technically a problem for ttm, but very likely a driver bug and
pretty big time confusing for reviewing code.

So warn about it, both at cleanup time (so we catch these for sure)
and at pin/unpin time (so we know who's the culprit).

Reviewed-by: Huang Rui 
Reviewed-by: Christian König 
Signed-off-by: Daniel Vetter 
Cc: Christian Koenig 
Cc: Huang Rui 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20201028113120.3641237-1-daniel.vetter@ffwll.ch
Signed-off-by: Sasha Levin 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
  include/drm/ttm/ttm_bo_api.h | 2 ++
  2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 22073e77fdf9..a76eb2c14e8c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -514,7 +514,7 @@ static void ttm_bo_release(struct kref *kref)
 * shrinkers, now that they are queued for
 * destruction.
 */
-   if (bo->pin_count) {
+   if (WARN_ON(bo->pin_count)) {
bo->pin_count = 0;
ttm_bo_del_from_lru(bo);
 ttm_bo_add_mem_to_lru(bo, &bo->mem);
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 2564e66e67d7..79b9367e0ffd 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -600,6 +600,7 @@ static inline bool ttm_bo_uses_embedded_gem_object(struct 
ttm_buffer_object *bo)
  static inline void ttm_bo_pin(struct ttm_buffer_object *bo)
  {
dma_resv_assert_held(bo->base.resv);
+   WARN_ON_ONCE(!kref_read(&bo->kref));
++bo->pin_count;
  }
  
@@ -613,6 +614,7 @@ static inline void ttm_bo_unpin(struct ttm_buffer_object *bo)

  {
dma_resv_assert_held(bo->base.resv);
WARN_ON_ONCE(!bo->pin_count);
+   WARN_ON_ONCE(!kref_read(&bo->kref));
--bo->pin_count;
  }
  




Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages

2021-03-25 Thread Christian König

On 25.03.21 at 00:14, Jason Gunthorpe wrote:

On Wed, Mar 24, 2021 at 09:07:53PM +0100, Thomas Hellström (Intel) wrote:

On 3/24/21 7:31 PM, Christian König wrote:


On 24.03.21 at 17:38, Jason Gunthorpe wrote:

On Wed, Mar 24, 2021 at 04:50:14PM +0100, Thomas Hellström (Intel)
wrote:

On 3/24/21 2:48 PM, Jason Gunthorpe wrote:

On Wed, Mar 24, 2021 at 02:35:38PM +0100, Thomas Hellström
(Intel) wrote:


In an ideal world the creation/destruction of page
table levels would be dynamic at this point, like THP.

Hmm, but I'm not sure what problem we're trying to solve
by changing the
interface in this way?

We are trying to make a sensible driver API to deal with huge pages.

Currently if the core vm requests a huge pud, we give it
one, and if we
can't or don't want to (because of dirty-tracking, for
example, which is
always done on 4K page-level) we just return
VM_FAULT_FALLBACK, and the
fault is retried at a lower level.

Well, my thought would be to move the pte related stuff into
vmf_insert_range instead of recursing back via VM_FAULT_FALLBACK.

I don't know if the locking works out, but it feels cleaner that the
driver tells the vmf how big a page it can stuff in, not the vm
telling the driver to stuff in a certain size page which it might not
want to do.

Some devices want to work on an in-between page size like 64k so they
can't form 2M pages but they can stuff 64k of 4K pages in a batch on
every fault.

Hmm, yes, but we would in that case be limited anyway to insert ranges
smaller than and equal to the fault size to avoid extensive and
possibly
unnecessary checks for contiguous memory.

Why? The insert function is walking the page tables, it just updates
things as they are. It learns the arrangement for free while doing the
walk.

The device has to always provide consistent data, if it overlaps into
pages that are already populated that is fine so long as it isn't
changing their addresses.


And then if we can't support the full fault size, we'd need to
either presume a size and alignment of the next level or search for
contiguous memory in both directions around the fault address,
perhaps unnecessarily as well.

You don't really need to care about levels, the device should be
faulting in the largest memory regions it can within its efficiency.

If it works on 4M pages then it should be faulting 4M pages. The page
size of the underlying CPU doesn't really matter much other than some
tuning to impact how the device's allocator works.

Yes, but then we'd be adding a lot of complexity into this function that is
already provided by the current interface for DAX, for little or no gain, at
least in the drm/ttm setting. Please think of the following situation: You
get a fault, you do an extensive time-consuming scan of your VRAM buffer
object into which the fault goes and determine you can fault 1GB. Now you
hand it to vmf_insert_range() and because the user-space address is
misaligned, or already partly populated because of a previous eviction, you
can only fault single pages, and you end up faulting a full GB of single
pages perhaps for a one-time small update.

Why would "you can only fault single pages" ever be true? If you have
1GB of pages then the vmf_insert_range should allocate enough page
table entries to consume it, regardless of alignment.


Completely agree with Jason. Filling in the CPU page tables is 
relatively cheap if you fill in a large continuous range.


In other words filling in 1GiB of a linear range is *much* less overhead 
than filling in 1<<18 4KiB faults.


I would say that this is always preferable even if the CPU only wants to 
update a single byte.



And why shouldn't DAX switch to this kind of interface anyhow? It is
basically exactly the same problem. The underlying filesystem block
size is *not* necessarily aligned to the CPU page table sizes and DAX
would benefit from better handling of this mismatch.


On top of this, unless we want to do the walk trying increasingly smaller
sizes of vmf_insert_xxx(), we'd have to use apply_to_page_range() and teach
it about transhuge page table entries, because pagewalk.c can't be used (It
can't populate page tables). That also means apply_to_page_range() needs to
be complicated with page table locks since transhuge pages aren't stable and
can be zapped and refaulted under us while we do the walk.

I didn't say it would be simple :) But we also need to stop hacking
around the sides of all this huge page stuff and come up with sensible
APIs that drivers can actually implement correctly. Exposing drivers
to specific kinds of page levels really feels like the wrong level of
abstraction.

Once we start doing this we should do it everywhere, the io_remap_pfn
stuff should be able to create huge special IO pages as well, for
instance.


Oh, yes please!

We easily have 16GiB of VRAM which is linear mapped into the kernel 
space for each GPU instance.


Doing that with 1GiB mapping instead of 4KiB would be quite a win.

Re

Re: [RFC PATCH v2 04/11] PCI/P2PDMA: Introduce pci_p2pdma_should_map_bus() and pci_p2pdma_bus_offset()

2021-03-24 Thread Christian König

On 24.03.21 at 18:21, Jason Gunthorpe wrote:

On Mon, Mar 15, 2021 at 10:27:08AM -0600, Logan Gunthorpe wrote:


In this case the WARN_ON is just to guard against misuse of the
function. It should never happen unless a developer changes the code in
a way that is incorrect. So I think that's the correct use of WARN_ON.
Though I might change it to WARN and return, that seems safer.

Right, WARN_ON and return is the right pattern for an assertion that
must never happen:

   if (WARN_ON(foo))
   return -1

Linus wants assertions like this to be able to recover. People running
the 'panic on warn' mode want the kernel to stop if it detects an
internal malfunction.


The only justification I can see for a "panic on warn" is to prevent 
further data loss or warn early about a crash.


We only use a BUG_ON() when the alternative would be to corrupt something.

Christian.



Jason




Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages

2021-03-24 Thread Christian König




On 24.03.21 at 17:38, Jason Gunthorpe wrote:

On Wed, Mar 24, 2021 at 04:50:14PM +0100, Thomas Hellström (Intel) wrote:

On 3/24/21 2:48 PM, Jason Gunthorpe wrote:

On Wed, Mar 24, 2021 at 02:35:38PM +0100, Thomas Hellström (Intel) wrote:


In an ideal world the creation/destruction of page table levels would
be dynamic at this point, like THP.

Hmm, but I'm not sure what problem we're trying to solve by changing the
interface in this way?

We are trying to make a sensible driver API to deal with huge pages.

Currently if the core vm requests a huge pud, we give it one, and if we
can't or don't want to (because of dirty-tracking, for example, which is
always done on 4K page-level) we just return VM_FAULT_FALLBACK, and the
fault is retried at a lower level.

Well, my thought would be to move the pte related stuff into
vmf_insert_range instead of recursing back via VM_FAULT_FALLBACK.

I don't know if the locking works out, but it feels cleaner that the
driver tells the vmf how big a page it can stuff in, not the vm
telling the driver to stuff in a certain size page which it might not
want to do.

Some devices want to work on an in-between page size like 64k so they
can't form 2M pages but they can stuff 64k of 4K pages in a batch on
every fault.

Hmm, yes, but we would in that case be limited anyway to inserting
ranges smaller than or equal to the fault size, to avoid extensive and
possibly unnecessary checks for contiguous memory.

Why? The insert function is walking the page tables; it just updates
things as they are. It learns the arrangement for free while doing the
walk.

The device has to always provide consistent data, if it overlaps into
pages that are already populated that is fine so long as it isn't
changing their addresses.


And then if we can't support the full fault size, we'd need to
either presume a size and alignment of the next level or search for
contiguous memory in both directions around the fault address,
perhaps unnecessarily as well.

You don't really need to care about levels, the device should be
faulting in the largest memory regions it can within its efficiency.

If it works on 4M pages then it should be faulting 4M pages. The page
size of the underlying CPU doesn't really matter much other than some
tuning to impact how the device's allocator works.


I agree with Jason here.

We get the best efficiency when we look at what the GPU driver 
provides and make sure that we handle one GPU page at once instead of 
looking too much into what the CPU is doing with its page tables.


At least on AMD GPUs the GPU page size can be anything between 4KiB and 
2GiB, and if we fill in a 2GiB chunk at once this can in theory be 
handled by just two giant page table entries on the CPU side.
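A quick sanity check of that arithmetic: with x86-64's 4-level paging a suitably aligned 2 GiB chunk is exactly two 1 GiB PUD entries (the sizes below are the standard x86-64 ones; this is purely illustrative):

```c
#include <stdint.h>

/* x86-64 4-level paging: bytes covered by one entry at each level. */
#define PTE_SIZE  (UINT64_C(1) << 12)   /* 4 KiB */
#define PMD_SIZE  (UINT64_C(1) << 21)   /* 2 MiB */
#define PUD_SIZE  (UINT64_C(1) << 30)   /* 1 GiB */

/* How many entries at a given level are needed to map a chunk,
 * rounding up for chunks that are not a multiple of the entry size. */
static uint64_t entries_needed(uint64_t chunk, uint64_t entry_size)
{
        return (chunk + entry_size - 1) / entry_size;
}
```

A 2 GiB chunk needs two PUD entries, versus 1024 PMD entries or half a million PTEs.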


On the other hand I'm not sure how filling in the CPU page tables works 
in detail.


Christian.



Jason




Re: [RFC PATCH 2/2] mm,drm/ttm: Use VM_PFNMAP for TTM vmas

2021-03-23 Thread Christian König




Am 22.03.21 um 09:13 schrieb Thomas Hellström (Intel):

Hi!

On 3/22/21 8:47 AM, Christian König wrote:

Am 21.03.21 um 19:45 schrieb Thomas Hellström (Intel):

To block fast gup we need to make sure TTM ptes are always special.
With MIXEDMAP we, on architectures that don't support pte_special,
insert normal ptes, but OTOH on those architectures, fast gup is not
supported.
At the same time, the function documentation to vm_normal_page() 
suggests

that ptes pointing to system memory pages of MIXEDMAP vmas are always
normal, but that doesn't seem consistent with what's implemented in
vmf_insert_mixed(). I'm thus not entirely sure this patch is actually
needed.

But to make sure and to avoid also normal (non-fast) gup, make all
TTM vmas PFNMAP. With PFNMAP we can't allow COW mappings
anymore, so make is_cow_mapping() available and use it to reject
COW mappings at mmap time.


I would separate the disallowing of COW mapping from the PFN change. 
I'm pretty sure that COW mappings never worked on TTM BOs in the 
first place.


COW doesn't work with PFNMAP together with non-linear maps, so as a 
consequence of moving from MIXEDMAP to PFNMAP we must disallow COW; 
it seems logical to me to do it in one patch.
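The COW rejection at mmap time boils down to a single flag test, which the patch exposes as is_cow_mapping(). A user-space model (the flag values are stand-ins for the kernel's vm_flags bits):

```c
#include <stdbool.h>

/* Stand-in values for the kernel's VM_SHARED / VM_MAYWRITE bits. */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* A mapping has copy-on-write semantics when it is private (not
 * VM_SHARED) but could still become writable (VM_MAYWRITE). */
static bool is_cow_mapping(unsigned long flags)
{
        return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
```

With PFNMAP vmas, ttm_bo_mmap() would then reject such mappings up front with -EINVAL.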


And working COW was one of the tests I used for huge PMDs/PUDs, so it 
has indeed been working, but I can't think of any relevant use-cases.


Ok, going to keep that in mind. I was assuming COW didn't work before 
on TTM pages.



Did you, BTW, have a chance to test this with WC mappings?


I'm going to give this a full piglit round, but currently I'm busy with 
internal testing.


Thanks,
Christian.



Thanks,
/Thomas





But either way this patch is Reviewed-by: Christian König 
.


Thanks,
Christian.



There was previously a comment in the code that WC mappings together
with x86 PAT + PFNMAP was bad for performance. However from looking at
vmf_insert_mixed() it looks like in the current code PFNMAP and 
MIXEDMAP

are handled the same for architectures that support pte_special. This
means there should not be a performance difference anymore, but this
needs to be verified.

Cc: Christian Koenig 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Andrew Morton 
Cc: Jason Gunthorpe 
Cc: linux...@kvack.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Thomas Hellström (Intel) 
---
  drivers/gpu/drm/ttm/ttm_bo_vm.c | 22 --
  include/linux/mm.h  |  5 +
  mm/internal.h   |  5 -
  3 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c 
b/drivers/gpu/drm/ttm/ttm_bo_vm.c

index 1c34983480e5..708c6fb9be81 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -372,12 +372,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct 
vm_fault *vmf,

   * at arbitrary times while the data is mmap'ed.
   * See vmf_insert_mixed_prot() for a discussion.
   */
-    if (vma->vm_flags & VM_MIXEDMAP)
-    ret = vmf_insert_mixed_prot(vma, address,
-    __pfn_to_pfn_t(pfn, PFN_DEV),
-    prot);
-    else
-    ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
+    ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
    /* Never error on prefaulted PTEs */
  if (unlikely((ret & VM_FAULT_ERROR))) {
@@ -555,18 +550,14 @@ static void ttm_bo_mmap_vma_setup(struct 
ttm_buffer_object *bo, struct vm_area_s

   * Note: We're transferring the bo reference to
   * vma->vm_private_data here.
   */
-
  vma->vm_private_data = bo;
    /*
- * We'd like to use VM_PFNMAP on shared mappings, where
- * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
- * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
- * bad for performance. Until that has been sorted out, use
- * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
+ * PFNMAP forces us to block COW mappings in mmap(),
+ * and with MIXEDMAP we would incorrectly allow fast gup
+ * on TTM memory on architectures that don't have pte_special.
   */
-    vma->vm_flags |= VM_MIXEDMAP;
-    vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+    vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
  }
    int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
@@ -579,6 +570,9 @@ int ttm_bo_mmap(struct file *filp, struct 
vm_area_struct *vma,

  if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
  return -EINVAL;
  +    if (unlikely(is_cow_mapping(vma->vm_flags)))
+    return -EINVAL;
+
  bo = ttm_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
  if (unlikely(!bo))
  return -EINVAL;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 77e64e3eac80..c6ebf7f9ddbb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -686,6 +686,11 @

Re: [PATCH] drivers: gpu: Remove duplicate include of amdgpu_hdp.h

2021-03-22 Thread Christian König




Am 22.03.21 um 13:02 schrieb Wan Jiabing:

amdgpu_hdp.h has been included at line 91, so remove
the duplicate include.

Signed-off-by: Wan Jiabing 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 49267eb64302..68836c22ef25 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -107,7 +107,6 @@
  #include "amdgpu_gfxhub.h"
  #include "amdgpu_df.h"
  #include "amdgpu_smuio.h"
-#include "amdgpu_hdp.h"
  
  #define MAX_GPU_INSTANCE		16
  




Re: [PATCH] amdgpu: avoid incorrect %hu format string

2021-03-22 Thread Christian König

Am 22.03.21 um 12:54 schrieb Arnd Bergmann:

From: Arnd Bergmann 

clang points out that the %hu format string does not match the type
of the variables here:

drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:7: warning: format specifies type 
'unsigned short' but the argument has type 'unsigned int' [-Wformat]
   version_major, version_minor);
   ^
include/drm/drm_print.h:498:19: note: expanded from macro 'DRM_ERROR'
 __drm_err(fmt, ##__VA_ARGS__)
   ~~~^~~

Change it to a regular %u, the same way a previous patch did for
another instance of the same warning.

Fixes: 0b437e64e0af ("drm/amdgpu: remove h from printk format specifier")
Signed-off-by: Arnd Bergmann 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index e2ed4689118a..c6dbc0801604 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -259,7 +259,7 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
if ((adev->asic_type == CHIP_POLARIS10 ||
 adev->asic_type == CHIP_POLARIS11) &&
(adev->uvd.fw_version < FW_1_66_16))
-   DRM_ERROR("POLARIS10/11 UVD firmware version %hu.%hu is too 
old.\n",
+   DRM_ERROR("POLARIS10/11 UVD firmware version %u.%u is too 
old.\n",
  version_major, version_minor);
} else {
unsigned int enc_major, enc_minor, dec_minor;
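
The reason the warning matters: %hu tells printf to convert the promoted argument back to unsigned short, so a firmware version above 65535 would print reduced modulo 65536. A user-space sketch of that truncation (the casts mirror the conversions the specifiers perform):

```c
/* What the %hu conversion does to a value wider than 16 bits: the
 * argument is converted to unsigned short before printing, i.e.
 * reduced modulo 65536. */
static unsigned int printed_by_hu(unsigned int v)
{
        return (unsigned short)v;
}

/* %u, by contrast, prints the unsigned int exactly as passed. */
static unsigned int printed_by_u(unsigned int v)
{
        return v;
}
```

So a hypothetical version 70000 would show up as 4464 under %hu, while %u prints it faithfully.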




Re: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

2021-03-22 Thread Christian König
: 950eadec5e80 R14: 950c03377858 R15: 
[17359.921050] FS:  7febb20cb740() GS:950ebfc0() 
knlGS:
[17359.930047] CS:  0010 DS:  ES:  CR0: 80050033
[17359.936674] CR2:  CR3: 0006d700e005 CR4: 001706e0

 From what I understand, init_user_pages() fails (returns EBUSY) and
the code goes to allocate_init_user_pages_failed, where the unbind and
the userptr clear occur.
Can we prevent this by saving the bound status + userptr alloc, so that
amdgpu_ttm_backend_unbind returns without trying to clear the userptr
memory?

Something like:

amdgpu_ttm_backend_bind:
 if (gtt->userptr) {
 r = amdgpu_ttm_tt_pin_userptr(bdev, ttm);
 if (r) ...
gtt->sg_table = true;
}

amdgpu_ttm_backend_unbind:
if (gtt->sg_table) {
 if (gtt->user_ptr) ...
}

If you agree, I'll send a v2 patch. Otherwise, maybe we could return
within amdgpu_ttm_tt_unpin_userptr if memory hasn't been allocated.

Any other ideas?

Regards,
Daniel


Reverting this patch fixes the problem for me.

Regards,
Felix

On 2021-03-18 10:57 p.m., Alex Deucher wrote:

Applied.  Thanks!

Alex

On Thu, Mar 18, 2021 at 5:00 AM Koenig, Christian
 wrote:

Reviewed-by: Christian König 

Von: Daniel Gomez 
Gesendet: Donnerstag, 18. März 2021 09:32
Cc: dag...@gmail.com ; Daniel Gomez ; Deucher, Alexander 
; Koenig, Christian ; David Airlie ; Daniel 
Vetter ; amd-...@lists.freedesktop.org ; dri-de...@lists.freedesktop.org 
; linux-kernel@vger.kernel.org 
Betreff: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

If userptr pages have been pinned but not bound,
they remain uncleared.

Signed-off-by: Daniel Gomez 
---
   drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
   1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index e8c66d10478f..bbcc6264d48f 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct 
ttm_bo_device *bdev, struct ttm_tt
   struct radeon_ttm_tt *gtt = (void *)ttm;
   struct radeon_device *rdev = radeon_get_rdev(bdev);

+   if (gtt->userptr)
+   radeon_ttm_tt_unpin_userptr(bdev, ttm);
+
   if (!gtt->bound)
   return;

   radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);

-   if (gtt->userptr)
-   radeon_ttm_tt_unpin_userptr(bdev, ttm);
   gtt->bound = false;
   }

--
2.30.2

___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




Re: [RFC PATCH 2/2] mm,drm/ttm: Use VM_PFNMAP for TTM vmas

2021-03-22 Thread Christian König

Am 21.03.21 um 19:45 schrieb Thomas Hellström (Intel):

To block fast gup we need to make sure TTM ptes are always special.
With MIXEDMAP we, on architectures that don't support pte_special,
insert normal ptes, but OTOH on those architectures, fast gup is not
supported.
At the same time, the function documentation to vm_normal_page() suggests
that ptes pointing to system memory pages of MIXEDMAP vmas are always
normal, but that doesn't seem consistent with what's implemented in
vmf_insert_mixed(). I'm thus not entirely sure this patch is actually
needed.

But to make sure and to avoid also normal (non-fast) gup, make all
TTM vmas PFNMAP. With PFNMAP we can't allow COW mappings
anymore, so make is_cow_mapping() available and use it to reject
COW mappings at mmap time.


I would separate the disallowing of COW mapping from the PFN change. I'm 
pretty sure that COW mappings never worked on TTM BOs in the first place.


But either way this patch is Reviewed-by: Christian König 
.


Thanks,
Christian.



There was previously a comment in the code that WC mappings together
with x86 PAT + PFNMAP was bad for performance. However from looking at
vmf_insert_mixed() it looks like in the current code PFNMAP and MIXEDMAP
are handled the same for architectures that support pte_special. This
means there should not be a performance difference anymore, but this
needs to be verified.

Cc: Christian Koenig 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Andrew Morton 
Cc: Jason Gunthorpe 
Cc: linux...@kvack.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Thomas Hellström (Intel) 
---
  drivers/gpu/drm/ttm/ttm_bo_vm.c | 22 --
  include/linux/mm.h  |  5 +
  mm/internal.h   |  5 -
  3 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 1c34983480e5..708c6fb9be81 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -372,12 +372,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 * at arbitrary times while the data is mmap'ed.
 * See vmf_insert_mixed_prot() for a discussion.
 */
-   if (vma->vm_flags & VM_MIXEDMAP)
-   ret = vmf_insert_mixed_prot(vma, address,
-   __pfn_to_pfn_t(pfn, 
PFN_DEV),
-   prot);
-   else
-   ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
+   ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
  
  		/* Never error on prefaulted PTEs */

if (unlikely((ret & VM_FAULT_ERROR))) {
@@ -555,18 +550,14 @@ static void ttm_bo_mmap_vma_setup(struct 
ttm_buffer_object *bo, struct vm_area_s
 * Note: We're transferring the bo reference to
 * vma->vm_private_data here.
 */
-
vma->vm_private_data = bo;
  
  	/*

-* We'd like to use VM_PFNMAP on shared mappings, where
-* (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
-* but for some reason VM_PFNMAP + x86 PAT + write-combine is very
-* bad for performance. Until that has been sorted out, use
-* VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
+* PFNMAP forces us to block COW mappings in mmap(),
+* and with MIXEDMAP we would incorrectly allow fast gup
+* on TTM memory on architectures that don't have pte_special.
 */
-   vma->vm_flags |= VM_MIXEDMAP;
-   vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+   vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
  }
  
  int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,

@@ -579,6 +570,9 @@ int ttm_bo_mmap(struct file *filp, struct vm_area_struct 
*vma,
if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
return -EINVAL;
  
+	if (unlikely(is_cow_mapping(vma->vm_flags)))

+   return -EINVAL;
+
bo = ttm_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
if (unlikely(!bo))
return -EINVAL;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 77e64e3eac80..c6ebf7f9ddbb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -686,6 +686,11 @@ static inline bool vma_is_accessible(struct vm_area_struct 
*vma)
return vma->vm_flags & VM_ACCESS_FLAGS;
  }
  
+static inline bool is_cow_mapping(vm_flags_t flags)

+{
+   return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
+}
+
  #ifdef CONFIG_SHMEM
  /*
   * The vma_is_shmem is not inline because it is used only by slow
diff --git a/mm/internal.h b/mm/internal.h
index 9902648f2206..1432feec62df 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -296,11 +296,6 @@ static inline unsigned int buddy_order(struct page *page)

Re: [PATCH v2] drm/radeon: don't evict if not initialized

2021-03-22 Thread Christian König

Am 21.03.21 um 16:19 schrieb Tong Zhang:

TTM_PL_VRAM may not initialized at all when calling
radeon_bo_evict_vram(). We need to check before doing eviction.

[2.160837] BUG: kernel NULL pointer dereference, address: 0020
[2.161212] #PF: supervisor read access in kernel mode
[2.161490] #PF: error_code(0x) - not-present page
[2.161767] PGD 0 P4D 0
[2.163088] RIP: 0010:ttm_resource_manager_evict_all+0x70/0x1c0 [ttm]
[2.168506] Call Trace:
[2.168641]  radeon_bo_evict_vram+0x1c/0x20 [radeon]
[2.168936]  radeon_device_fini+0x28/0xf9 [radeon]
[2.169224]  radeon_driver_unload_kms+0x44/0xa0 [radeon]
[2.169534]  radeon_driver_load_kms+0x174/0x210 [radeon]
[2.169843]  drm_dev_register+0xd9/0x1c0 [drm]
[2.170104]  radeon_pci_probe+0x117/0x1a0 [radeon]

Suggested-by: Christian König 
Signed-off-by: Tong Zhang 


Reviewed-by: Christian König 


---
v2: coding style fix

  drivers/gpu/drm/radeon/radeon_object.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
b/drivers/gpu/drm/radeon/radeon_object.c
index 9b81786782de..499ce55e34cc 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -384,6 +384,8 @@ int radeon_bo_evict_vram(struct radeon_device *rdev)
}
  #endif
man = ttm_manager_type(bdev, TTM_PL_VRAM);
+   if (!man)
+   return 0;
return ttm_resource_manager_evict_all(bdev, man);
  }
  




Re: [PATCH] drm/radeon: don't evict if not initialized

2021-03-21 Thread Christian König




Am 20.03.21 um 21:10 schrieb Tong Zhang:

TTM_PL_VRAM may not initialized at all when calling
radeon_bo_evict_vram(). We need to check before doing eviction.

[2.160837] BUG: kernel NULL pointer dereference, address: 0020
[2.161212] #PF: supervisor read access in kernel mode
[2.161490] #PF: error_code(0x) - not-present page
[2.161767] PGD 0 P4D 0
[2.163088] RIP: 0010:ttm_resource_manager_evict_all+0x70/0x1c0 [ttm]
[2.168506] Call Trace:
[2.168641]  radeon_bo_evict_vram+0x1c/0x20 [radeon]
[2.168936]  radeon_device_fini+0x28/0xf9 [radeon]
[2.169224]  radeon_driver_unload_kms+0x44/0xa0 [radeon]
[2.169534]  radeon_driver_load_kms+0x174/0x210 [radeon]
[2.169843]  drm_dev_register+0xd9/0x1c0 [drm]
[2.170104]  radeon_pci_probe+0x117/0x1a0 [radeon]

Signed-off-by: Tong Zhang 
---
  drivers/gpu/drm/radeon/radeon_object.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
b/drivers/gpu/drm/radeon/radeon_object.c
index 9b81786782de..04e9a8118b0e 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -384,7 +384,9 @@ int radeon_bo_evict_vram(struct radeon_device *rdev)
}
  #endif
man = ttm_manager_type(bdev, TTM_PL_VRAM);
-   return ttm_resource_manager_evict_all(bdev, man);
+   if (man)
+   return ttm_resource_manager_evict_all(bdev, man);
+   return 0;


You should probably code this the other way around, e.g.

if (!man)
    return 0;
...

Apart from that looks good to me.

Christian.


  }
  
  void radeon_bo_force_delete(struct radeon_device *rdev)




Re: [PATCH 06/19] drm/amd/display/dc/calcs/dce_calcs: Move some large variables from the stack to the heap

2021-03-19 Thread Christian König




Am 19.03.21 um 19:26 schrieb Harry Wentland:

On 2021-03-19 2:13 p.m., Alex Deucher wrote:

+ Harry, Nick

On Fri, Mar 19, 2021 at 4:24 AM Lee Jones  wrote:


Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c: In 
function ‘calculate_bandwidth’:
drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c:2016:1: 
warning: the frame size of 1216 bytes is larger than 1024 bytes 
[-Wframe-larger-than=]


Cc: Harry Wentland 
Cc: Leo Li 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Colin Ian King 
Cc: amd-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
  .../gpu/drm/amd/display/dc/calcs/dce_calcs.c  | 32 
---

  1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c 
b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c

index e633f8a51edb6..9d8f2505a61c2 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
@@ -98,16 +98,16 @@ static void calculate_bandwidth(
 int32_t num_cursor_lines;

 int32_t i, j, k;
-   struct bw_fixed yclk[3];
-   struct bw_fixed sclk[8];
+   struct bw_fixed *yclk;
+   struct bw_fixed *sclk;
 bool d0_underlay_enable;
 bool d1_underlay_enable;
 bool fbc_enabled;
 bool lpt_enabled;
 enum bw_defines sclk_message;
 enum bw_defines yclk_message;
-   enum bw_defines tiling_mode[maximum_number_of_surfaces];
-   enum bw_defines surface_type[maximum_number_of_surfaces];
+   enum bw_defines *tiling_mode;
+   enum bw_defines *surface_type;
 enum bw_defines voltage;
 enum bw_defines pipe_check;
 enum bw_defines hsr_check;
@@ -122,6 +122,22 @@ static void calculate_bandwidth(
 int32_t number_of_displays_enabled_with_margin = 0;
 int32_t number_of_aligned_displays_with_no_margin = 0;

+   yclk = kcalloc(3, sizeof(*yclk), GFP_KERNEL);
+   if (!yclk)
+   return;
+
+   sclk = kcalloc(8, sizeof(*sclk), GFP_KERNEL);
+   if (!sclk)
+   goto free_yclk;
+
+   tiling_mode = kcalloc(maximum_number_of_surfaces, 
sizeof(*tiling_mode), GFP_KERNEL);

+   if (!tiling_mode)
+   goto free_sclk;
+
+   surface_type = kcalloc(maximum_number_of_surfaces, 
sizeof(*surface_type), GFP_KERNEL);

+   if (!surface_type)
+   goto free_tiling_mode;
+



Harry or Nick can correct me if I'm wrong, but for this patch and the
next one, I think this can be called from an atomic context.



 From what I can see this doesn't seem to be the case. If I'm missing 
something someone please correct me.


Have you taken into account that using FP functions requires atomic 
context as well?


We had quite a bunch of problems with that and had to replace some 
GFP_KERNEL with GFP_ATOMIC in the DC code because of this.


Could of course be that this code here isn't affected by that, but 
better safe than sorry.


Christian.



This and the next (06/19) patch are both
Reviewed-by: Harry Wentland 

Harry


Alex


 yclk[low] = vbios->low_yclk;
 yclk[mid] = vbios->mid_yclk;
 yclk[high] = vbios->high_yclk;
@@ -2013,6 +2029,14 @@ static void calculate_bandwidth(
 }
 }
 }
+
+   kfree(surface_type);
+free_tiling_mode:
+   kfree(tiling_mode);
+free_yclk:
+   kfree(yclk);
+free_sclk:
+   kfree(sclk);
  }

/***
--
2.27.0
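One thing worth double-checking in the unwind above: goto-cleanup labels must appear in reverse allocation order, and as posted the free_yclk/free_sclk labels appear swapped, so the tiling_mode-failure path (goto free_sclk) would skip kfree(yclk). A user-space sketch of the intended ladder, with calloc standing in for kcalloc and hypothetical sizes:

```c
#include <stdlib.h>

/* Sketch of the kernel goto-unwind idiom: each error path jumps to
 * the label that frees everything allocated so far, and the labels
 * run in reverse allocation order so later paths fall through. */
static int calculate(void)
{
        int *yclk, *sclk, *tiling_mode, *surface_type;
        int ret = -1;

        yclk = calloc(3, sizeof(*yclk));
        if (!yclk)
                return -1;
        sclk = calloc(8, sizeof(*sclk));
        if (!sclk)
                goto free_yclk;
        tiling_mode = calloc(12, sizeof(*tiling_mode));
        if (!tiling_mode)
                goto free_sclk;
        surface_type = calloc(12, sizeof(*surface_type));
        if (!surface_type)
                goto free_tiling_mode;

        ret = 0;                /* ... real work would go here ... */

        free(surface_type);
free_tiling_mode:
        free(tiling_mode);
free_sclk:
        free(sclk);
free_yclk:
        free(yclk);
        return ret;
}
```

With this ordering every failure path releases exactly the allocations made before it, and the success path falls through all the labels.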

___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: [PATCH 1/3] drm/ttm: move swapout logic around v2

2021-03-19 Thread Christian König
General question for the audience: Is there any way to silence the build 
bot here?


This is a well known false positive.

Regards,
Christian.

Am 18.03.21 um 19:13 schrieb kernel test robot:

Hi "Christian,

I love your patch! Yet something to improve:

[auto build test ERROR on drm-tip/drm-tip]
[also build test ERROR on next-20210318]
[cannot apply to drm-intel/for-linux-next drm-exynos/exynos-drm-next 
linus/master v5.12-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Christian-K-nig/drm-ttm-move-swapout-logic-around-v2/20210318-204848
base:   git://anongit.freedesktop.org/drm/drm-tip drm-tip
config: x86_64-randconfig-a005-20210318 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 
6db3ab2903f42712f44000afb5aa467efbd25f35)
reproduce (this is a W=1 build):
 wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
 chmod +x ~/bin/make.cross
 # install x86_64 cross compiling tool for clang build
 # apt-get install binutils-x86-64-linux-gnu
 # 
https://github.com/0day-ci/linux/commit/a454d56ea061b53d24a62a700743e4508dd6c9b1
 git remote add linux-review https://github.com/0day-ci/linux
 git fetch --no-tags linux-review 
Christian-K-nig/drm-ttm-move-swapout-logic-around-v2/20210318-204848
 git checkout a454d56ea061b53d24a62a700743e4508dd6c9b1
 # save the attached .config to linux build tree
 COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):


drivers/gpu/drm/ttm/ttm_device.c:109:5: error: conflicting types for 
'ttm_global_swapout'

int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags)
^
include/drm/ttm/ttm_device.h:300:6: note: previous declaration is here
long ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);
 ^
1 error generated.


vim +/ttm_global_swapout +109 drivers/gpu/drm/ttm/ttm_device.c

104 
105 /**
106  * A buffer object shrink method that tries to swap out the first
107  * buffer object on the global::swap_lru list.
108  */
  > 109  int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t 
gfp_flags)
110 {
111 struct ttm_global *glob = &ttm_glob;
112 struct ttm_buffer_object *bo;
113 unsigned i;
114 int ret;
115 
116 spin_lock(&glob->lru_lock);
117 for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
118 list_for_each_entry(bo, &glob->swap_lru[i], swap) {
119 uint32_t num_pages = bo->ttm->num_pages;
120 
121 ret = ttm_bo_swapout(bo, ctx, gfp_flags);
122 /* ttm_bo_swapout has dropped the lru_lock */
123 if (!ret)
124 return num_pages;
125 if (ret != -EBUSY)
126 return ret;
127 }
128 }
129 spin_unlock(&glob->lru_lock);
130 return 0;
131 }
132 EXPORT_SYMBOL(ttm_global_swapout);
133 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH] dma-buf: use wake_up_process() instead of wake_up_state()

2021-03-19 Thread Christian König

Am 19.03.21 um 03:58 schrieb Wang Qing:

Using wake_up_process() is simpler and friendlier,
and it is more convenient for analysis and statistics

Signed-off-by: Wang Qing 


Reviewed-by: Christian König 

Should I pick it up or do you want to push it through some other tree 
than DRM?


Thanks,
Christian.


---
  drivers/dma-buf/dma-fence.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 7475e09..de51326
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -655,7 +655,7 @@ dma_fence_default_wait_cb(struct dma_fence *fence, struct 
dma_fence_cb *cb)
struct default_wait_cb *wait =
container_of(cb, struct default_wait_cb, base);
  
-	wake_up_state(wait->task, TASK_NORMAL);

+   wake_up_process(wait->task);
  }
  
  /**




Re: [PATCH] drm/amdgpu/ttm: Fix memory leak userptr pages

2021-03-18 Thread Christian König

Am 17.03.21 um 17:08 schrieb Daniel Gomez:

If userptr pages have been pinned but not bound,
they remain uncleared.

Signed-off-by: Daniel Gomez 


Good catch, not sure if that can ever happen in practice but better 
safe than sorry.


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 9fd2157b133a..50c2b4827c13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1162,13 +1162,13 @@ static void amdgpu_ttm_backend_unbind(struct 
ttm_bo_device *bdev,
struct amdgpu_ttm_tt *gtt = (void *)ttm;
int r;
  
-	if (!gtt->bound)

-   return;
-
/* if the pages have userptr pinning then clear that first */
if (gtt->userptr)
amdgpu_ttm_tt_unpin_userptr(bdev, ttm);
  
+	if (!gtt->bound)

+   return;
+
if (gtt->offset == AMDGPU_BO_INVALID_OFFSET)
return;
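
The fix moves the userptr unpin in front of the early return on !bound, so the pin can no longer be leaked. A user-space model of that ordering (the struct and names are hypothetical stand-ins for the driver state):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's tt state. */
struct tt_state {
        bool userptr_pinned;
        bool bound;
};

/* Fixed ordering: release the userptr pin first, so the early return
 * on an unbound tt can no longer leak the pinned pages. */
static void backend_unbind(struct tt_state *tt)
{
        if (tt->userptr_pinned)
                tt->userptr_pinned = false;     /* unpin_userptr() */

        if (!tt->bound)
                return;                         /* early out now safe */

        tt->bound = false;                      /* gart unbind */
}

/* Check helper: after unbind, is the pin always released? */
static bool unbind_releases_pin(bool pinned, bool bound)
{
        struct tt_state tt = { .userptr_pinned = pinned, .bound = bound };

        backend_unbind(&tt);
        return !tt.userptr_pinned;
}
```

With the old ordering, the pinned-but-unbound case would return before the unpin ever ran.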
  




Re: [bisected] Re: nouveau: lockdep cli->mutex vs reservation_ww_class_mutex deadlock report

2021-03-15 Thread Christian König

Hi Mike,

I'm pretty sure your bisection is a bit off.

The patch you mentioned is completely unrelated to Nouveau and I think 
the code path in question is not even used by this driver.


Regards,
Christian.

Am 14.03.21 um 05:48 schrieb Mike Galbraith:

This little bugger bisected to...

b73cd1e2ebfc "drm/ttm: stop destroying pinned ghost object"

...and (the second time around) was confirmed on the spot.  However,
while the fingered commit still reverts cleanly, doing so at HEAD does
not make lockdep return to happy camper state (leading to bisection
#2), ie the fingered commit is only the beginning of nouveau's 5.12
cycle lockdep woes.

homer:..kernel/linux-master # quilt applied|grep revert
patches/revert-drm-ttm-Remove-pinned-bos-from-LRU-in-ttm_bo_move_to_lru_tail-v2.patch
patches/revert-drm-ttm-cleanup-LRU-handling-further.patch
patches/revert-drm-ttm-use-pin_count-more-extensively.patch
patches/revert-drm-ttm-stop-destroying-pinned-ghost-object.patch

That still ain't enough to appease lockdep at HEAD.  I'm not going to
muck about with it beyond that, since this looks a whole lot like yet
another example of "fixing stuff exposes other busted stuff".

On Wed, 2021-03-10 at 10:58 +0100, Mike Galbraith wrote:

[   29.966927] ==
[   29.966929] WARNING: possible circular locking dependency detected
[   29.966932] 5.12.0.g05a59d7-master #2 Tainted: GW   E
[   29.966934] --
[   29.966937] X/2145 is trying to acquire lock:
[   29.966939] 888120714518 (&cli->mutex){+.+.}-{3:3}, at: 
nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.967002]
but task is already holding lock:
[   29.967004] 888123c201a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
nouveau_bo_pin+0x2b/0x310 [nouveau]
[   29.967053]
which lock already depends on the new lock.

[   29.967056]
the existing dependency chain (in reverse order) is:
[   29.967058]
-> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
[   29.967063]__ww_mutex_lock.constprop.16+0xbe/0x10d0
[   29.967069]nouveau_bo_pin+0x2b/0x310 [nouveau]
[   29.967112]nouveau_channel_prep+0x106/0x2e0 [nouveau]
[   29.967151]nouveau_channel_new+0x4f/0x760 [nouveau]
[   29.967188]nouveau_abi16_ioctl_channel_alloc+0xdf/0x350 [nouveau]
[   29.967223]drm_ioctl_kernel+0x91/0xe0 [drm]
[   29.967245]drm_ioctl+0x2db/0x380 [drm]
[   29.967259]nouveau_drm_ioctl+0x56/0xb0 [nouveau]
[   29.967303]__x64_sys_ioctl+0x76/0xb0
[   29.967307]do_syscall_64+0x33/0x40
[   29.967310]entry_SYSCALL_64_after_hwframe+0x44/0xae
[   29.967314]
-> #0 (&cli->mutex){+.+.}-{3:3}:
[   29.967318]__lock_acquire+0x1494/0x1ac0
[   29.967322]lock_acquire+0x23e/0x3b0
[   29.967325]__mutex_lock+0x95/0x9d0
[   29.967330]nouveau_bo_move+0x11f/0x980 [nouveau]
[   29.967377]ttm_bo_handle_move_mem+0x79/0x130 [ttm]
[   29.967384]ttm_bo_validate+0x156/0x1b0 [ttm]
[   29.967390]nouveau_bo_validate+0x48/0x70 [nouveau]
[   29.967438]nouveau_bo_pin+0x1de/0x310 [nouveau]
[   29.967487]nv50_wndw_prepare_fb+0x53/0x4d0 [nouveau]
[   29.967531]drm_atomic_helper_prepare_planes+0x8a/0x110 
[drm_kms_helper]
[   29.967547]nv50_disp_atomic_commit+0xa9/0x1b0 [nouveau]
[   29.967593]drm_atomic_helper_update_plane+0x10a/0x150 
[drm_kms_helper]
[   29.967606]drm_mode_cursor_universal+0x10b/0x220 [drm]
[   29.967627]drm_mode_cursor_common+0x190/0x200 [drm]
[   29.967648]drm_mode_cursor_ioctl+0x3d/0x50 [drm]
[   29.967669]drm_ioctl_kernel+0x91/0xe0 [drm]
[   29.967684]drm_ioctl+0x2db/0x380 [drm]
[   29.967699]nouveau_drm_ioctl+0x56/0xb0 [nouveau]
[   29.967748]__x64_sys_ioctl+0x76/0xb0
[   29.967752]do_syscall_64+0x33/0x40
[   29.967756]entry_SYSCALL_64_after_hwframe+0x44/0xae
[   29.967760]
other info that might help us debug this:

[   29.967764]  Possible unsafe locking scenario:

[   29.967767]CPU0CPU1
[   29.967770]
[   29.967772]   lock(reservation_ww_class_mutex);
[   29.967776]lock(&cli->mutex);
[   29.967779]lock(reservation_ww_class_mutex);
[   29.967783]   lock(>mutex);
[   29.967786]
 *** DEADLOCK ***

[   29.967790] 3 locks held by X/2145:
[   29.967792]  #0: 88810365bcf8 (crtc_ww_class_acquire){+.+.}-{0:0}, at: 
drm_mode_cursor_common+0x87/0x200 [drm]
[   29.967817]  #1: 888108d9e098 (crtc_ww_class_mutex){+.+.}-{3:3}, at: 
drm_modeset_lock+0xc3/0xe0 [drm]
[   29.967841]  #2: 888123c201a0 (reservation_ww_class_mutex){+.+.}-{3:3}, 
at: nouveau_bo_pin+0x2b/0x310 [nouveau]
[   29.967896]
stack backtrace:
[   29.967899] CPU: 

Re: [PATCH v2 1/1] drm/amdkfd: fix build error with AMD_IOMMU_V2=m

2021-03-11 Thread Christian König

Am 10.03.21 um 23:13 schrieb Felix Kuehling:

On 2021-03-09 11:50 a.m., Felix Kuehling wrote:

Using 'imply AMD_IOMMU_V2' does not guarantee that the driver can link
against the exported functions. If the GPU driver is built-in but the
IOMMU driver is a loadable module, the kfd_iommu.c file is indeed
built but does not work:

x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function 
`kfd_iommu_bind_process_to_device':

kfd_iommu.c:(.text+0x516): undefined reference to `amd_iommu_bind_pasid'
x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function 
`kfd_iommu_unbind_process':
kfd_iommu.c:(.text+0x691): undefined reference to 
`amd_iommu_unbind_pasid'
x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function 
`kfd_iommu_suspend':
kfd_iommu.c:(.text+0x966): undefined reference to 
`amd_iommu_set_invalidate_ctx_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0x97f): undefined reference to 
`amd_iommu_set_invalid_ppr_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0x9a4): undefined reference to 
`amd_iommu_free_device'
x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function 
`kfd_iommu_resume':
kfd_iommu.c:(.text+0xa9a): undefined reference to 
`amd_iommu_init_device'
x86_64-linux-ld: kfd_iommu.c:(.text+0xadc): undefined reference to 
`amd_iommu_set_invalidate_ctx_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xaff): undefined reference to 
`amd_iommu_set_invalid_ppr_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xc72): undefined reference to 
`amd_iommu_bind_pasid'
x86_64-linux-ld: kfd_iommu.c:(.text+0xe08): undefined reference to 
`amd_iommu_set_invalidate_ctx_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xe26): undefined reference to 
`amd_iommu_set_invalid_ppr_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xe42): undefined reference to 
`amd_iommu_free_device'


Use IS_REACHABLE to only build IOMMU-V2 support if the amd_iommu symbols
are reachable by the amdkfd driver. Output a warning if they are not,
because that may not be what the user was expecting.

Fixes: 64d1c3a43a6f ("drm/amdkfd: Centralize IOMMUv2 code and make it 
conditional")

Reported-by: Arnd Bergmann 
Signed-off-by: Felix Kuehling 

Ping. Can I get an R-b for this patch?


Reviewed-by: Christian König 



Thanks,
  Felix



---
  drivers/gpu/drm/amd/amdkfd/kfd_iommu.c | 6 ++
  drivers/gpu/drm/amd/amdkfd/kfd_iommu.h | 9 +++--
  2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c

index 66bbca61e3ef..9318936aa805 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
@@ -20,6 +20,10 @@
   * OTHER DEALINGS IN THE SOFTWARE.
   */
  +#include 
+
+#if IS_REACHABLE(CONFIG_AMD_IOMMU_V2)
+
  #include 
  #include 
  #include 
@@ -355,3 +359,5 @@ int kfd_iommu_add_perf_counters(struct 
kfd_topology_device *kdev)

    return 0;
  }
+
+#endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h

index dd23d9fdf6a8..afd420b01a0c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
@@ -23,7 +23,9 @@
  #ifndef __KFD_IOMMU_H__
  #define __KFD_IOMMU_H__
  -#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || 
defined(CONFIG_AMD_IOMMU_V2)

+#include 
+
+#if IS_REACHABLE(CONFIG_AMD_IOMMU_V2)
    #define KFD_SUPPORT_IOMMU_V2
  @@ -46,6 +48,9 @@ static inline int kfd_iommu_check_device(struct 
kfd_dev *kfd)

  }
  static inline int kfd_iommu_device_init(struct kfd_dev *kfd)
  {
+#if IS_MODULE(CONFIG_AMD_IOMMU_V2)
+    WARN_ONCE(1, "iommu_v2 module is not usable by built-in KFD");
+#endif
  return 0;
  }
  @@ -73,6 +78,6 @@ static inline int 
kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)

  return 0;
  }
  -#endif /* defined(CONFIG_AMD_IOMMU_V2) */
+#endif /* IS_REACHABLE(CONFIG_AMD_IOMMU_V2) */
    #endif /* __KFD_IOMMU_H__ */




Re: [PATCH 1/1] drm/amdkfd: fix build error with AMD_IOMMU_V2=m

2021-03-09 Thread Christian König

Am 09.03.21 um 18:59 schrieb Alex Deucher:

On Tue, Mar 9, 2021 at 12:55 PM Jean-Philippe Brucker
 wrote:

Hi Felix,

On Tue, Mar 09, 2021 at 11:30:19AM -0500, Felix Kuehling wrote:

I think the proper fix would be to not rely on custom hooks into a particular
IOMMU driver, but to instead ensure that the amdgpu driver can do everything
it needs through the regular linux/iommu.h interfaces. I realize this
is more work,
but I wonder if you've tried that, and why it didn't work out.

As far as I know this hasn't been tried. I see that intel-iommu has its
own SVM thing, which seems to be similar to what our IOMMUv2 does. I
guess we'd have to abstract that into a common API.

The common API was added in 26b25a2b98e4 and implemented by the Intel
driver in 064a57d7ddfc. To support it an IOMMU driver implements new IOMMU
ops:
 .dev_has_feat()
 .dev_feat_enabled()
 .dev_enable_feat()
 .dev_disable_feat()
 .sva_bind()
 .sva_unbind()
 .sva_get_pasid()

And a device driver calls iommu_dev_enable_feature(IOMMU_DEV_FEAT_SVA)
followed by iommu_sva_bind_device().

If I remember correctly the biggest obstacle for KFD is the PASID
allocation, done by the GPU driver instead of the IOMMU driver, but there
may be others.

IIRC, we tried to make the original IOMMUv2 functionality generic but
other vendors were not interested at the time, so it ended up being
AMD specific and since nothing else was using the pasid allocations we
put them in the GPU driver.  I guess if this is generic now, it could
be moved to a common API and taken out of the driver.


There has already been quite some effort on a generic PASID interface 
etc., but it looks like that effort has stalled by now.


Anyway, at least I'm perfectly fine with having the IOMMUv2 || !IOMMUv2 
dependency on the core amdgpu driver for x86.


That should solve the build problem at hand quite nicely.

Regards,
Christian.



Alex




Re: [PATCH] drm/amdkfd: fix build error with missing AMD_IOMMU_V2

2021-03-08 Thread Christian König

Am 08.03.21 um 21:02 schrieb Felix Kuehling:

Am 2021-03-08 um 2:33 p.m. schrieb Arnd Bergmann:

On Mon, Mar 8, 2021 at 8:11 PM Felix Kuehling  wrote:

Am 2021-03-08 um 2:05 p.m. schrieb Arnd Bergmann:

On Mon, Mar 8, 2021 at 5:24 PM Felix Kuehling  wrote:

The driver build should work without IOMMUv2. In amdkfd/Makefile, we
have this condition:

ifneq ($(CONFIG_AMD_IOMMU_V2),)
AMDKFD_FILES += $(AMDKFD_PATH)/kfd_iommu.o
endif

In amdkfd/kfd_iommu.h we define inline stubs of the functions that are
causing your link-failures if IOMMU_V2 is not enabled:

#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
... function declarations ...
#else
... stubs ...
#endif

Right, that is the problem I tried to explain in my patch description.

Should we just drop the 'imply' then and add a proper dependency like this?

   depends on DRM_AMDGPU && (X86_64 || ARM64 || PPC64)
   depends on AMD_IOMMU_V2=y || DRM_AMDGPU=m

I can send a v2 after some testing if you prefer this version.

No. My point is, there should not be a hard dependency. The build should
work without CONFIG_AMD_IOMMU_V2. I don't understand why it's not
working for you. It looks like you're building kfd_iommu.o, which should
not be happening when AMD_IOMMU_V2 is not enabled. The condition in
amdkfd/Makefile should make sure that kfd_iommu.o doesn't get built with
your kernel config.

Again, as I explained in the changelog text, AMD_IOMMU_V2 configured as
a loadable module, while AMDGPU is configured as built-in.

I'm sorry, I didn't read it carefully. And I thought "imply" was meant
to fix exactly this kind of issue.

I don't want to create a hard dependency on AMD_IOMMU_V2 if I can avoid
it, because it is only really needed for a small number of AMD APUs, and
even there it is now optional for more recent ones.

Is there a better way to avoid build failures without creating a hard
dependency?


What you need is the same trick we used for AGP on radeon/nouveau:

depends on AMD_IOMMU_V2 || !AMD_IOMMU_V2

This way when AMD_IOMMU_V2 is build as a module DRM_AMDGPU will be build 
as a module as well. When it is disabled completely we don't care.


Regards,
Christian.
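
Concretely, the trick relies on Kconfig's tristate logic: `A || !A` evaluates to y when A is y or n, but to m when A is m (since !m = m and m || m = m), which caps the dependent symbol at m. A hypothetical fragment to illustrate:

```
config DRM_AMDGPU
	tristate "AMD GPU"
	# When AMD_IOMMU_V2=m this expression is m, forcing DRM_AMDGPU
	# to be at most m as well -- a built-in amdgpu could not link
	# against the module's exports. With AMD_IOMMU_V2=y or =n the
	# expression is y and places no restriction.
	depends on AMD_IOMMU_V2 || !AMD_IOMMU_V2
```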


   The documentation in
Documentation/kbuild/kconfig-language.rst suggests using if
(IS_REACHABLE(CONFIG_AMD_IOMMU_V2)) to guard those problematic function
calls. I think more generally, we could guard all of kfd_iommu.c with

     #if IS_REACHABLE(CONFIG_AMD_IOMMU_V2)

And use the same condition to define the stubs in kfd_iommu.h.

Regards,
   Felix



That causes a link failure for the vmlinux file, because the linker cannot
resolve addresses of loadable modules at compile time -- they have
not been loaded yet.

   Arnd
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




Re: [RESEND PATCH v6 1/2] procfs: Allow reading fdinfo with PTRACE_MODE_READ

2021-03-08 Thread Christian König

Am 08.03.21 um 18:06 schrieb Kalesh Singh:

Android captures per-process system memory state when certain low memory
events (e.g a foreground app kill) occur, to identify potential memory
hoggers. In order to measure how much memory a process actually consumes,
it is necessary to include the DMA buffer sizes for that process in the
memory accounting. Since the handle to DMA buffers are raw FDs, it is
important to be able to identify which processes have FD references to
a DMA buffer.

Currently, DMA buffer FDs can be accounted using /proc//fd/* and
/proc//fdinfo -- both are only readable by the process owner,
as follows:
   1. Do a readlink on each FD.
   2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
   3. stat the file to get the dmabuf inode number.
   4. Read /proc//fdinfo/, to get the DMA buffer size.

Accessing other processes' fdinfo requires root privileges. This limits
the use of the interface to debugging environments and is not suitable
for production builds.  Granting root privileges even to a system process
increases the attack surface and is highly undesirable.

Since fdinfo doesn't permit reading process memory and manipulating
process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED.

Suggested-by: Jann Horn 
Signed-off-by: Kalesh Singh 


Both patches are Acked-by: Christian König 


---
Hi everyone,

The initial posting of this patch can be found at [1].
I didn't receive any feedback last time, so resending here.
Would really appreciate any constructive comments/suggestions.

Thanks,
Kalesh

[1] 
https://lore.kernel.org/r/20210208155315.1367371-1-kaleshsingh@google.com/

Changes in v2:
   - Update patch description
  fs/proc/base.c |  4 ++--
  fs/proc/fd.c   | 15 ++-
  2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3851bfcdba56..fd46d8dd0cf4 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3159,7 +3159,7 @@ static const struct pid_entry tgid_base_stuff[] = {
DIR("task",   S_IRUGO|S_IXUGO, proc_task_inode_operations, 
proc_task_operations),
DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, 
proc_fd_operations),
DIR("map_files",  S_IRUSR|S_IXUSR, proc_map_files_inode_operations, 
proc_map_files_operations),
-   DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
+   DIR("fdinfo", S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
DIR("ns",   S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, 
proc_ns_dir_operations),
  #ifdef CONFIG_NET
DIR("net",S_IRUGO|S_IXUGO, proc_net_inode_operations, 
proc_net_operations),
@@ -3504,7 +3504,7 @@ static const struct inode_operations 
proc_tid_comm_inode_operations = {
   */
  static const struct pid_entry tid_base_stuff[] = {
DIR("fd",S_IRUSR|S_IXUSR, proc_fd_inode_operations, 
proc_fd_operations),
-   DIR("fdinfo",S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
+   DIR("fdinfo",S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, 
proc_fdinfo_operations),
DIR("ns",  S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, 
proc_ns_dir_operations),
  #ifdef CONFIG_NET
DIR("net",S_IRUGO|S_IXUGO, proc_net_inode_operations, 
proc_net_operations),
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 07fc4fad2602..6a80b40fd2fe 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -6,6 +6,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -72,6 +73,18 @@ static int seq_show(struct seq_file *m, void *v)
  
  static int seq_fdinfo_open(struct inode *inode, struct file *file)

  {
+   bool allowed = false;
+   struct task_struct *task = get_proc_task(inode);
+
+   if (!task)
+   return -ESRCH;
+
+   allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
+   put_task_struct(task);
+
+   if (!allowed)
+   return -EACCES;
+
return single_open(file, seq_show, inode);
  }
  
@@ -308,7 +321,7 @@ static struct dentry *proc_fdinfo_instantiate(struct dentry *dentry,

struct proc_inode *ei;
struct inode *inode;
  
-	inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUSR);

+   inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUGO);
if (!inode)
return ERR_PTR(-ENOENT);
  




Re: [PATCH v1 12/15] powerpc/uaccess: Refactor get/put_user() and __get/put_user()

2021-03-08 Thread Christian König
The radeon warning is trivial to fix, going to send out a patch in a few 
moments.


Regards,
Christian.

Am 08.03.21 um 13:14 schrieb Christophe Leroy:

+Evgeniy for W1 Dallas
+Alex & Christian for RADEON

Le 07/03/2021 à 11:23, kernel test robot a écrit :

Hi Christophe,

I love your patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on v5.12-rc2 next-20210305]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: 
https://github.com/0day-ci/linux/commits/Christophe-Leroy/powerpc-Cleanup-of-uaccess-h/20210226-015715
base: 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next

config: powerpc-randconfig-s031-20210307 (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce:
 wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross 
-O ~/bin/make.cross

 chmod +x ~/bin/make.cross
 # apt-get install sparse
 # sparse version: v0.6.3-245-gacc5c298-dirty
 # 
https://github.com/0day-ci/linux/commit/449bdbf978936e67e4919be8be0eec3e490a65e2

 git remote add linux-review https://github.com/0day-ci/linux
 git fetch --no-tags linux-review 
Christophe-Leroy/powerpc-Cleanup-of-uaccess-h/20210226-015715

 git checkout 449bdbf978936e67e4919be8be0eec3e490a65e2
 # save the attached .config to linux build tree
 COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 
make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=powerpc


If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 



The mentioned patch is not the source of the problem; it only makes it 
possible to spot it.


Christophe




"sparse warnings: (new ones prefixed by >>)"
drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: sparse: incorrect 
type in initializer (different address spaces) @@ expected char 
[noderef] __user *_pu_addr @@ got char *buf @@
    drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: expected char 
[noderef] __user *_pu_addr

    drivers/w1/slaves/w1_ds28e04.c:342:13: sparse: got char *buf
drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: sparse: incorrect 
type in initializer (different address spaces) @@ expected char 
const [noderef] __user *_gu_addr @@ got char const *buf @@
    drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: expected char 
const [noderef] __user *_gu_addr
    drivers/w1/slaves/w1_ds28e04.c:356:13: sparse: got char const 
*buf

--
    drivers/gpu/drm/radeon/radeon_ttm.c:933:21: sparse: sparse: cast 
removes address space '__user' of expression
    drivers/gpu/drm/radeon/radeon_ttm.c:933:21: sparse: sparse: cast 
removes address space '__user' of expression
drivers/gpu/drm/radeon/radeon_ttm.c:933:21: sparse: sparse: 
incorrect type in initializer (different address spaces) @@ 
expected unsigned int [noderef] __user *_pu_addr @@ got 
unsigned int [usertype] * @@
    drivers/gpu/drm/radeon/radeon_ttm.c:933:21: sparse: expected 
unsigned int [noderef] __user *_pu_addr
    drivers/gpu/drm/radeon/radeon_ttm.c:933:21: sparse: got 
unsigned int [usertype] *
    drivers/gpu/drm/radeon/radeon_ttm.c:933:21: sparse: sparse: cast 
removes address space '__user' of expression


vim +342 drivers/w1/slaves/w1_ds28e04.c

fa33a65a9cf7e2 Greg Kroah-Hartman 2013-08-21  338
fa33a65a9cf7e2 Greg Kroah-Hartman 2013-08-21  339  static ssize_t 
crccheck_show(struct device *dev, struct device_attribute *attr,
fa33a65a9cf7e2 Greg Kroah-Hartman 2013-08-21 340   
char *buf)

fbf7f7b4e2ae40 Markus Franke  2012-05-26  341  {
fbf7f7b4e2ae40 Markus Franke  2012-05-26 @342  if 
(put_user(w1_enable_crccheck + 0x30, buf))

fbf7f7b4e2ae40 Markus Franke  2012-05-26  343 return -EFAULT;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  344
fbf7f7b4e2ae40 Markus Franke  2012-05-26  345  return 
sizeof(w1_enable_crccheck);

fbf7f7b4e2ae40 Markus Franke  2012-05-26  346  }
fbf7f7b4e2ae40 Markus Franke  2012-05-26  347
fa33a65a9cf7e2 Greg Kroah-Hartman 2013-08-21  348  static ssize_t 
crccheck_store(struct device *dev, struct device_attribute *attr,
fbf7f7b4e2ae40 Markus Franke  2012-05-26 349    
const char *buf, size_t count)

fbf7f7b4e2ae40 Markus Franke  2012-05-26  350  {
fbf7f7b4e2ae40 Markus Franke  2012-05-26  351  char val;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  352
fbf7f7b4e2ae40 Markus Franke  2012-05-26  353  if (count != 1 
|| !buf)

fbf7f7b4e2ae40 Markus Franke  2012-05-26  354 return -EINVAL;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  355
fbf7f7b4e2ae40 Markus Franke  2012-05-26 @356  if 
(get_user(val, buf))

fbf7f7b4e2ae40 Markus Franke  2012-05-26  357 return -EFAULT;
fbf7f7b4e2ae40 Markus Franke  2012-05-26  358
fbf7f7b4e2ae40 Markus Franke  

Re: [PATCH 5.11 079/104] drm/amdgpu: enable only one high prio compute queue

2021-03-05 Thread Christian König

Am 05.03.21 um 16:31 schrieb Sasha Levin:

On Fri, Mar 05, 2021 at 03:27:00PM +, Deucher, Alexander wrote:
Not sure if Sasha picked that up or not. Would need to check that.  
If it's not, this patch should be dropped.


Yes, it went in via autosel. I can drop it if it's not needed.



IIRC this patch was created *before* the feature which needs it was 
merged. So it isn't a bug fix, but rather just a prerequisite for a new 
feature.


Because of this it should only be merged into an older kernel if the new 
feature is backported as well.


Alex do you agree that we can drop it?

Thanks,
Christian.


Re: [PATCH 5.11 079/104] drm/amdgpu: enable only one high prio compute queue

2021-03-05 Thread Christian König

Am 05.03.21 um 15:48 schrieb Deucher, Alexander:

[AMD Public Use]


-Original Message-
From: Koenig, Christian 
Sent: Friday, March 5, 2021 8:03 AM
To: Greg Kroah-Hartman ; linux-
ker...@vger.kernel.org
Cc: sta...@vger.kernel.org; Das, Nirmoy ; Deucher,
Alexander ; Sasha Levin

Subject: Re: [PATCH 5.11 079/104] drm/amdgpu: enable only one high prio
compute queue

Mhm, I'm not sure this one needs to be backported.

Why did you pick it up Greg?

It was picked up by Sasha's fixes checker.


Well, the change that needs this isn't in any earlier kernel, is it?

Christian.



Alex



Thanks,
Christian.

Am 05.03.21 um 13:21 schrieb Greg Kroah-Hartman:

From: Nirmoy Das 

[ Upstream commit 8c0225d79273968a65e73a4204fba023ae02714d ]

For high priority compute to work properly we need to enable wave
limiting on gfx pipe. Wave limiting is done through writing into
mmSPI_WCL_PIPE_PERCENT_GFX register. Enable only one high priority
compute queue to avoid race condition between multiple high priority
compute queues writing that register simultaneously.

Signed-off-by: Nirmoy Das 
Acked-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 15 ---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 +-
   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  |  6 ++
   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   |  6 ++
   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   |  7 ++-
   5 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index cd2c676a2797..8e0a6c62322e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -193,15 +193,16 @@ static bool

amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)

   }

   bool amdgpu_gfx_is_high_priority_compute_queue(struct

amdgpu_device *adev,

-  int pipe, int queue)
+  struct amdgpu_ring *ring)
   {
-   bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
-   int cond;
-   /* Policy: alternate between normal and high priority */
-   cond = multipipe_policy ? pipe : queue;
-
-   return ((cond % 2) != 0);
+   /* Policy: use 1st queue as high priority compute queue if we
+* have more than one compute queue.
+*/
+   if (adev->gfx.num_compute_rings > 1 &&
+   ring == &adev->gfx.compute_ring[0])
+   return true;

+   return false;
   }

   void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device

*adev)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 6b5a8f4642cc..72dbcd2bc6a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -380,7 +380,7 @@ void

amdgpu_queue_mask_bit_to_mec_queue(struct amdgpu_device *adev, int
bit,

   bool amdgpu_gfx_is_mec_queue_enabled(struct amdgpu_device *adev,

int mec,

 int pipe, int queue);
   bool amdgpu_gfx_is_high_priority_compute_queue(struct

amdgpu_device *adev,

-  int pipe, int queue);
+  struct amdgpu_ring *ring);
   int amdgpu_gfx_me_queue_to_bit(struct amdgpu_device *adev, int me,
   int pipe, int queue);
   void amdgpu_gfx_bit_to_me_queue(struct amdgpu_device *adev, int

bit,

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index e7d6da05011f..3a291befcddc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4495,8 +4495,7 @@ static int gfx_v10_0_compute_ring_init(struct

amdgpu_device *adev, int ring_id,

irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
+ ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
+ ring->pipe;
-   hw_prio = amdgpu_gfx_is_high_priority_compute_queue(adev,

ring->pipe,

-   ring->queue) ?
+   hw_prio = amdgpu_gfx_is_high_priority_compute_queue(adev,

ring) ?

AMDGPU_GFX_PIPE_PRIO_HIGH :

AMDGPU_GFX_PIPE_PRIO_NORMAL;

/* type-2 packets are deprecated on MEC, use type-3 instead */
r = amdgpu_ring_init(adev, ring, 1024, @@ -6545,8 +6544,7 @@ static
void gfx_v10_0_compute_mqd_set_priority(struct amdgpu_ring *ring,

struct

struct amdgpu_device *adev = ring->adev;

if (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE) {
-   if (amdgpu_gfx_is_high_priority_compute_queue(adev,

ring->pipe,

- ring->queue)) {
+   if (amdgpu_gfx_is_high_priority_compute_queue(adev,

ring)) {

  

Re: [PATCH 5.11 079/104] drm/amdgpu: enable only one high prio compute queue

2021-03-05 Thread Christian König

Mhm, I'm not sure this one needs to be backported.

Why did you pick it up Greg?

Thanks,
Christian.

Am 05.03.21 um 13:21 schrieb Greg Kroah-Hartman:

From: Nirmoy Das 

[ Upstream commit 8c0225d79273968a65e73a4204fba023ae02714d ]

For high priority compute to work properly we need to enable
wave limiting on gfx pipe. Wave limiting is done through writing
into mmSPI_WCL_PIPE_PERCENT_GFX register. Enable only one high
priority compute queue to avoid race condition between multiple
high priority compute queues writing that register simultaneously.

Signed-off-by: Nirmoy Das 
Acked-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 15 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  |  6 ++
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   |  6 ++
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   |  7 ++-
  5 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index cd2c676a2797..8e0a6c62322e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -193,15 +193,16 @@ static bool amdgpu_gfx_is_multipipe_capable(struct 
amdgpu_device *adev)
  }
  
  bool amdgpu_gfx_is_high_priority_compute_queue(struct amdgpu_device *adev,

-  int pipe, int queue)
+  struct amdgpu_ring *ring)
  {
-   bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
-   int cond;
-   /* Policy: alternate between normal and high priority */
-   cond = multipipe_policy ? pipe : queue;
-
-   return ((cond % 2) != 0);
+   /* Policy: use 1st queue as high priority compute queue if we
+* have more than one compute queue.
+*/
+   if (adev->gfx.num_compute_rings > 1 &&
+   ring == &adev->gfx.compute_ring[0])
+   return true;
  
+	return false;

  }
  
  void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 6b5a8f4642cc..72dbcd2bc6a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -380,7 +380,7 @@ void amdgpu_queue_mask_bit_to_mec_queue(struct 
amdgpu_device *adev, int bit,
  bool amdgpu_gfx_is_mec_queue_enabled(struct amdgpu_device *adev, int mec,
 int pipe, int queue);
  bool amdgpu_gfx_is_high_priority_compute_queue(struct amdgpu_device *adev,
-  int pipe, int queue);
+  struct amdgpu_ring *ring);
  int amdgpu_gfx_me_queue_to_bit(struct amdgpu_device *adev, int me,
   int pipe, int queue);
  void amdgpu_gfx_bit_to_me_queue(struct amdgpu_device *adev, int bit,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index e7d6da05011f..3a291befcddc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4495,8 +4495,7 @@ static int gfx_v10_0_compute_ring_init(struct 
amdgpu_device *adev, int ring_id,
irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
+ ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
+ ring->pipe;
-   hw_prio = amdgpu_gfx_is_high_priority_compute_queue(adev, ring->pipe,
-   ring->queue) ?
+   hw_prio = amdgpu_gfx_is_high_priority_compute_queue(adev, ring) ?
AMDGPU_GFX_PIPE_PRIO_HIGH : AMDGPU_GFX_PIPE_PRIO_NORMAL;
/* type-2 packets are deprecated on MEC, use type-3 instead */
r = amdgpu_ring_init(adev, ring, 1024,
@@ -6545,8 +6544,7 @@ static void gfx_v10_0_compute_mqd_set_priority(struct 
amdgpu_ring *ring, struct
struct amdgpu_device *adev = ring->adev;
  
  	if (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE) {

-   if (amdgpu_gfx_is_high_priority_compute_queue(adev, ring->pipe,
- ring->queue)) {
+   if (amdgpu_gfx_is_high_priority_compute_queue(adev, ring)) {
mqd->cp_hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_HIGH;
mqd->cp_hqd_queue_priority =
AMDGPU_GFX_QUEUE_PRIORITY_MAXIMUM;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 37639214cbbb..b0284c4659ba 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1923,8 +1923,7 @@ static int gfx_v8_0_compute_ring_init(struct 
amdgpu_device *adev, int ring_id,
+ ((ring->me - 1) 

Re: [PATCH v8 3/5] dma-buf: heaps: Add deferred-free-helper library code

2021-03-05 Thread Christian König

Am 05.03.21 um 00:20 schrieb John Stultz:

This patch provides infrastructure for deferring buffer frees.

This is a feature ION provided which when used with some form
of a page pool, provides a nice performance boost in an
allocation microbenchmark. The reason it helps is it allows the
page-zeroing to be done out of the normal allocation/free path,
and pushed off to a kthread.


In general that's a nice idea, but to be honest this implementation 
looks broken and rather inefficient.


You should probably rather integrate that into the DRM pool core 
functionality which is currently clearing all freed pages anyway.


I would also use a work item per pool instead of a kthread; that would 
help with data locality.


Regards,
Christian.



As not all heaps will find this useful, its implemented as
a optional helper library that heaps can utilize.

Cc: Daniel Vetter 
Cc: Christian Koenig 
Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Chris Goldsworthy 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: John Stultz 
---
v2:
* Fix sleep in atomic issue from using a mutex, by switching
   to a spinlock as Reported-by: kernel test robot 
* Cleanup API to use a reason enum for clarity and add some documentation
   comments as suggested by Suren Baghdasaryan.
v3:
* Minor tweaks so it can be built as a module
* A few small fixups suggested by Daniel Mentz
v4:
* Tweak from Daniel Mentz to make sure the shrinker
   count/freed values are tracked in pages not bytes
v5:
* Fix up page count tracking as suggested by Suren Baghdasaryan
v7:
* Rework accounting to use nr_pages rather then size, as suggested
   by Suren Baghdasaryan
---
  drivers/dma-buf/heaps/Kconfig|   3 +
  drivers/dma-buf/heaps/Makefile   |   1 +
  drivers/dma-buf/heaps/deferred-free-helper.c | 138 +++
  drivers/dma-buf/heaps/deferred-free-helper.h |  55 
  4 files changed, 197 insertions(+)
  create mode 100644 drivers/dma-buf/heaps/deferred-free-helper.c
  create mode 100644 drivers/dma-buf/heaps/deferred-free-helper.h

diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
index a5eef06c4226..f7aef8bc7119 100644
--- a/drivers/dma-buf/heaps/Kconfig
+++ b/drivers/dma-buf/heaps/Kconfig
@@ -1,3 +1,6 @@
+config DMABUF_HEAPS_DEFERRED_FREE
+   tristate
+
  config DMABUF_HEAPS_SYSTEM
bool "DMA-BUF System Heap"
depends on DMABUF_HEAPS
diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
index 974467791032..4e7839875615 100644
--- a/drivers/dma-buf/heaps/Makefile
+++ b/drivers/dma-buf/heaps/Makefile
@@ -1,3 +1,4 @@
  # SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_DMABUF_HEAPS_DEFERRED_FREE) += deferred-free-helper.o
  obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o
  obj-$(CONFIG_DMABUF_HEAPS_CMA)+= cma_heap.o
diff --git a/drivers/dma-buf/heaps/deferred-free-helper.c 
b/drivers/dma-buf/heaps/deferred-free-helper.c
new file mode 100644
index ..e19c8b68dfeb
--- /dev/null
+++ b/drivers/dma-buf/heaps/deferred-free-helper.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Deferred dmabuf freeing helper
+ *
+ * Copyright (C) 2020 Linaro, Ltd.
+ *
+ * Based on the ION page pool code
+ * Copyright (C) 2011 Google, Inc.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "deferred-free-helper.h"
+
+static LIST_HEAD(free_list);
+static size_t list_nr_pages;
+wait_queue_head_t freelist_waitqueue;
+struct task_struct *freelist_task;
+static DEFINE_SPINLOCK(free_list_lock);
+
+void deferred_free(struct deferred_freelist_item *item,
+  void (*free)(struct deferred_freelist_item*,
+   enum df_reason),
+  size_t nr_pages)
+{
+   unsigned long flags;
+
+   INIT_LIST_HEAD(&item->list);
+   item->nr_pages = nr_pages;
+   item->free = free;
+
+   spin_lock_irqsave(&free_list_lock, flags);
+   list_add(&item->list, &free_list);
+   list_nr_pages += nr_pages;
+   spin_unlock_irqrestore(&free_list_lock, flags);
+   wake_up(&freelist_waitqueue);
+}
+EXPORT_SYMBOL_GPL(deferred_free);
+
+static size_t free_one_item(enum df_reason reason)
+{
+   unsigned long flags;
+   size_t nr_pages;
+   struct deferred_freelist_item *item;
+
+   spin_lock_irqsave(&free_list_lock, flags);
+   if (list_empty(&free_list)) {
+   spin_unlock_irqrestore(&free_list_lock, flags);
+   return 0;
+   }
+   item = list_first_entry(&free_list, struct deferred_freelist_item, list);
+   list_del(&item->list);
+   nr_pages = item->nr_pages;
+   list_nr_pages -= nr_pages;
+   spin_unlock_irqrestore(&free_list_lock, flags);
+
+   item->free(item, reason);
+   return nr_pages;
+}
+
+static unsigned long 

Re: [PATCH v8 2/5] drm: ttm_pool: Rework ttm_pool to use drm_page_pool

2021-03-05 Thread Christian König

Am 05.03.21 um 00:20 schrieb John Stultz:

This patch reworks the ttm_pool logic to utilize the recently
added drm_page_pool code.

This adds drm_page_pool structures to the ttm_pool_type
structures, and then removes all the ttm_pool_type shrinker
logic (as it's handled in the drm_page_pool shrinker).

NOTE: There is one mismatch in the interfaces I'm not totally
happy with. The ttm_pool tracks all of its pooled pages across
a number of different pools, and tries to keep this size under
the specified page_pool_size value. With the drm_page_pool,
there may be other users; however, there is still one global
shrinker list of pools. So we can't easily reduce the ttm
pool under the ttm specified size without potentially doing
a lot of shrinking to other non-ttm pools. So either we can:
   1) Try to split it so each user of drm_page_pools manages its
  own pool shrinking.
   2) Push the max value into the drm_page_pool, and have it
  manage shrinking to fit under that global max. Then share
  those size/max values out so the ttm_pool debug output
  can have more context.

I've taken the second path in this patch set, but wanted to call
it out so folks could look closely.


That's perfectly fine with me. A global approach for the different page 
pool types is desired anyway as far as I can see.
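To make option 2 concrete, here is a minimal userspace sketch of a shared global page budget across pools. The names and structure are invented for illustration only and do not match the actual drm_page_pool API:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch of option 2: every pool accounts into one global
 * page budget, and adding pages over that budget shrinks whichever pool
 * the (hypothetical) shrinker picks -- possibly a different user's pool.
 * All names here are invented for illustration. */

#define POOL_GLOBAL_MAX 8

static size_t total_pages;

struct pool {
	size_t count;
};

/* Drop one page from the chosen victim pool and the global total. */
static void pool_shrink_one(struct pool *victim)
{
	if (victim->count) {
		victim->count--;
		total_pages--;
	}
}

/* Add a page to @p, then shrink @victim until the global cap holds. */
static void pool_add(struct pool *p, struct pool *victim)
{
	p->count++;
	total_pages++;
	while (total_pages > POOL_GLOBAL_MAX)
		pool_shrink_one(victim);
}
```

Each pool accounts into one global counter, and exceeding the shared cap shrinks some other pool — which is exactly the cross-user shrinking trade-off described in the NOTE above.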




Thoughts would be greatly appreciated here!

Cc: Daniel Vetter 
Cc: Christian Koenig 
Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Chris Goldsworthy 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: John Stultz 
---
v7:
* Major refactoring to use drm_page_pools inside the
   ttm_pool_type structure. This allows us to use container_of to
   get the needed context to free a page. This also means less
   code is changed overall.
v8:
* Reworked to use the new cleanly rewritten drm_page_pool logic
---
  drivers/gpu/drm/Kconfig|   1 +
  drivers/gpu/drm/ttm/ttm_pool.c | 156 ++---
  include/drm/ttm/ttm_pool.h |   6 +-
  3 files changed, 31 insertions(+), 132 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 7cbcecb8f7df..a6cbdb63f6c7 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -184,6 +184,7 @@ config DRM_PAGE_POOL
  config DRM_TTM
tristate
depends on DRM && MMU
+   select DRM_PAGE_POOL
help
  GPU memory management subsystem for devices with multiple
  GPU memory types. Will be enabled automatically if a device driver
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 6e27cb1bf48b..f74ea801d7ab 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -39,6 +39,7 @@
  #include 
  #endif
  
+#include 

  #include 
  #include 
  #include 
@@ -68,8 +69,6 @@ static struct ttm_pool_type 
global_dma32_write_combined[MAX_ORDER];
  static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
  
  static struct mutex shrinker_lock;

-static struct list_head shrinker_list;
-static struct shrinker mm_shrinker;
  
  /* Allocate pages of size 1 << order with the given gfp_flags */

  static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t 
gfp_flags,
@@ -125,8 +124,9 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool 
*pool, gfp_t gfp_flags,
  }
  
  /* Reset the caching and pages of size 1 << order */

-static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
-  unsigned int order, struct page *p)
+static unsigned long ttm_pool_free_page(struct ttm_pool *pool,
+   enum ttm_caching caching,
+   unsigned int order, struct page *p)
  {
unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
struct ttm_pool_dma *dma;
@@ -142,7 +142,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum 
ttm_caching caching,
  
  	if (!pool || !pool->use_dma_alloc) {

__free_pages(p, order);
-   return;
+   return 1UL << order;
}
  
  	if (order)

@@ -153,6 +153,16 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum 
ttm_caching caching,
dma_free_attrs(pool->dev, (1UL << order) * PAGE_SIZE, vaddr, dma->addr,
   attr);
kfree(dma);
+   return 1UL << order;


The returned value is always the same, so your wrapper can do this and we 
don't really need to change the function here.



+}
+
+static unsigned long ttm_subpool_free_page(struct drm_page_pool *subpool,
+  struct page *p)


Better call this ttm_pool_free_callback.


+{
+   struct ttm_pool_type *pt;
+
+   pt = container_of(subpool, struct ttm_pool_type, subpool);
+ 

Re: [PATCH v8 1/5] drm: Add a sharable drm page-pool implementation

2021-03-05 Thread Christian König

Am 05.03.21 um 00:20 schrieb John Stultz:

This adds a shrinker controlled page pool, extracted
out of the ttm_pool logic, and abstracted out a bit
so it can be used by other non-ttm drivers.


In general please keep the kernel doc which is in TTMs pool.



Cc: Daniel Vetter 
Cc: Christian Koenig 
Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Chris Goldsworthy 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: John Stultz 
---
v8:
* Completely rewritten from scratch, using only the
   ttm_pool logic so it can be dual licensed.
---
  drivers/gpu/drm/Kconfig |   4 +
  drivers/gpu/drm/Makefile|   2 +
  drivers/gpu/drm/page_pool.c | 214 
  include/drm/page_pool.h |  65 +++
  4 files changed, 285 insertions(+)
  create mode 100644 drivers/gpu/drm/page_pool.c
  create mode 100644 include/drm/page_pool.h

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e392a90ca687..7cbcecb8f7df 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -177,6 +177,10 @@ config DRM_DP_CEC
  Note: not all adapters support this feature, and even for those
  that do support this they often do not hook up the CEC pin.
  
+config DRM_PAGE_POOL

+   bool
+   depends on DRM
+
  config DRM_TTM
tristate
depends on DRM && MMU
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 926adef289db..2dc7b2fe3fe5 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -39,6 +39,8 @@ obj-$(CONFIG_DRM_VRAM_HELPER) += drm_vram_helper.o
  drm_ttm_helper-y := drm_gem_ttm_helper.o
  obj-$(CONFIG_DRM_TTM_HELPER) += drm_ttm_helper.o
  
+drm-$(CONFIG_DRM_PAGE_POOL) += page_pool.o

+
  drm_kms_helper-y := drm_bridge_connector.o drm_crtc_helper.o drm_dp_helper.o \
drm_dsc.o drm_probe_helper.o \
drm_plane_helper.o drm_dp_mst_topology.o drm_atomic_helper.o \
diff --git a/drivers/gpu/drm/page_pool.c b/drivers/gpu/drm/page_pool.c
new file mode 100644
index ..a60b954cfe0f
--- /dev/null
+++ b/drivers/gpu/drm/page_pool.c
@@ -0,0 +1,214 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Sharable page pool implementation
+ *
+ * Extracted from drivers/gpu/drm/ttm/ttm_pool.c
+ * Copyright 2020 Advanced Micro Devices, Inc.
+ * Copyright 2021 Linaro Ltd.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Christian König, John Stultz
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static unsigned long page_pool_size;
+
+MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool");
+module_param(page_pool_size, ulong, 0644);
+
+static atomic_long_t allocated_pages;
+
+static struct mutex shrinker_lock;
+static struct list_head shrinker_list;
+static struct shrinker mm_shrinker;
+
+void drm_page_pool_set_max(unsigned long max)


This function and a whole bunch of other can be static.

And in general I'm not sure if we really need wrappers for functionality 
like this, e.g. it is only used once during startup IIRC.



+{
+   if (!page_pool_size)
+   page_pool_size = max;
+}
+
+unsigned long drm_page_pool_get_max(void)
+{
+   return page_pool_size;
+}
+
+unsigned long drm_page_pool_get_total(void)
+{
+   return atomic_long_read(&allocated_pages);
+}
+
+unsigned long drm_page_pool_get_size(struct drm_page_pool *pool)
+{
+   unsigned long size;
+
+   spin_lock(&pool->lock);
+   size = pool->page_count;
+   spin_unlock(&pool->lock);
+   return size;
+}
+
+/* Give pages into a specific pool */
+void drm_page_pool_add(struct drm_page_pool *pool, struct page *p)
+{
+   unsigned int i, num_pages = 1 << 

Re: [patch 1/7] drm/ttm: Replace kmap_atomic() usage

2021-03-04 Thread Christian König




Am 03.03.21 um 14:20 schrieb Thomas Gleixner:

From: Thomas Gleixner 

There is no reason to disable pagefaults and preemption as a side effect of
kmap_atomic_prot().

Use kmap_local_page_prot() instead and document the reasoning for the
mapping usage with the given pgprot.

Remove the NULL pointer check for the map. These functions return a valid
address for valid pages and the return was bogus anyway as it would have
left preemption and pagefaults disabled.

Signed-off-by: Thomas Gleixner 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
---
  drivers/gpu/drm/ttm/ttm_bo_util.c |   20 
  1 file changed, 12 insertions(+), 8 deletions(-)

--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -181,13 +181,15 @@ static int ttm_copy_io_ttm_page(struct t
return -ENOMEM;
  
  	src = (void *)((unsigned long)src + (page << PAGE_SHIFT));

-   dst = kmap_atomic_prot(d, prot);
-   if (!dst)
-   return -ENOMEM;
+   /*
+* Ensure that a highmem page is mapped with the correct
+* pgprot. For non highmem the mapping is already there.
+*/


I find the comment a bit misleading. Maybe write:

/*
 * Locally map highmem pages with the correct pgprot.
 * Normal memory should already have the correct pgprot in the linear 
mapping.

 */

Apart from that looks good to me.

Regards,
Christian.


+   dst = kmap_local_page_prot(d, prot);
  
  	memcpy_fromio(dst, src, PAGE_SIZE);
  
-	kunmap_atomic(dst);

+   kunmap_local(dst);
  
  	return 0;

  }
@@ -203,13 +205,15 @@ static int ttm_copy_ttm_io_page(struct t
return -ENOMEM;
  
  	dst = (void *)((unsigned long)dst + (page << PAGE_SHIFT));

-   src = kmap_atomic_prot(s, prot);
-   if (!src)
-   return -ENOMEM;
+   /*
+* Ensure that a highmem page is mapped with the correct
+* pgprot. For non highmem the mapping is already there.
+*/
+   src = kmap_local_page_prot(s, prot);
  
  	memcpy_toio(dst, src, PAGE_SIZE);
  
-	kunmap_atomic(src);

+   kunmap_local(src);
  
  	return 0;

  }






Re: drm/ttm: ttm_bo_release called without lock

2021-03-03 Thread Christian König
I also already sent a patch to the list to demote the warnings to a 
WARN_ON_ONCE().


Christian.

Am 04.03.21 um 08:42 schrieb Thomas Zimmermann:

(cc'ing Gerd)

This might be related to the recent clean-up patches for the BO 
handling in qxl.


Am 03.03.21 um 16:07 schrieb Petr Mladek:

On Wed 2021-03-03 15:34:09, Petr Mladek wrote:

Hi,

the following warning is filling my kernel log buffer
with 5.12-rc1+ kernels:

[  941.070598] WARNING: CPU: 0 PID: 11 at 
drivers/gpu/drm/ttm/ttm_bo.c:139 ttm_bo_move_to_lru_tail+0x1ba/0x210

[  941.070601] Modules linked in:
[  941.070603] CPU: 0 PID: 11 Comm: kworker/0:1 Kdump: loaded 
Tainted: G    W 5.12.0-rc1-default+ #81
[  941.070605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), 
BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014

[  941.070606] Workqueue: events qxl_gc_work
[  941.070609] RIP: 0010:ttm_bo_move_to_lru_tail+0x1ba/0x210
[  941.070610] Code: 93 e8 02 00 00 48 89 0a e9 00 ff ff ff 48 8b 87 
38 01 00 00 be ff ff ff ff 48 8d 78 70 e8 8e 7d 46 00 85 c0 0f 85 6f 
fe ff ff <0f> 0b 8b 93 fc 02 00 00 85 d2 0f 84 6d fe ff ff 48 89 df 
5b 5d 41

[  941.070612] RSP: 0018:bddf4008fd38 EFLAGS: 00010246
[  941.070614] RAX:  RBX: 95ae485bac00 RCX: 
0002
[  941.070615] RDX:  RSI: 95ae485badb0 RDI: 
95ae40305108
[  941.070616] RBP:  R08: 0001 R09: 
0001
[  941.070617] R10: bddf4008fc10 R11: a5401580 R12: 
95ae42a94e90
[  941.070618] R13: 95ae485bae70 R14: 95ae485bac00 R15: 
95ae455d1800
[  941.070620] FS:  () GS:95aebf60() 
knlGS:

[  941.070621] CS:  0010 DS:  ES:  CR0: 80050033
[  941.070622] CR2: 7f8ffb2f8000 CR3: 000102c5e005 CR4: 
00370ef0
[  941.070624] DR0:  DR1:  DR2: 

[  941.070626] DR3:  DR6: fffe0ff0 DR7: 
0400

[  941.070627] Call Trace:
[  941.070630]  ttm_bo_release+0x551/0x600
[  941.070635]  qxl_bo_unref+0x3a/0x50
[  941.070638]  qxl_release_free_list+0x62/0xc0
[  941.070643]  qxl_release_free+0x76/0xe0
[  941.070646]  qxl_garbage_collect+0xd9/0x190
[  941.080241]  process_one_work+0x2b0/0x630
[  941.080249]  ? process_one_work+0x630/0x630
[  941.080251]  worker_thread+0x39/0x3f0
[  941.080255]  ? process_one_work+0x630/0x630
[  941.080257]  kthread+0x13a/0x150
[  941.080260]  ? kthread_create_worker_on_cpu+0x70/0x70
[  941.080265]  ret_from_fork+0x1f/0x30
[  941.080277] irq event stamp: 757191
[  941.080278] hardirqs last  enabled at (757197): 
[] vprintk_emit+0x27f/0x2c0
[  941.080280] hardirqs last disabled at (757202): 
[] vprintk_emit+0x23c/0x2c0
[  941.080281] softirqs last  enabled at (755768): 
[] __do_softirq+0x30f/0x432
[  941.080284] softirqs last disabled at (755763): 
[] irq_exit_rcu+0xea/0xf0


I have just realized that it actually prints two warnings over and
over again. The 2nd one is:

[  186.078790] WARNING: CPU: 0 PID: 146 at 
drivers/gpu/drm/ttm/ttm_bo.c:512 ttm_bo_release+0x533/0x600

[  186.078794] Modules linked in:
[  186.078795] CPU: 0 PID: 146 Comm: kworker/0:2 Kdump: loaded 
Tainted: G    W 5.12.0-rc1-default+ #81
[  186.078797] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), 
BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014

[  186.078799] Workqueue: events qxl_gc_work
[  186.078801] RIP: 0010:ttm_bo_release+0x533/0x600
[  186.078803] Code: e9 c6 fb ff ff 4c 8b 7d d0 b9 4c 1d 00 00 31 d2 
be 01 00 00 00 49 8b bf d0 fe ff ff e8 86 f1 04 00 49 8b
47 e0 e9 2b ff ff ff <0f> 0b 48 8b 45 d0 31 d2 4c 89 f7 48 8d 70 08 
c7 80 94 00 00 00 00

[  186.078805] RSP: 0018:a22a402e3d60 EFLAGS: 00010202
[  186.078807] RAX: 0001 RBX: 9334cd8f5668 RCX: 
1180
[  186.078808] RDX: 93353f61a7c0 RSI: a6401580 RDI: 
9334c44f9588
[  186.078810] RBP: a22a402e3d90 R08: 0001 R09: 
0001
[  186.078811] R10: a22a402e3c10 R11: a6401580 R12: 
9334c48fa300
[  186.078812] R13: 9334c0f24e90 R14: 9334cd8f5400 R15: 
9334c4528000
[  186.078813] FS:  () GS:93353f60() 
knlGS:

[  186.078814] CS:  0010 DS:  ES:  CR0: 80050033
[  186.078816] CR2: 7f1908079860 CR3: 21824004 CR4: 
00370ef0
[  186.078818] DR0:  DR1:  DR2: 

[  186.078819] DR3:  DR6: fffe0ff0 DR7: 
0400

[  186.078821] Call Trace:
[  186.078826]  qxl_bo_unref+0x3a/0x50
[  186.078829]  qxl_release_free_list+0x62/0xc0
[  186.078834]  qxl_release_free+0x76/0xe0
[  186.078837]  qxl_garbage_collect+0xd9/0x190
[  186.078843]  process_one_work+0x2b0/0x630
[  186.078850]  ? process_one_work+0x630/0x630
[  186.078853]  worker_thread+0x39/0x3f0
[  186.078857]  ? process_one_work+0x630/0x630
[  

Re: drm/ttm: ttm_bo_release called without lock

2021-03-03 Thread Christian König

Hi Petr,

yes that is a known bug in qxl and yes the patch you mentioned makes it 
worse.


Going to reduce the warning to a WARN_ON_ONCE().

Regards,
Christian.

Am 03.03.21 um 15:34 schrieb Petr Mladek:

Hi,

the following warning is filling my kernel log buffer
with 5.12-rc1+ kernels:

[  941.070598] WARNING: CPU: 0 PID: 11 at drivers/gpu/drm/ttm/ttm_bo.c:139 
ttm_bo_move_to_lru_tail+0x1ba/0x210
[  941.070601] Modules linked in:
[  941.070603] CPU: 0 PID: 11 Comm: kworker/0:1 Kdump: loaded Tainted: G
W 5.12.0-rc1-default+ #81
[  941.070605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[  941.070606] Workqueue: events qxl_gc_work
[  941.070609] RIP: 0010:ttm_bo_move_to_lru_tail+0x1ba/0x210
[  941.070610] Code: 93 e8 02 00 00 48 89 0a e9 00 ff ff ff 48 8b 87 38 01 00 00 be 
ff ff ff ff 48 8d 78 70 e8 8e 7d 46 00 85 c0 0f 85 6f fe ff ff <0f> 0b 8b 93 fc 
02 00 00 85 d2 0f 84 6d fe ff ff 48 89 df 5b 5d 41
[  941.070612] RSP: 0018:bddf4008fd38 EFLAGS: 00010246
[  941.070614] RAX:  RBX: 95ae485bac00 RCX: 0002
[  941.070615] RDX:  RSI: 95ae485badb0 RDI: 95ae40305108
[  941.070616] RBP:  R08: 0001 R09: 0001
[  941.070617] R10: bddf4008fc10 R11: a5401580 R12: 95ae42a94e90
[  941.070618] R13: 95ae485bae70 R14: 95ae485bac00 R15: 95ae455d1800
[  941.070620] FS:  () GS:95aebf60() 
knlGS:
[  941.070621] CS:  0010 DS:  ES:  CR0: 80050033
[  941.070622] CR2: 7f8ffb2f8000 CR3: 000102c5e005 CR4: 00370ef0
[  941.070624] DR0:  DR1:  DR2: 
[  941.070626] DR3:  DR6: fffe0ff0 DR7: 0400
[  941.070627] Call Trace:
[  941.070630]  ttm_bo_release+0x551/0x600
[  941.070635]  qxl_bo_unref+0x3a/0x50
[  941.070638]  qxl_release_free_list+0x62/0xc0
[  941.070643]  qxl_release_free+0x76/0xe0
[  941.070646]  qxl_garbage_collect+0xd9/0x190
[  941.080241]  process_one_work+0x2b0/0x630
[  941.080249]  ? process_one_work+0x630/0x630
[  941.080251]  worker_thread+0x39/0x3f0
[  941.080255]  ? process_one_work+0x630/0x630
[  941.080257]  kthread+0x13a/0x150
[  941.080260]  ? kthread_create_worker_on_cpu+0x70/0x70
[  941.080265]  ret_from_fork+0x1f/0x30
[  941.080277] irq event stamp: 757191
[  941.080278] hardirqs last  enabled at (757197): [] 
vprintk_emit+0x27f/0x2c0
[  941.080280] hardirqs last disabled at (757202): [] 
vprintk_emit+0x23c/0x2c0
[  941.080281] softirqs last  enabled at (755768): [] 
__do_softirq+0x30f/0x432
[  941.080284] softirqs last disabled at (755763): [] 
irq_exit_rcu+0xea/0xf0

My wild guess is that it might be related to the commit
3d1a88e1051f5d788d789 ("drm/ttm: cleanup LRU handling further").

Does it ring any bell, please?

Best Regards,
Petr




Re: [PATCH 08/53] drm/amd/display/dc/calcs/dce_calcs: Move some large variables from the stack to the heap

2021-03-03 Thread Christian König

Hi Lee,

I'm not an expert for the DC code base, but I think that this won't work.

This function is not allowed to sleep, and the structures are a bit large 
to be allocated from the heap in an atomic context.


Regards,
Christian.
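Separate from the atomic-context concern: the patch quoted below also returns early when a later kcalloc() fails, leaking the allocations made before it. A hedged userspace sketch of the usual goto-based unwinding pattern, with plain calloc()/free() standing in for kcalloc()/kfree() and invented sizes:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative error-unwinding sketch: each allocation failure frees
 * what was already allocated instead of returning early and leaking.
 * Types and sizes are placeholders, not the real dce_calcs ones. */
static int calculate_bandwidth_sketch(void)
{
	int ret = -1; /* stand-in for -ENOMEM */
	double *yclk, *sclk;
	int *tiling_mode;

	yclk = calloc(3, sizeof(*yclk));
	if (!yclk)
		goto out;
	sclk = calloc(8, sizeof(*sclk));
	if (!sclk)
		goto free_yclk;
	tiling_mode = calloc(4, sizeof(*tiling_mode));
	if (!tiling_mode)
		goto free_sclk;

	/* ... the actual bandwidth computation would go here ... */
	ret = 0;

	free(tiling_mode);
free_sclk:
	free(sclk);
free_yclk:
	free(yclk);
out:
	return ret;
}
```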

Am 03.03.21 um 14:42 schrieb Lee Jones:

Fixes the following W=1 kernel build warning(s):

  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c: In function 
‘calculate_bandwidth’:
  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c:2016:1: warning: 
the frame size of 1216 bytes is larger than 1024 bytes [-Wframe-larger-than=]

Cc: Harry Wentland 
Cc: Leo Li 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Colin Ian King 
Cc: amd-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
  .../gpu/drm/amd/display/dc/calcs/dce_calcs.c  | 29 ---
  1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c 
b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
index e633f8a51edb6..4f0474a3bbcad 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
@@ -98,16 +98,16 @@ static void calculate_bandwidth(
int32_t num_cursor_lines;
  
  	int32_t i, j, k;

-   struct bw_fixed yclk[3];
-   struct bw_fixed sclk[8];
+   struct bw_fixed *yclk;
+   struct bw_fixed *sclk;
bool d0_underlay_enable;
bool d1_underlay_enable;
bool fbc_enabled;
bool lpt_enabled;
enum bw_defines sclk_message;
enum bw_defines yclk_message;
-   enum bw_defines tiling_mode[maximum_number_of_surfaces];
-   enum bw_defines surface_type[maximum_number_of_surfaces];
+   enum bw_defines *tiling_mode;
+   enum bw_defines *surface_type;
enum bw_defines voltage;
enum bw_defines pipe_check;
enum bw_defines hsr_check;
@@ -122,6 +122,22 @@ static void calculate_bandwidth(
int32_t number_of_displays_enabled_with_margin = 0;
int32_t number_of_aligned_displays_with_no_margin = 0;
  
+	yclk = kcalloc(3, sizeof(*yclk), GFP_KERNEL);

+   if (!yclk)
+   return;
+
+   sclk = kcalloc(8, sizeof(*sclk), GFP_KERNEL);
+   if (!sclk)
+   return;
+
+   tiling_mode = kcalloc(maximum_number_of_surfaces, sizeof(*tiling_mode), 
GFP_KERNEL);
+   if (!tiling_mode)
+   return;
+
+   surface_type = kcalloc(maximum_number_of_surfaces, 
sizeof(*surface_type), GFP_KERNEL);
+   if (!surface_type)
+   return;
+
yclk[low] = vbios->low_yclk;
yclk[mid] = vbios->mid_yclk;
yclk[high] = vbios->high_yclk;
@@ -2013,6 +2029,11 @@ static void calculate_bandwidth(
}
}
}
+
+   kfree(tiling_mode);
+   kfree(surface_type);
+   kfree(yclk);
+   kfree(sclk);
  }
  
  /***




Re: [PATCH] drm/radeon: fix copy of uninitialized variable back to userspace

2021-03-03 Thread Christian König

Am 03.03.21 um 01:27 schrieb Colin King:

From: Colin Ian King 

Currently the ioctl command RADEON_INFO_SI_BACKEND_ENABLED_MASK can
copy back uninitialised data in value_tmp that pointer *value points
to. This can occur when rdev->family is less than CHIP_BONAIRE and
less than CHIP_TAHITI.  Fix this by adding in a missing -EINVAL
so that no invalid value is copied back to userspace.

Addresses-Coverity: ("Uninitialized scalar variable")
Cc: sta...@vger.kernel.org # 3.13+
Fixes: 439a1cfffe2c ("drm/radeon: expose render backend mask to the userspace")
Signed-off-by: Colin Ian King 


Reviewed-by: Christian König 

Let's hope that this doesn't break UAPI.

Christian.


---
  drivers/gpu/drm/radeon/radeon_kms.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 2479d6ab7a36..58876bb4ef2a 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -518,6 +518,7 @@ int radeon_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
*value = rdev->config.si.backend_enable_mask;
} else {
DRM_DEBUG_KMS("BACKEND_ENABLED_MASK is si+ only!\n");
+   return -EINVAL;
}
break;
case RADEON_INFO_MAX_SCLK:




Re: [PATCH 0/3] drm/ttm: constify static vm_operations_structs

2021-02-25 Thread Christian König

Am 23.02.21 um 18:31 schrieb Alex Deucher:

On Wed, Feb 10, 2021 at 8:14 AM Daniel Vetter  wrote:

On Wed, Feb 10, 2021 at 08:45:56AM +0100, Christian König wrote:

Reviewed-by: Christian König  for the series.

Smash it into -misc?

@Christian Koenig did these ever land?  I don't see them in drm-misc.


I've just pushed them to drm-misc-next. Sorry for the delay, totally 
forgot about them.


Christian.



Alex


-Daniel


Am 10.02.21 um 00:48 schrieb Rikard Falkeborn:

Constify a few static vm_operations_struct that are never modified. Their
only usage is to assign their address to the vm_ops field in the
vm_area_struct, which is a pointer to const vm_operations_struct. Make them
const to allow the compiler to put them in read-only memory.

With this series applied, all static struct vm_operations_struct in the
kernel tree are const.

Rikard Falkeborn (3):
drm/amdgpu/ttm: constify static vm_operations_struct
drm/radeon/ttm: constify static vm_operations_struct
drm/nouveau/ttm: constify static vm_operations_struct

   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
   drivers/gpu/drm/nouveau/nouveau_ttm.c   | 2 +-
   drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
   3 files changed, 3 insertions(+), 3 deletions(-)
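For context, a userspace analogue of the pattern being constified — a never-modified static table of function pointers declared const so the toolchain can place it in read-only memory. The names below are invented for illustration:

```c
#include <assert.h>

/* A table of function pointers that is only ever read, mirroring a
 * static vm_operations_struct. Declaring it const lets it live in
 * .rodata; consumers hold a pointer-to-const, like vm_ops does. */
struct demo_ops {
	int (*open)(void);
	int (*close)(void);
};

static int demo_open(void)  { return 1; }
static int demo_close(void) { return 0; }

/* const: the table itself can be placed in read-only memory */
static const struct demo_ops ops_table = {
	.open  = demo_open,
	.close = demo_close,
};

/* Assignment to a pointer-to-const field still works unchanged. */
static const struct demo_ops *ops = &ops_table;

static int use_ops(void)
{
	return ops->open() + ops->close();
}
```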


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: [PATCH] radeon: ERROR: space prohibited before that ','

2021-02-25 Thread Christian König
Well coding style clean ups are usually welcome, but not necessarily one 
by one.


We can probably merge this if you clean up all checkpatch.pl warnings in 
the whole file.


Christian.

Am 26.02.21 um 07:05 schrieb wangjingyu:

drm_property_create_range(rdev->ddev, 0 , "coherent", 0, 1);

Signed-off-by: wangjingyu 
---
  drivers/gpu/drm/radeon/radeon_display.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_display.c 
b/drivers/gpu/drm/radeon/radeon_display.c
index 3a6fedad002d..439d1b3e87d8 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -1396,7 +1396,7 @@ static int radeon_modeset_create_props(struct 
radeon_device *rdev)
  
  	if (rdev->is_atom_bios) {

rdev->mode_info.coherent_mode_property =
-   drm_property_create_range(rdev->ddev, 0 , "coherent", 
0, 1);
+   drm_property_create_range(rdev->ddev, 0, "coherent", 0, 
1);
if (!rdev->mode_info.coherent_mode_property)
return -ENOMEM;
}




Re: [PATCH] drm/nouveau/pci: rework AGP dependency

2021-02-25 Thread Christian König

Am 25.02.21 um 13:52 schrieb Arnd Bergmann:

From: Arnd Bergmann 

I noticed a warning from 'nm' when CONFIG_TRIM_UNUSED_KSYMS is set
and IS_REACHABLE(CONFIG_AGP) is false:

drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.o: no symbols

I later found this is completely harmless and we should find a way
to suppress the warning, but at that point I had already done a
cleanup patch to address this instance.

It turns out this code could be improved anyway, as the current version
behaves unexpectedly when AGP is a loadable module but nouveau is built-in
itself, in which case it silently omits agp support.

A better way to handle this is with a Kconfig dependency that requires
AGP either to be disabled, or forces nouveau to be a module for AGP=m.
With this change, the compile-time hack can be removed and lld no
longer warns.

Fixes: 340b0e7c500a ("drm/nouveau/pci: merge agp handling from nouveau drm")
Signed-off-by: Arnd Bergmann 
---
  drivers/gpu/drm/nouveau/Kbuild | 1 +
  drivers/gpu/drm/nouveau/Kconfig| 1 +
  drivers/gpu/drm/nouveau/nvkm/subdev/pci/Kbuild | 2 +-
  drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c  | 2 --
  drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.h  | 9 +
  5 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/Kbuild b/drivers/gpu/drm/nouveau/Kbuild
index 60586fb8275e..173b8d9d85e3 100644
--- a/drivers/gpu/drm/nouveau/Kbuild
+++ b/drivers/gpu/drm/nouveau/Kbuild
@@ -15,6 +15,7 @@ nouveau-y := $(nvif-y)
  #- code also used by various userspace tools/tests
  include $(src)/nvkm/Kbuild
  nouveau-y += $(nvkm-y)
+nouveau-m += $(nvkm-m)
  
  # DRM - general

  ifdef CONFIG_X86
diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
index 278e048235a9..90276a557a70 100644
--- a/drivers/gpu/drm/nouveau/Kconfig
+++ b/drivers/gpu/drm/nouveau/Kconfig
@@ -2,6 +2,7 @@
  config DRM_NOUVEAU
tristate "Nouveau (NVIDIA) cards"
depends on DRM && PCI && MMU
+   depends on AGP || !AGP


My first thought was WTF? But then I realized that this totally makes sense.

We should probably have the same for radeon as well.

Apart from that the patch is Acked-by: Christian König 
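For readers unfamiliar with the idiom: in Kconfig's tristate logic, `AGP || !AGP` evaluates to y when AGP is y or n, but to m when AGP=m (since !m is m), so the dependency caps DRM_NOUVEAU at m whenever the AGP core is modular — a built-in nouveau can then never reference symbols from a modular AGP. A hypothetical fragment showing the idiom in isolation:

```kconfig
# Hypothetical names, illustrating only the "FOO || !FOO" idiom
config BAR
	tristate "BAR driver"
	depends on FOO || !FOO   # FOO=m caps BAR at m; FOO=y/n leaves BAR free
```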




select IOMMU_API
select FW_LOADER
select DRM_KMS_HELPER
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/Kbuild 
b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/Kbuild
index 174bdf995271..a400c680cf65 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/Kbuild
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/Kbuild
@@ -1,5 +1,5 @@
  # SPDX-License-Identifier: MIT
-nvkm-y += nvkm/subdev/pci/agp.o
+nvkm-$(CONFIG_AGP) += nvkm/subdev/pci/agp.o
  nvkm-y += nvkm/subdev/pci/base.o
  nvkm-y += nvkm/subdev/pci/pcie.o
  nvkm-y += nvkm/subdev/pci/nv04.o
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c
index 385a90f91ed6..86c9e1d658af 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.c
@@ -20,7 +20,6 @@
   * OTHER DEALINGS IN THE SOFTWARE.
   */
  #include "agp.h"
-#ifdef __NVKM_PCI_AGP_H__
  #include 
  
  struct nvkm_device_agp_quirk {

@@ -172,4 +171,3 @@ nvkm_agp_ctor(struct nvkm_pci *pci)
  
  	pci->agp.mtrr = arch_phys_wc_add(pci->agp.base, pci->agp.size);

  }
-#endif
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.h 
b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.h
index ad4d3621d02b..041fe1fbf093 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.h
@@ -1,15 +1,14 @@
-/* SPDX-License-Identifier: MIT */
-#include "priv.h"
-#if defined(CONFIG_AGP) || (defined(CONFIG_AGP_MODULE) && defined(MODULE))
  #ifndef __NVKM_PCI_AGP_H__
  #define __NVKM_PCI_AGP_H__
  
+/* SPDX-License-Identifier: MIT */

+#include "priv.h"
+#if IS_ENABLED(CONFIG_AGP)
  void nvkm_agp_ctor(struct nvkm_pci *);
  void nvkm_agp_dtor(struct nvkm_pci *);
  void nvkm_agp_preinit(struct nvkm_pci *);
  int nvkm_agp_init(struct nvkm_pci *);
  void nvkm_agp_fini(struct nvkm_pci *);
-#endif
  #else
  static inline void nvkm_agp_ctor(struct nvkm_pci *pci) {}
  static inline void nvkm_agp_dtor(struct nvkm_pci *pci) {}
@@ -17,3 +16,5 @@ static inline void nvkm_agp_preinit(struct nvkm_pci *pci) {}
  static inline int nvkm_agp_init(struct nvkm_pci *pci) { return -ENOSYS; }
  static inline void nvkm_agp_fini(struct nvkm_pci *pci) {}
  #endif
+
+#endif




Re: [PATCH] drm/ttm/ttm_bo: make ttm_bo_glob_use_count static

2021-02-23 Thread Christian König

Am 23.02.21 um 09:54 schrieb Jiapeng Chong:

Fix the following sparse warning:

drivers/gpu/drm/ttm/ttm_bo.c:53:10: warning: symbol
'ttm_bo_glob_use_count' was not declared. Should it be static?


IIRC we already have a patch for this on the mailing list and the mutex 
can be static as well.


Christian.



Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index b65f4b1..107dd13 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -50,7 +50,7 @@
   * ttm_global_mutex - protecting the global BO state
   */
  DEFINE_MUTEX(ttm_global_mutex);
-unsigned ttm_bo_glob_use_count;
+static unsigned ttm_bo_glob_use_count;
  struct ttm_bo_global ttm_bo_glob;
  EXPORT_SYMBOL(ttm_bo_glob);
  




Re: [PATCH] mutex: nuke mutex_trylock_recursive

2021-02-16 Thread Christian König




Am 16.02.21 um 11:13 schrieb Peter Zijlstra:

On Tue, Feb 16, 2021 at 10:29:00AM +0100, Daniel Vetter wrote:

On Tue, Feb 16, 2021 at 09:21:46AM +0100, Christian König wrote:

The last user went away in the 5.11 cycle.

Signed-off-by: Christian König 

Nice.

Reviewed-by: Daniel Vetter 

I think it would be good to still stuff this into 5.12 before someone
resurrects this zombie.

Already done:

   
https://lkml.kernel.org/r/161296556531.23325.10473355259841906876.tip-bot2@tip-bot2


One less bad concept to worry about.

But your patch is missing the removal of the checkpatch.pl check for this :)

Cheers,
Christian.


[PATCH] mutex: nuke mutex_trylock_recursive

2021-02-16 Thread Christian König
The last user went away in the 5.11 cycle.

Signed-off-by: Christian König 
---
 include/linux/mutex.h  | 25 -
 kernel/locking/mutex.c | 10 --
 scripts/checkpatch.pl  |  6 --
 3 files changed, 41 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index dcd185cbfe79..0cd631a19727 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -199,29 +199,4 @@ extern void mutex_unlock(struct mutex *lock);
 
 extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
 
-/*
- * These values are chosen such that FAIL and SUCCESS match the
- * values of the regular mutex_trylock().
- */
-enum mutex_trylock_recursive_enum {
-   MUTEX_TRYLOCK_FAILED= 0,
-   MUTEX_TRYLOCK_SUCCESS   = 1,
-   MUTEX_TRYLOCK_RECURSIVE,
-};
-
-/**
- * mutex_trylock_recursive - trylock variant that allows recursive locking
- * @lock: mutex to be locked
- *
- * This function should not be used, _ever_. It is purely for hysterical GEM
- * raisins, and once those are gone this will be removed.
- *
- * Returns:
- *  - MUTEX_TRYLOCK_FAILED- trylock failed,
- *  - MUTEX_TRYLOCK_SUCCESS   - lock acquired,
- *  - MUTEX_TRYLOCK_RECURSIVE - we already owned the lock.
- */
-extern /* __deprecated */ __must_check enum mutex_trylock_recursive_enum
-mutex_trylock_recursive(struct mutex *lock);
-
 #endif /* __LINUX_MUTEX_H */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 5352ce50a97e..adb935090768 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -86,16 +86,6 @@ bool mutex_is_locked(struct mutex *lock)
 }
 EXPORT_SYMBOL(mutex_is_locked);
 
-__must_check enum mutex_trylock_recursive_enum
-mutex_trylock_recursive(struct mutex *lock)
-{
-   if (unlikely(__mutex_owner(lock) == current))
-   return MUTEX_TRYLOCK_RECURSIVE;
-
-   return mutex_trylock(lock);
-}
-EXPORT_SYMBOL(mutex_trylock_recursive);
-
 static inline unsigned long __owner_flags(unsigned long owner)
 {
return owner & MUTEX_FLAGS;
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 92e888ed939f..15f7f4fa6b99 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -7069,12 +7069,6 @@ sub process {
}
}
 
-# check for mutex_trylock_recursive usage
-   if ($line =~ /mutex_trylock_recursive/) {
-   ERROR("LOCKING",
- "recursive locking is bad, do not use this ever.\n" . $herecurr);
-   }
-
 # check for lockdep_set_novalidate_class
if ($line =~ /^.\s*lockdep_set_novalidate_class\s*\(/ ||
$line =~ /__lockdep_no_validate__\s*\)/ ) {
-- 
2.25.1



Re: [Linaro-mm-sig] DMA-buf and uncached system memory

2021-02-15 Thread Christian König




Am 15.02.21 um 15:41 schrieb David Laight:

From: Christian König

Sent: 15 February 2021 12:05

...

Snooping the CPU caches introduces some extra latency, so what can
happen is that the response to the PCIe read comes too late for the
scanout. The result is an underflow and flickering whenever something is
in the cache which needs to be flushed first.

Aren't you going to get the same problem if any other endpoints are
doing memory reads?


The PCIe device in this case is part of the SoC, so we have a high 
priority channel to memory.


Because of this the hardware designer assumed they have a guaranteed 
memory latency.



Possibly even ones that don't require a cache snoop and flush.

What about just the cpu doing a real memory transfer?

Or a combination of the two above happening just before your request.

If you don't have a big enough fifo you'll lose.

I did 'fix' a similar(ish) issue with video DMA latency on an embedded
system based on the SA1100/SA1101 by significantly reducing the clock
to the VGA panel whenever the cpu was doing 'slow io'.
(Interleaving an uncached cpu DRAM write between the slow io cycles
also fixed it.)
But the video was the only DMA device and that was an embedded system.
Given the application note about video latency didn't mention what was
actually happening, I'm not sure how many people actually got it working!


Yeah, I'm also not sure whether AMD will solve this with deeper FIFOs or 
more prefetching in future designs.


But you gave me at least one example where somebody had similar problems.

Thanks for the feedback,
Christian.



David

___
Linaro-mm-sig mailing list
linaro-mm-...@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-mm-sig




Re: [PATCH v1 1/3] string: Consolidate yesno() helpers under string.h hood

2021-02-15 Thread Christian König

Am 15.02.21 um 15:21 schrieb Andy Shevchenko:

We already have a few similar implementations and a lot of code that can benefit
from the yesno() helper.  Consolidate the yesno() helpers under the string.h hood.

Signed-off-by: Andy Shevchenko 


Looks like a good idea to me, feel free to add an Acked-by: Christian 
König  to the series.


But looking at the use cases for this, wouldn't it make more sense to 
teach kprintf some new format modifier for this?


Christian.


---
  .../drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c|  6 +-
  drivers/gpu/drm/i915/i915_utils.h|  6 +-
  drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c   | 12 +---
  include/linux/string.h   |  5 +
  4 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
index 360952129b6d..7fde4f90e513 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
@@ -23,6 +23,7 @@
   *
   */
  
+#include 

  #include 
  
  #include 

@@ -49,11 +50,6 @@ struct dmub_debugfs_trace_entry {
uint32_t param1;
  };
  
-static inline const char *yesno(bool v)

-{
-   return v ? "yes" : "no";
-}
-
  /* parse_write_buffer_into_params - Helper function to parse debugfs write buffer into an array
   *
   * Function takes in attributes passed to debugfs write entry
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index abd4dcd9f79c..e6da5a951132 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -27,6 +27,7 @@
  
  #include 

  #include 
+#include 
  #include 
  #include 
  #include 
@@ -408,11 +409,6 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
  #define MBps(x) KBps(1000 * (x))
  #define GBps(x) ((u64)1000 * MBps((x)))
  
-static inline const char *yesno(bool v)

-{
-   return v ? "yes" : "no";
-}
-
  static inline const char *onoff(bool v)
  {
return v ? "on" : "off";
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 7d49fd4edc9e..c857d73abbd7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -34,6 +34,7 @@
  
  #include 

  #include 
+#include 
  #include 
  #include 
  #include 
@@ -2015,17 +2016,6 @@ static const struct file_operations rss_debugfs_fops = {
  /* RSS Configuration.
   */
  
-/* Small utility function to return the strings "yes" or "no" if the supplied

- * argument is non-zero.
- */
-static const char *yesno(int x)
-{
-   static const char *yes = "yes";
-   static const char *no = "no";
-
-   return x ? yes : no;
-}
-
  static int rss_config_show(struct seq_file *seq, void *v)
  {
struct adapter *adapter = seq->private;
diff --git a/include/linux/string.h b/include/linux/string.h
index 9521d8cab18e..fd946a5e18c8 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -308,4 +308,9 @@ static __always_inline size_t str_has_prefix(const char *str, const char *prefix)
return strncmp(str, prefix, len) == 0 ? len : 0;
  }
  
+static inline const char *yesno(bool yes)

+{
+   return yes ? "yes" : "no";
+}
+
  #endif /* _LINUX_STRING_H_ */




Re: DMA-buf and uncached system memory

2021-02-15 Thread Christian König

Am 15.02.21 um 13:16 schrieb Lucas Stach:

[SNIP]

Userspace components can then of course tell the exporter what the
importer needs, but validation if that stuff is correct and doesn't
crash the system must happen in the kernel.

What exactly do you mean by "scanout requires non-coherent memory"?
Does the scanout requestor always set the no-snoop PCI flag, so you get
garbage if some writes to memory are still stuck in the caches, or is
it some other requirement?

Snooping the CPU caches introduces some extra latency, so what can
happen is that the response to the PCIe read comes too late for the
scanout. The result is an underflow and flickering whenever something is
in the cache which needs to be flushed first.

Okay, that confirms my theory on why this is needed. So things don't
totally explode if you don't do it, but in order to guarantee access
latency you need to take the no-snoop path, which means your device
effectively gets dma-noncoherent.


Exactly. My big question at the moment is if this is something AMD 
specific or do we have the same issue on other devices as well?



On the other hand, when we don't snoop the CPU caches we at least get
garbage/stale data on the screen. That wouldn't be that bad, but the
big problem is that we have also seen machine check exceptions when we
don't snoop and the cache is dirty.

If you attach to the dma-buf with a struct device which is non-coherent
it's the exporters job to flush any dirty caches. Unfortunately the DRM
caching of the dma-buf attachments in the DRM framework will get a bit
in the way here, so a DRM specific flush might be needed. :/ Maybe
moving the whole buffer to uncached sysmem location on first attach of
a non-coherent importer would be enough?


Could work in theory, but the problem is that to do this I would have to 
tear down all CPU mappings and attachments of other devices.


Apart from the problem that we don't have the infrastructure for that, we 
don't know at import time that a buffer might be used for scan out. I 
would need to re-import it during fb creation or something like this.


Our current concept for AMD GPUs is rather that we try to use uncached 
memory as much as possible. So for the specific use case just checking 
if the exporter is AMDGPU and has the flag set should be enough for now.



So this should better be coherent or you can crash the box. ARM seems to
be really susceptible for this, x86 is fortunately much more graceful
and I'm not sure about other architectures.

ARM really dislikes pagetable setups with different attributes pointing
to the same physical page, however you should be fine as long as all
cached aliases are properly flushed from the cache before access via a
different alias.


Yeah, can totally confirm that and had to learn it the hard way.

Regards,
Christian.



Regards,
Lucas





Re: DMA-buf and uncached system memory

2021-02-15 Thread Christian König




Am 15.02.21 um 13:00 schrieb Thomas Zimmermann:

Hi

Am 15.02.21 um 10:49 schrieb Thomas Zimmermann:

Hi

Am 15.02.21 um 09:58 schrieb Christian König:

Hi guys,

we are currently working on Freesync and direct scan out from system 
memory on AMD APUs in A+A laptops.


One problem we stumbled over is that our display hardware needs to 
scan out from uncached system memory and we currently don't have a 
way to communicate that through DMA-buf.


Re-reading this paragraph, it sounds more as if you want to let the 
exporter know where to move the buffer. Is this another case of the 
missing-pin-flag problem?


No, your original interpretation was correct. Maybe my writing is a bit 
unspecific.


The real underlying issue is that our display hardware has a problem 
with latency when accessing system memory.


So the question is if that also applies to, for example, Intel hardware or 
other devices as well, or if it is just something AMD specific?


Regards,
Christian.



Best regards
Thomas



For our specific use case at hand we are going to implement 
something driver specific, but the question is should we have 
something more generic for this?


For vmap operations, we return the address as struct dma_buf_map, 
which contains additional information about the memory buffer. In 
vram helpers, we have the interface drm_gem_vram_offset() that 
returns the offset of the GPU device memory.


Would it be feasible to combine both concepts into a dma-buf 
interface that returns the device-memory offset plus the additional 
caching flag?


There'd be a structure and a getter function returning the structure.

struct dma_buf_offset {
 bool cached;
 u64 address;
};

// return offset in *off
int dma_buf_offset(struct dma_buf *buf, struct dma_buf_off *off);

Whatever settings are returned by dma_buf_offset() are valid while 
the dma_buf is pinned.


Best regards
Thomas



After all the system memory access pattern is a PCIe extension and 
as such something generic.


Regards,
Christian.
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel










Re: DMA-buf and uncached system memory

2021-02-15 Thread Christian König

Am 15.02.21 um 12:53 schrieb Lucas Stach:

Am Montag, dem 15.02.2021 um 10:34 +0100 schrieb Christian König:

Am 15.02.21 um 10:06 schrieb Simon Ser:

On Monday, February 15th, 2021 at 9:58 AM, Christian König 
 wrote:


we are currently working on Freesync and direct scan out from system
memory on AMD APUs in A+A laptops.

One problem we stumbled over is that our display hardware needs to scan
out from uncached system memory and we currently don't have a way to
communicate that through DMA-buf.

For our specific use case at hand we are going to implement something
driver specific, but the question is should we have something more
generic for this?

After all the system memory access pattern is a PCIe extension and as
such something generic.

Intel also needs uncached system memory if I'm not mistaken?

No idea, that's why I'm asking. Could be that this is also interesting
for I+A systems.


Where are the buffers allocated? If GBM, then it needs to allocate memory that
can be scanned out if the USE_SCANOUT flag is set or if a scanout-capable
modifier is picked.

If this is about communicating buffer constraints between different components
of the stack, there were a few proposals about it. The most recent one is [1].

Well the problem here is on a different level of the stack.

See, resolution, pitch etc. can easily be communicated in userspace
without involvement of the kernel. The worst thing which can happen is
that you draw garbage into your own application window.

But if you get the caching attributes in the page tables (both CPU as
well as IOMMU, device etc...) wrong then ARM for example has the
tendency to just spontaneously reboot.

X86 is fortunately a bit more graceful and you only end up with random
data corruption, but that is only marginally better.

So to sum it up that is not something which we can leave in the hands of
userspace.

I think that exporters in the DMA-buf framework should have the ability
to tell importers if the system memory snooping is necessary or not.

There is already a coarse-grained way to do so: the dma_coherent
property in struct device, which you can check at dmabuf attach time.

However it may not be enough for the requirements of a GPU where the
engines could differ in their dma coherency requirements. For that you
need to either have fake struct devices for the individual engines or
come up with a more fine-grained way to communicate those requirements.


Yeah, that won't work. We need this on a per buffer level.


Userspace components can then of course tell the exporter what the
importer needs, but validation if that stuff is correct and doesn't
crash the system must happen in the kernel.

What exactly do you mean by "scanout requires non-coherent memory"?
Does the scanout requestor always set the no-snoop PCI flag, so you get
garbage if some writes to memory are still stuck in the caches, or is
it some other requirement?


Snooping the CPU caches introduces some extra latency, so what can 
happen is that the response to the PCIe read comes too late for the 
scanout. The result is an underflow and flickering whenever something is 
in the cache which needs to be flushed first.


On the other hand, when we don't snoop the CPU caches we at least get 
garbage/stale data on the screen. That wouldn't be that bad, but the 
big problem is that we have also seen machine check exceptions when we 
don't snoop and the cache is dirty.


So this should better be coherent or you can crash the box. ARM seems to 
be really susceptible for this, x86 is fortunately much more graceful 
and I'm not sure about other architectures.


Regards,
Christian.



Regards,
Lucas




