Hi Mikulas,

On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
> Modern processors can detect linear memory accesses and prefetch data
> automatically, so there's no need to use prefetch.

Not each and every CPU that's capable of running Linux has prefetch
functionality :)

Still read-on...

> Signed-off-by: Mikulas Patocka <mpato...@redhat.com>
> 
> ---
>  drivers/gpu/drm/udl/udl_transfer.c |    7 -------
>  1 file changed, 7 deletions(-)
> 
> Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
> ===================================================================
> --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c     2018-05-31 
> 14:48:12.000000000 +0200
> +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c  2018-05-31 
> 14:48:12.000000000 +0200
> @@ -13,7 +13,6 @@
>  #include <linux/module.h>
>  #include <linux/slab.h>
>  #include <linux/fb.h>
> -#include <linux/prefetch.h>
>  #include <asm/unaligned.h>
>  
>  #include <drm/drmP.h>
> @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac
>       int start = width;
>       int end = width;
>  
> -     prefetch((void *) front);
> -     prefetch((void *) back);

AFAIK prefetcher fetches new data according to a known history... i.e. based on 
previously
used pattern we'll trying to get the next batch of data.

But the code above is in the very beginning of the data processing routine where
prefetcher doesn't yet have any history to know what and where to prefetch.

So I'd say this particular usage is good.
At least those prefetches shouldn't hurt because typically it
would be just 1 instruction if those exist or nothing if CPU/compiler doesn't
support it.

>       for (j = 0; j < width; j++) {
>               if (back[j] != front[j]) {
>                       start = j;
> @@ -140,8 +136,6 @@ static void udl_compress_hline16(
>               const u8 *cmd_pixel_start, *cmd_pixel_end = NULL;
>               uint16_t pixel_val16;
>  
> -             prefetchw((void *) cmd); /* pull in one cache line at least */
> -

Pretty much the same is here. Obviously on the first iteration prefetcher dosn't
know where to get "cmd". On the next iterations it might be better but given 
amount
of operation happens further in the cycle (and in inner cycles) I won't be 
completely
sure that each and every prefetcher will still keep track of "cmd".

>               *cmd++ = 0xaf;
>               *cmd++ = 0x6b;
>               *cmd++ = (uint8_t) ((dev_addr >> 16) & 0xFF);
> @@ -158,7 +152,6 @@ static void udl_compress_hline16(
>                                       (unsigned long)(pixel_end - pixel) >> 
> log_bpp,
>                                       (unsigned long)(cmd_buffer_end - 1 - 
> cmd) / 2) << log_bpp);
>  
> -             prefetch_range((void *) pixel, cmd_pixel_end - pixel);

Again I'm not sure what we gain removing that code in comparison of possible
performance degradation on simpler CPUs.

And essentially all the same is applicable to UDLFB patch.

-Alexey
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to