yaxunl added a comment.

In D73979#1857664 <https://reviews.llvm.org/D73979#1857664>, @tra wrote:

> In D73979#1857536 <https://reviews.llvm.org/D73979#1857536>, @yaxunl wrote:
>
> > Based on CUDA usage of extern shared var 
> > (https://devblogs.nvidia.com/using-shared-memory-cuda-cc/), it seems CUDA 
> > also assumes all extern shared vars have the same address, therefore HIP 
> > and CUDA have similar behavior.
>
>
> Yes. Because of the `CUDA also assumes all extern shared vars have the same 
> address` I think that not allowing additional types for `extern __shared__` 
> makes sense for HIP, too. I'd rather not give users more ways to do a wrong 
> thing.
>
> Can you elaborate on why you want to allow this feature? While it would be 
> convenient for someone who uses exactly *one* such extern object, in practice 
> the most common use case is that users declare a single `extern __shared__` 
> array to serve as the memory pool and then manually allocate chunks within it 
> and assign the addresses to appropriately typed pointers. I guess they could 
> define an `extern __shared__ struct` with fields representing the objects, 
> but that seems sort of pointless considering that the only reason to use 
> `extern __shared__` is to allocate shared memory dynamically.
>
> In general, the concept of `extern __shared__` with *all* such extern items 
> occupying the same space is broken by design. It's not composable (every 
> function using one needs to coordinate with every other function doig the 
> same). It introduces failure modes not obvious from the source code (access 
> an object, fail with invalid memory access). It does not fit the conventional 
> meaning of what `extern something` means in C++ (different objects have 
> different addresses). IMO, it should not have existed and the shared 
> memory/pointer should've been exposed via explicit API. I.e. CUDA could've 
> used the same mechanism which provides threads with threadIdx and blockIdx.
>
> As things stand right now,  `extern __shared__` is something I want gone, not 
> added more features to. AFAICT, the limitations clang places on it right now 
> have not been an issue for the CUDA code we compile.
>
> Is there a pressing need for this feature for HIP? Perhaps it would make more 
> sense to introduce a more sensible API and port existing HIP code to use it.
>
> WDYT?


All extern shared vars are sharing the same address, however, they may be used 
as different types in different functions.

For example,

  __device__ int foo() {
    extern __shared__ int a;
    for (...) a+=...;
    return a;
  }
  
  __device__ double bar(int x) {
    extern __shared__ double b[10];
    for(...) b[x]+=...;
    return b[0];
  }
  
  __global__ void k() {
    foo();
    //...
    bar();
  }

In one function foo, users need to use the shared memory as an int. In another 
function, users need to use the shared memory as a double array. Users just 
need to make sure they request sufficient dynamic shared memory in triple 
chevron to be greater than the max dynamic shared memory usage. Users do not 
need to pass values in extern shared var between functions. They just treat it 
as an uninitialized variable. Forbidding different types for extern shared 
variable does not add any benefit, just forcing users to work around the 
limitation and resulting in less readable code.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73979/new/

https://reviews.llvm.org/D73979



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to