Replying to @GordonBGood and @mratsim:

> > There is no reason to have such a high overhead especially in high
> > performance computing. OpenMP is way way way lower.
>
> I agree, and I'm still not 100% sure where all the time is going
Are you able to share the benchmark you were using with us (along with information like back-end compiler and version, processor, etc.)? In our experiments, Chapel has generally been shown to be competitive with OpenMP, so it would be interesting for us to understand better what you were doing (prior to resorting to a homegrown thread pool) in order to make sure nothing's going horribly awry. I'd also be curious whether you were using `CHPL_TASKS=qthreads` or `CHPL_TASKS=fifo`. Thanks.

> I expect [Chapel's data parallelism] is similar to CoArray Fortran

Chapel's data parallelism is significantly different from Co-Array Fortran's, which takes an array-of-arrays approach to distributed arrays. In contrast, Chapel's data parallelism is based on global-view domains (index sets) and arrays, an evolution of concepts pioneered by ZPL in the 1990s. By default, most data parallelism in Chapel is implemented using #cores tasks, where #cores is the number of processor cores across which the index set or array is distributed. This ensures that the computational granularity is based on the hardware parallelism rather than on the size of the data collection (though programmers can override these defaults if they want something finer or coarser).

> I can see that it could be of interest if one had access to such a machine or
> group of machines, which most of us probably never will.

For those not in the market for a Cray or commodity cluster, I suspect AWS, Azure, and Google Cloud would be happy to offer you such access for a reasonable fee. :)
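To make the global-view style described above concrete, here's a minimal sketch in Chapel (the array size `n` and the choice of the `Block` distribution are my own illustrative assumptions, not something from the discussion above):

```chapel
use BlockDist;

config const n = 1_000_000;

// A global-view domain (index set) distributed blockwise across locales;
// the array A is declared over it and partitioned automatically.
const D = {1..n} dmapped Block(boundingBox={1..n});
var A: [D] real;

// By default this forall is implemented with roughly #cores tasks per
// locale, not one task per element, so granularity tracks the hardware.
forall i in D do
  A[i] = i * 2.0;
```

Note that the programmer writes a single logical loop over the whole index set; the compiler and runtime handle the decomposition into tasks and the placement of data, which is the key contrast with Co-Array Fortran's explicitly local, array-of-arrays view.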
