I've reduced this email to the parts to which I am responding...

> Offhand, I'm not aware of a recent Chapel-vs-TBB comparison (and wouldn't
> trust anything older because our performance has been improving by leaps
> and bounds over the past few years).
>

The code associated with http://arxiv.org/pdf/1302.2837.pdf is online here:
https://bitbucket.org/nanzs/multicore-languages/src, so anyone can run it
without writing any code.


> A second case that we're focused on currently is the LCALS loop kernel
> suite from Livermore.  I'm not seeing a TBB version of this offhand
> either.  In practice, our comparisons have been against OpenMP (which is
> the dominant shared memory parallel programming model in our community).
>

Porting LCALS to TBB should not be too hard, since it already uses a forall
C++ lambda design, but I've spoken to the LLNL folks enough to know
that they favor OpenMP - it's an open standard with multiple
implementations.  TBB is open-source and portable (e.g. Blue Gene/Q is
supported), but since it comes from Intel, it's unlikely to attract support
from certain other vendors that LLNL cares about.
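To illustrate why the port should be easy, here is a minimal sketch of the
forall-lambda pattern; the `forall` and `daxpy` names are mine, not LCALS's,
and the backends are shown only in comments since the point is that the loop
body does not change:

```cpp
#include <cassert>
#include <vector>

// Sketch of a forall abstraction in the style LCALS uses: the loop body is
// a C++ lambda, so retargeting means swapping the driver, not the kernels.
template <typename Body>
void forall(int begin, int end, Body body) {
  // Serial driver shown here.  A TBB port would replace this loop with
  // something like tbb::parallel_for(begin, end, body); an OpenMP port
  // would put "#pragma omp parallel for" on the same loop.
  for (int i = begin; i < end; ++i) body(i);
}

// Example kernel written once against the forall interface.
void daxpy(double a, const std::vector<double>& x, std::vector<double>& y) {
  forall(0, static_cast<int>(y.size()),
         [&](int i) { y[i] += a * x[i]; });
}
```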


> The Intel PRK suite (https://github.com/ParRes/Kernels) is a third set of
> benchmarks (for shared and/or distributed memory) that we've recently
> started looking at, though I'm not seeing a TBB entry there.  (And frankly,
> neither of these last two cases are set up to easily support automated
> cross-language comparisons as well as the first).
>

I don't know what automation we could support that would make
cross-language comparisons possible, but we (the PRK team) do not want to
be limited to programming models where C/C++ is the base language.  For
example, I have already assigned myself the task of porting to Fortran 2008
(because of coarrays): https://github.com/ParRes/Kernels/issues/30.  Right
now, we use C by default, because it is the most portable option and a
least common denominator for nearly all programming models.  The Grappa,
Charm++, and HPX-3 (unreleased) ports all use C++11.

I created https://github.com/ParRes/Kernels/issues/37 but decided against
assigning it to Brad without his consent.

Right now, the PRK team is in the middle of a code modernization
sprint-marathon.  I recently set up Travis CI, which was a massive effort
due to the number of dependencies.  That work was a prerequisite for
other modernization efforts, such as:
* redoing the build system.
* improving compliance with C99 (we won't move to C11 until it is more
widely supported), C++11, OpenMP 4.5, MPI-3, and OpenSHMEM 1.3.
* better automated testing that runs not only in Travis but on big iron
(via https://github.com/travis-ci/travis-build).
* supporting even more programming models.  HPX, Legion and OCR are
in-progress.  Chapel, UPC++ and Fortran 2008 coarrays have been discussed.

Like Chapel, the PRK project is primarily interested in distributed memory,
particularly {petascale..exascale}.  However, since the PRKs are clearly
the best parallel programming model benchmarks around :-D, we are happy to
see improved support for shared memory models.

At least for PRK transpose, the distributed and shared memory components
are nearly orthogonal, so we should be able to support TBB (and
C11/C++11/POSIX threads, Cilk, ...) alongside OpenMP trivially once we
factor out the local kernel (currently it is inlined for simplicity).  For
the other PRKs, such components are not easily separable, although I won't
rule out the possibility that we will find a way to make the threading
integration more modular.

Note also that we have a SpMV benchmark in the PRK suite, but it has only
been ported to MPI-1 and OpenMP, and there is no plan to change that.  It's
not clear to me that a standalone SpMV kernel is sufficient to understand
the applications that rely on it, since SpMV is often specialized by domain.


> Back on your first question, I'll note that I'm not aware offhand of a
> good suite of shared memory sparse array/matrix benchmarks that would be
> appropriate for Chapel vs. TBB or OpenMP comparisons. So if you are, that
> would be of particular interest to me.  I think part of the reason that
> optimizing our sparse implementation hasn't received more attention is due
> to a lack of having written such benchmarks in Chapel.
>

I am not a huge fan of HPCG, but that might be a reasonable target for a
Chapel port.  The HPCG reference implementation supports MPI+OpenMP.

HPGMG is a more application-realistic benchmark than HPCG and would be
worthy of a Chapel port.

Best,

Jeff


-- 
Jeff Hammond
[email protected]
http://jeffhammond.github.io/
_______________________________________________
Chapel-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-bugs