> it looks to me like it wouldn't work across different architectures.
> is that a correct reading? maybe this is a usual restriction in
> this kind of library?

Quite right - homogeneous architectures only. Attempting to cope with heterogeneity accounts for some of the complexity (and inefficiency) of some other libraries. I would guess that any decently scalable parallel machine will be single-architecture (but I haven't surveyed the field lately).
