[Chapel-developers] Array.append

Michael Ferguson Thu, 23 Jan 2014 08:04:30 -0800

Hi -

Right now, 1-D rectangular and associative arrays in Chapel do not
support appending directly. Instead, they only support resizing by
modifying the domain. This is a discussion of that design.


Recall that in order to prevent confusion/surprise, an array's domain
can only be modified if the domain is used directly (instead of being
accessed through array.domain). This is described in the spec, section
20.11.

I'd really like to be able to say something like A.append("Hello"),
when A is a 1-D rectangular array, and A.append(key, value) (for an
associative array). This kind of array flexibility is one of the main
productivity advantages Perl or Python have over C. Then,  I'd like
the same array.append(...) to work with distributed arrays and to
include communication optimizations that bundle appends.

Anyway, back to the 1-D local rectangular case. There are two problems
right now preventing total append satisfaction:
  1) Using the domain to resize and then add an element to an array is not
     amortized O(1) - that is, it reallocates and copies the entire array;
     so building up an array in this manner would be O(n*n).
  2) If you want to allow an array to be appended to, you have to pass
     around the domain for that array separately. You have to write:
       D = D.low..D.high+1
       A[D.high+1] = newElement;
     or for an associative array
       D += newKey;
       A[newKey] = newValue;
     and even if we wanted to provide a .append method, that method
     would need to also take in the array's domain, like this:
      proc Array.append(D:domain, newElement) {
        ... the code above...
      }

Now, modules/standard/Containers.chpl includes Vector, which
stores its array and domain separately and does the usual array-doubling
so that appends are amortized O(1). The problems with this approach are:
  A) Chapel is carefully constructed to include a rich vocabulary of operations
     on arrays. These operations (like slicing, promotion, zippering,
     parallel iteration) are not available on a Vector (even though they have
     the same data representation).
  B) Routines that work with 1-D arrays now might have
     an obligation to also work with a Vector. For example, what if I want
     to sort my Vector that I built up one element at a time? Now I need
     the Sort module to support the Vector type, even though the Sort
     module is already generic with respect to the array representation.
     Chapel's strong generic programming mechanisms should take care of
     this problem - instead of forcing developers to create two or more
     versions of every routine (or adapter classes like you do in C++).

Problems A and B lead me to believe that Chapel needs to support appending
directly on some 1-D array representation. It's theoretically possible
to add the append method described above (which takes in as input
the array, the domain, and the new element) and to fix the domains to resize
with amortized O(1) time. However, it's still unnecessarily complicated
to keep track all of the time of both the array and the domain, and working
through the domain obfuscates the operation as something optimize-able to
the library (ie, it would be difficult to write append-bundling communication
optimizations if the domain/array resize and the storing of the new element
are separate steps - in fact, any approach that requires we know the index
where the appended element will be stored will break the append-bundling).

So, I think that we need to change the rule about arrays not being resizeable
unless their domain is explicitly used to do so. I can think of the following
options:
  1) Let programmers resize Array.domain willy-nilly. It's arguably not the
     language's job to protect programmers from themselves.
  2) Relax the rule in cases where the domain is only growing. That is, allow
     multiple arrays sharing a domain to change size if an Array.domain is 
modified
     but only if all of the old elements remain.
  3) Make it a runtime error for domains that are used by multiple arrays to 
change
     when accessed through an Array.domain.
     I believe this would be straightforward to implement currently using
     the domain's reference counts. This would change the error from a 
compile-time
     one to a run-time one.
  4) When Array.domain changes, the array gets a private copy of the modified
     domain, so that the new domain applies to the array and not other arrays 
that
     were declared over the same domain. Other arrays initially declared to 
share
     the same domain would no longer share the domain.
  5) Create some sort of type annotation which indicates that an array owns its
     domain and that the domain cannot be shared with other arrays. Arrays 
declared
     in this manner would get their own copy of a domain. For example, you might
     declare such an array like this:
        var A:[owns 1..100] int; // owns, in, var, private are potential 
keywords
     and declare a procedure that needs such an array like this:
        proc doThingsInvolvingAppending(A:[owns ?D]) { ... }

Would these choices remove essential optimization opportunities?

Chapel is carefully designed to allow some optimizations over arrays and 
domains.
In particular, when multiple arrays are declared over the same domain, we know
that e.g. a forall loop zippering those arrays needs no communication. However,
this is more of a fact of how it's built than a compiler 
analysis/optimization...
although the compiler could avoid emitting communication calls in this case
(saving some overhead of going through e.g. comm_get when the data is actually 
local).

Section 20.11 of the spec describes that constant domains might allow the 
compiler to
optimize further; for example to avoid bounds checks, but I don't personally 
believe
that's an important point (since --fast just turns off all the bounds checks 
anyway).

My feeling on these points is that the compiler should have an 
arrays-domain-is-modified
analysis, and that such an analysis would provide more optimization 
opportunities
while allowing choice 1,2, or 4 from being reasonable. There are other 
optimizations
that the compiler can apply in regions
where it can be proved that the array's domain is not modified, including any
optimization that keeps a pointer to an array element. One such optimization is
strength-reduction - we could convert 1-D local array accesses to in this loop
  for i in 1..100 {
    A[i] = i;
  }
to a pointer that we increment, like
  for ptr in A.base..A.base+A.len-1 by sizeof(A.eltType) { ... }.

But if the loop body contains a modification of A's domain, that optimization
cannot apply. Similarly, if the domain is not modified, the domain can
be cached across locales, the array base pointer can be cached across locales,
and the addresses of individual array elements can be cached across locales;
for example in
  for i in 1..100 {
    A[x] = i * A[x];
  }
the compiler could hoist the computation of the (wide) address of A[x] out of
the loop, since it is guaranteed to not change as long as the array's domain
does not change.

As for the many-arrays-sharing-the-same-domain optimization, it seems to me
that it can be totally implemented at run-time in the implementation of the
follower iterators (ie, the follower iterator would check if it can guarantee
that what it is iterating over has the same distribution as what the leader
is iterating over, and in that case, omit the communication calls). So I'm
not totally convinced that it's necessary for the compiler to always know
about these arrays-sharing-domains cases - and I think it might be OK
to restrict the compilers knowledge to
arrays-sharing-domains-that-aren't-modified-through-the-array.


Thanks for reading. Thoughts?

-michael

------------------------------------------------------------------------------
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today. 
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers

[Chapel-developers] Array.append

Reply via email to