Re: [OT] n-way union

Don Tue, 26 May 2009 03:20:13 -0700

BCS wrote:

Hello Andrei,

This is somewhat OT but I think it's an interesting problem. Consider
the following data:

double[][] a =
[
[ 1, 4, 7, 8 ],
[ 1, 7 ],
[ 1, 7, 8],
[ 4 ],
[ 7 ],
];
We want to compute an n-way union, i.e., efficiently span all elements
in all arrays in a, in sorted order. You can assume that each
individual array in a is sorted. The output of n-way union should be:

auto witness = [
1, 1, 1, 4, 4, 7, 7, 7, 7, 8, 8
];
assert(equal(nWayUnion(a), witness[]));

The STL and std.algorithm have set_union that does that for two sets:
for example, set_union(a[0], a[1]) outputs [ 1, 1, 4, 7, 7, 8 ]. But
n-way unions poses additional challenges. What would be a fast
algorithm? (Imagine a could be a very large range of ranges).

Needless to say, nWayUnion is a range :o).

Finally, why would anyone care for something like this?

Andrei


in sudocode

1. build a min heap out of the elements based on the first element ofeach array

2. remove the first value from the root of the heap,
3. if empty array, drop it
4. if not empty re-heap
5. goto 2 if anything left

In practice, I bet it'll go faster with a couple more steps:
2. remove the first value (from array X) from the root of the heap,
2a. Determine new root of the heap (don't need to do a full re-heap yet).
2b. move all values from X until empty or >= new root.
4. reheap

There are interesting cases like:
[5,6,7,8,9,9,......9]
[30, 31,32,33,34...38]
[1,1,1,1,1,1,1,1,1,1,...1]
[10,11,12,13,14,15,16,16,......., 16]

where in theory, the total number of comparisons required is a functiononly of the number of arrays, and is independent of the actual number ofelements.If it's common to have runs where consecutive values come from a singlearray, and the arrays are random access, you could do MUCH better than O(n).

Re: [OT] n-way union

Reply via email to