Here is some context for this problem. We are trying to build a simple GroupBy-Aggregate function using Hyracks. Consider the following SQL query:

SELECT id, COUNT(id) FROM dataset GROUP BY id;

Our design has two operators, a local aggregator and a global aggregator. The local aggregator processes one input split at a time and computes its histogram as <ID, count> pairs. The global aggregator combines all the <ID, count> pairs produced by the local aggregators and produces the final output as <ID, Sum(count)>.

The question is: which type of connector should we use to connect the local aggregator to the global aggregator? We know that an MtoN hash connector will work, with each machine combining a subset of the keys, but our design is to combine all of the keys on a single machine. In other words, there must be exactly one instance of the global aggregator, running on a single machine.
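
To make the intended data flow concrete, here is a minimal plain-Java sketch of the two aggregation steps. It does not use any Hyracks classes and does not show how to wire a connector. The names HistogramSketch, localAggregate, and globalAggregate are made up for illustration, and IDs are assumed to be strings. It only shows the logic each operator would carry.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: plain Java only, not the Hyracks operator API.
final class HistogramSketch {

    // Local aggregator: called once per input split, produces <ID, count> pairs.
    static Map<String, Long> localAggregate(List<String> ids) {
        Map<String, Long> counts = new TreeMap<>();
        for (String id : ids) {
            counts.merge(id, 1L, Long::sum);
        }
        return counts;
    }

    // Global aggregator: a single instance that sums the partial counts per ID.
    static Map<String, Long> globalAggregate(List<Map<String, Long>> partials) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((id, count) -> total.merge(id, count, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two input splits, each aggregated locally first.
        Map<String, Long> split1 = localAggregate(Arrays.asList("a", "b", "a"));
        Map<String, Long> split2 = localAggregate(Arrays.asList("b", "c", "a"));

        // All partial histograms arrive at the one global aggregator.
        System.out.println(globalAggregate(Arrays.asList(split1, split2)));
        // prints {a=3, b=2, c=1}
    }
}
```

Since the global step is just a per-key sum, it gives the same result no matter which order the partial histograms arrive in, which is why a single receiving instance is enough as long as the connector delivers every local output to it.
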
On Tue, Feb 20, 2018 at 2:53 PM, Muhammad Abu Bakar Siddique <[email protected]> wrote:

> Hi,
>
> I am trying to code a very simple example that can compute a single
> histogram from two different files. I am able to compute separate
> histograms for each file using OneToOneConnectorDescriptor. Now, I want
> to combine these two maps into one map. I could not find any
> MToOneConnector, where I can combine these two maps into one. Can somebody
> please guide me on how to do this in a correct way?
>
> What I did:
> 1. Created two splits for input files
> 2. Connected input to myOperatorDescriptor using
>    OneToOneConnectorDescriptor
> 3. Connected myOperatorDescriptor to the output using
>    OneToOneConnectorDescriptor
> 4. myOperatorDescriptor is reading the files and computing the histogram
>    (in HashMap) for each file
>
> What I need to do:
> 1. Combine the maps into one.

--
Ahmed Eldawy
Assistant Professor
http://www.cs.ucr.edu/~eldawy
Tel: +1 (951) 827-5654
