Here is some context about this problem. We are trying to build a simple
GroupBy-Aggregate function using Hyracks. Consider the following SQL
query:
    SELECT id, COUNT(id) FROM dataset GROUP BY id;
Our design has two operators: a local aggregator and a global aggregator.
The local aggregator processes one input split at a time and computes its
histogram as <ID, count> pairs.
The global aggregator combines all the <ID, count> pairs produced by the
local aggregators into the final output, <ID, SUM(count)>.
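
To make the two stages concrete, here is a minimal sketch in plain Java.
It is not Hyracks operator code: the class and method names
(HistogramSketch, localAggregate, globalAggregate) are hypothetical, and
IDs are assumed to be strings. It only illustrates the data flow described
above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not tied to any Hyracks API.
class HistogramSketch {

    // Local aggregator: takes one input split and returns its <ID, count> map.
    static Map<String, Long> localAggregate(List<String> split) {
        Map<String, Long> counts = new HashMap<>();
        for (String id : split) {
            counts.merge(id, 1L, Long::sum);
        }
        return counts;
    }

    // Global aggregator: sums the counts of all partial maps per ID.
    static Map<String, Long> globalAggregate(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((id, count) -> totals.merge(id, count, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two splits stand in for the two input files.
        Map<String, Long> first = localAggregate(List.of("a", "b", "a"));
        Map<String, Long> second = localAggregate(List.of("b", "c"));
        System.out.println(globalAggregate(List.of(first, second)));
    }
}
```

In a real job, the partial maps would arrive as frames of tuples over a
connector rather than as an in-memory list.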
The question is: which type of connector should we use to connect the local
aggregators to the global aggregator? We know that an MtoN hash connector
will work, with each machine combining a subset of the keys, but our design
is to combine all of them on a single machine. In other words, there must be
exactly one instance of the global aggregator, running on a single machine.
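
To illustrate the routing question, a hash connector can be thought of as
sending each key to partition hash(key) mod N, where N is the number of
consumer instances. The sketch below is plain Java, not the Hyracks
connector; partitionOf is a hypothetical helper. It shows only that with
N = 1 every key maps to partition 0, so a many-to-one merge is
arithmetically the N = 1 case of hash partitioning. Whether the consumer
side can actually be constrained to a single instance in Hyracks is an
assumption that this sketch does not verify.

```java
import java.util.List;

// Hypothetical illustration of hash routing; not the Hyracks implementation.
class PartitionSketch {

    // Returns the consumer partition (0 .. consumerCount-1) for a key.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionOf(String key, int consumerCount) {
        return Math.floorMod(key.hashCode(), consumerCount);
    }

    public static void main(String[] args) {
        for (String id : List.of("a", "b", "c", "d")) {
            System.out.println(id
                    + " -> N=3: partition " + partitionOf(id, 3)
                    + ", N=1: partition " + partitionOf(id, 1));
        }
    }
}
```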


On Tue, Feb 20, 2018 at 2:53 PM, Muhammad Abu Bakar Siddique <
[email protected]> wrote:

> Hi,
> I am trying to code a very simple example that computes a single
> histogram from two different files. I am able to compute separate
> histograms for each file using OneToOneConnectorDescriptor. Now, I want
> to combine these two maps into one map. I could not find any
> MToOneConnector with which I can combine these two maps into one. Can
> somebody please guide me on how to do this correctly?
> What I did:
> 1. Created two splits for input files
> 2. Connected input to myOperatorDescriptor using
> OneToOneConnectorDescriptor
> 3. Connected myOperatorDescriptor to the output using
> OneToOneConnectorDescriptor
> 4. myOperatorDescriptor is reading the files and computing the histogram
> (in HashMap) for each file
> What I need to do:
> 1. Combine the maps into one.
>



-- 

Ahmed Eldawy
Assistant Professor
http://www.cs.ucr.edu/~eldawy
Tel: +1 (951) 827-5654
