Ah - you guys are way too self-energetic - you need to cheat sometimes!
If you were to create a dataset in AsterixDB and run exactly that query
on it, you can view the optimized query plan and the Hyracks job (and
its location constraints). That way you can (for future reference) see
what connectors, operators, etc., are used there and how. (All the
machinery for what you are doing is definitely there for you, as
AsterixDB does exactly that when processing that query. :-))
Cheers,
Mike
On 2/21/18 2:00 PM, Ahmed Eldawy wrote:
Here's some context about this problem. We are trying to build a simple
GroupBy-Aggregate function using Hyracks. Think about the following SQL
query:
SELECT id, COUNT(id) FROM dataset GROUP BY id;
Our design has two operators, local aggregator and global aggregator.
The local aggregator processes one input split at a time and computes its
histogram in the form of <ID, count>.
The global aggregator combines all the pairs <ID, count> produced by the
local aggregators to produce the final output as <ID, Sum(count)>
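The two-operator design described above can be sketched in plain Java, outside of Hyracks, to make the data flow concrete. This is only an illustration under stated assumptions: the class name TwoPhaseCount and the methods localAggregate and globalAggregate are hypothetical, integer ids are assumed, and none of this is Hyracks or AsterixDB API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these names are not part of Hyracks.
final class TwoPhaseCount {

    // Local phase: build the <ID, count> histogram for one input split.
    static Map<Integer, Long> localAggregate(List<Integer> split) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int id : split) {
            counts.merge(id, 1L, Long::sum);
        }
        return counts;
    }

    // Global phase: sum the partial counts into <ID, Sum(count)>.
    static Map<Integer, Long> globalAggregate(List<Map<Integer, Long>> partials) {
        Map<Integer, Long> total = new HashMap<>();
        for (Map<Integer, Long> partial : partials) {
            partial.forEach((id, count) -> total.merge(id, count, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two made-up input splits standing in for the two files.
        List<Map<Integer, Long>> partials = new ArrayList<>();
        partials.add(localAggregate(Arrays.asList(1, 2, 2)));
        partials.add(localAggregate(Arrays.asList(2, 3)));
        System.out.println(globalAggregate(partials));
    }
}
```

Because the global phase only adds counts, it gives the same answer regardless of the order in which the partial histograms arrive.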
The question is: which type of connector should we use to connect the local
aggregator to the global aggregator? We know that an MtoN hash connector
will work, with each machine combining a subset of the keys, but our design
is to combine all of them on a single machine. In other words, there has to
be only one instance of the global aggregator, running on a single machine.
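One way to look at the single-machine requirement is as the N = 1 case of hash partitioning: if every key is routed to consumer hash(key) mod N, then with one consumer every pair lands in the same place. The sketch below shows only that arithmetic in plain Java. The class PartitionRouting and the method consumerFor are hypothetical names, and this is not a claim about how any Hyracks connector is actually implemented.

```java
// Hypothetical illustration only; these names are not part of Hyracks.
final class PartitionRouting {

    // Picks the consumer partition for an id by hashing it modulo the
    // number of consumers. floorMod keeps the result non-negative.
    static int consumerFor(int id, int consumerCount) {
        return Math.floorMod(Integer.hashCode(id), consumerCount);
    }

    public static void main(String[] args) {
        // With a single consumer, every id is routed to partition 0.
        for (int id = -3; id <= 3; id++) {
            System.out.println(id + " -> " + consumerFor(id, 1));
        }
    }
}
```

With more than one consumer the same function spreads the keys across partitions, which is the "each machine combines a subset of the keys" behaviour mentioned above.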
On Tue, Feb 20, 2018 at 2:53 PM, Muhammad Abu Bakar Siddique <
[email protected]> wrote:
Hi,
I am trying to code a very simple example that computes a single
histogram from two different files. I am able to compute a separate
histogram for each file using OneToOneConnectorDescriptor. Now I want
to combine these two maps into one map. I could not find any
MToOneConnector that would let me combine these two maps into one. Can
somebody please guide me on how to do this correctly?
What I did:
1. Created two splits for input files
2. Connected input to myOperatorDescriptor using
OneToOneConnectorDescriptor
3. Connected myOperatorDescriptor to the output using
OneToOneConnectorDescriptor
4. myOperatorDescriptor is reading the files and computing the histogram
(in HashMap) for each file
What I need to do:
1. Combine the maps into one.