You might start with

https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/runtime/data-stream-mgr.h
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/runtime/data-stream-sender.h
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/runtime/data-stream-recvr.h
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/exchange-node.h

"Volcano : an extensible and parallel query evaluation system":
http://digitalcommons.ohsu.edu/cgi/viewcontent.cgi?article=1191&context=csetech

"Impala: A Modern, Open-Source SQL Engine for Hadoop":
http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
http://www.cidrdb.org/cidr2015/Slides/28_CIDR15_Slides_Paper28.pdf

Speaking for myself, I would like to see and understand more about your
multi-query modifications (design documents, benchmarks, code). This will
affect how I feel about (a) how Impala benefits and (b) whether any changes
are sufficiently risky to justify a separate branch.



On Thu, Mar 17, 2016 at 5:29 AM, Jim Apple <[email protected]> wrote:

> +cc:[email protected]
>
> On Wed, Mar 16, 2016 at 10:38 PM, 林言 <[email protected]> wrote:
>
>> We know that each plan fragment in Impala has only one destination node.
>> Now we want to send the intermediate results of a fragment to more than
>> one destination node. But we are only familiar with the data structures and
>> execution flow in the frontend, so we wonder what we should modify in
>> the Thrift definitions and the backend to make this work.
>> Can you share some design documents, so that we can learn more about the
>> design details of Impala?
>> If you are interested in multi-query adaptation in Impala, would you like
>> to work with us in a new branch of Impala?
>>
>>
>> ------------------------------
>> Yan Lin
>>
>>
>> *From:* Jim Apple <[email protected]>
>> *Date:* 2016-03-17 01:06
>> *To:* Impala Dev <[email protected]>
>> *CC:* bbbbaai <[email protected]>
>> *Subject:* Re: About Cooperating For A Better Impala
>> I'm sure everyone will be delighted to have more communication and
>> cooperation, including reading the papers and the code. Can you share those
>> today, or is that part of the "puzzle" of "sharing intermediate results"?
>> Is there anything we can do to help with your puzzlement?
>>
>> On Wednesday, March 16, 2016 at 12:11:05 AM UTC-7, 林言 wrote:
>>>
>>> Dear Sir/Madam:
>>>             Hello! I am Yan Lin, a master's candidate at Zhejiang
>>> University (China), in the "PCL" laboratory (http://percom.zju.edu.cn/). Our
>>> lab has done a lot of work on Impala, as follows:
>>>                 1. We proposed an Impala query optimization method
>>> based on bushy trees and an Improved-McCHyp algorithm [1], and we
>>> implemented both the method and the algorithm in Impala.
>>>                 2. We proposed a replication-selection based scheduling
>>> algorithm and implemented it in Impala [2].
>>>                 3. Some of my colleagues are now developing a simulator of
>>> Impala called ImpalaSim and writing the corresponding paper [3].
>>>             Recently, we have focused on multi-query optimization, which
>>> exploits common sub-expressions of batched queries to improve efficiency. We
>>> have modified some source code, and the modified Impala can already
>>> execute multiple queries in the same query context. But we are still
>>> puzzled by sharing intermediate results. We hope for more
>>> communication and cooperation in every aspect. We all want a better
>>> Impala.
>>>             Thank you for your attention! Hope to hear from you soon!
>>>
>>>
>>>                       Yours Sincerely,
>>>
>>>
>>>                               Yan Lin
>>>
>>>             Reference:
>>>             [1] Bushy Tree and Improved-McCHyp Algorithm Based Impala Query Optimization
>>>             [2] Replication-Selection based Scheduling for Impala Parallel Query Execution
>>>             [3] ImpalaSim: Discrete Event Simulation Platform for Impala System
>>> ------------------------------
>>> Yan Lin
>>>
>>
>
