Hi, Vladimir,

> On Jun 20, 2017, at 2:29 PM, Vladimir Sitnikov <[email protected]> wrote:
>
> JD> we still face the join problem though
>
> Can you please clarify what is the typical dataset you are trying to join
> (in the number of rows/bytes)?
Our dataset size varies and can go up to 100+ GB. Since Druid does not support joins, we definitely cannot push the join down. If we do the join in Calcite, the query is limited by a single host's memory. That's why I am thinking of using the Spark engine to address this concern.

> Am I right you somehow struggle with "fetch everything from Druid and join
> via Enumerable" and you are completely fine with "fetch everything from
> Druid and join via Spark"?

I am trying to see if we could do something similar to what Hive does: pull Druid segments directly and do the rest in Spark.

> I'm not sure Spark itself would make things way faster.

At least it won't have the single-host memory bound issue.

> Could you share some queries along with dataset sizes and expected/actual
> execution plans?
>
> Vladimir
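For context on the memory argument above, here is a toy sketch (plain Python, purely illustrative, not the Calcite or Spark API) of a shuffle-partitioned hash join, the general strategy a distributed engine like Spark uses: rows are routed to partitions by a hash of the join key, so each worker only needs memory for its own partition rather than for the entire 100+ GB dataset, which is exactly what a single-host Enumerable join cannot avoid.

```python
def shuffle_hash_join(left, right, num_partitions=4):
    """Join two lists of (key, value) pairs partition by partition.

    Toy model of a distributed shuffle join: the "shuffle" phase
    routes each row to a partition by hashing its key, then each
    partition is joined independently (on a real cluster, by a
    different worker), so peak memory per worker is bounded by the
    largest partition, not by the full dataset.
    """
    left_parts = [[] for _ in range(num_partitions)]
    right_parts = [[] for _ in range(num_partitions)]

    # Shuffle phase: co-partition both inputs on the join key.
    for k, v in left:
        left_parts[hash(k) % num_partitions].append((k, v))
    for k, v in right:
        right_parts[hash(k) % num_partitions].append((k, v))

    # Join phase: build a hash table per partition and probe it.
    out = []
    for lp, rp in zip(left_parts, right_parts):
        table = {}
        for k, v in lp:
            table.setdefault(k, []).append(v)
        for k, w in rp:
            for v in table.get(k, []):
                out.append((k, v, w))
    return out
```

Example: joining `[(1, "a"), (2, "b"), (3, "c")]` with `[(2, "x"), (3, "y")]` on the key yields the matching pairs for keys 2 and 3, regardless of which partitions the keys hash into.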
