[
https://issues.apache.org/jira/browse/TAJO-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861232#comment-13861232
]
Jihoon Son commented on TAJO-472:
---------------------------------
Min,
by intermediate data I meant the shuffled (repartitioned) data. It is easy to
imagine cases where we need to cache the shuffled data rather than the original
input table. As you know, the cost of repartitioning data is one of the most
important factors in query processing performance, and I think we can reduce
it by caching the repartitioned intermediate data.
Using the MD5 match to avoid recomputing cached results looks reasonable, and
I also agree with supporting both manual and automatic caching.
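To make the idea concrete, here is a minimal sketch of caching repartitioned
data under an MD5 fingerprint. It is not taken from the proposal or from Tajo's
code: the names ShuffleCache, fingerprint, get_or_compute and repartition are
hypothetical, and the key layout (plan text, partition column, partition count)
is only an assumption about what the MD5 match might cover.

```python
import hashlib
import zlib
from collections import defaultdict


def repartition(rows, key_column, num_partitions):
    """Hash-partition rows on key_column (stands in for the costly shuffle)."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        bucket = zlib.crc32(str(row[key_column]).encode("utf-8")) % num_partitions
        partitions[bucket].append(row)
    return dict(partitions)


class ShuffleCache:
    """Hypothetical in-memory cache of repartitioned (shuffled) data."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def fingerprint(plan_text, key_column, num_partitions):
        # Assumed key: MD5 over the plan text plus the partitioning scheme.
        canonical = f"{plan_text}|{key_column}|{num_partitions}"
        return hashlib.md5(canonical.encode("utf-8")).hexdigest()

    def get_or_compute(self, plan_text, key_column, num_partitions, rows):
        digest = self.fingerprint(plan_text, key_column, num_partitions)
        if digest in self._store:
            # MD5 match: reuse the cached partitions, skip the repartition.
            self.hits += 1
            return self._store[digest]
        self.misses += 1
        partitions = repartition(rows, key_column, num_partitions)
        self._store[digest] = partitions
        return partitions


rows = [
    {"dept": "a", "amount": 1},
    {"dept": "b", "amount": 2},
    {"dept": "a", "amount": 3},
]
cache = ShuffleCache()
first = cache.get_or_compute("scan(sales)", "dept", 4, rows)   # computed
second = cache.get_or_compute("scan(sales)", "dept", 4, rows)  # served from cache
```

In this sketch the second call returns the cached partitions without touching
the rows again. A real implementation would also have to normalize the plan
before hashing and invalidate entries when the underlying table changes; both
are left out here.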
Your proposal is very interesting. I'll study it in depth.
Thanks!
> Umbrella ticket for accelerating query speed through memory cached table
> ------------------------------------------------------------------------
>
> Key: TAJO-472
> URL: https://issues.apache.org/jira/browse/TAJO-472
> Project: Tajo
> Issue Type: New Feature
> Components: distributed query plan, physical operator
> Reporter: Min Zhou
> Assignee: Min Zhou
> Attachments: TAJO-472 Proposal.pdf
>
>
> Previously, I worked as a technical expert on an in-memory database for
> online businesses at Alibaba Group. It is an internal project that can run
> a group-by aggregation over billions of rows in less than 1 second.
> I'd like to apply this technology to Tajo and make it much faster than it is.
> Based on some benchmarks, we believe that Spark and Shark are currently the
> fastest of the open source interactive query systems, such as Impala,
> Presto, and Tajo. The main reason is that they benefit from in-memory data.
> I will take memory-cached tables as my first step toward accelerating Tajo's
> query speed. In fact, this is why I was looking into table partitioning
> during the Christmas and New Year holidays.
> I will submit a proposal soon.
>