The plan was to have the scan operator do that kind of caching, but I agree
it could make sense to have some common caching framework in case other
scan operators want to cache as well.
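
To make the idea concrete, here is a minimal sketch in Python of what a common cache shared by scan operators might look like. Everything in it is hypothetical: ColumnarCache, ScanOperator, to_columnar and the in-memory dict store are illustration names, not part of any Drill design or API. It assumes a record-oriented source is converted to a columnar layout on first scan, and later scans of the same source reuse that layout.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Columns = Dict[str, List[Any]]


def to_columnar(rows: Iterable[Row]) -> Columns:
    """Pivot records into one list per field; missing fields become None."""
    materialized = list(rows)
    names: List[str] = []
    for row in materialized:
        for name in row:
            if name not in names:
                names.append(name)
    return {name: [row.get(name) for row in materialized] for name in names}


class ColumnarCache:
    """Hypothetical shared cache: source id -> columnar form of that source."""

    def __init__(self) -> None:
        self._store: Dict[str, Columns] = {}
        self.misses = 0  # number of conversions from record-oriented data

    def get_columns(
        self, source_id: str, read_rows: Callable[[], Iterable[Row]]
    ) -> Columns:
        # Convert only on the first request for a given source.
        if source_id not in self._store:
            self.misses += 1
            self._store[source_id] = to_columnar(read_rows())
        return self._store[source_id]


class ScanOperator:
    """Hypothetical scan operator that delegates caching to the shared cache."""

    def __init__(
        self,
        source_id: str,
        read_rows: Callable[[], Iterable[Row]],
        cache: ColumnarCache,
    ) -> None:
        self.source_id = source_id
        self.read_rows = read_rows
        self.cache = cache

    def scan(self, wanted: List[str]) -> Columns:
        # Return only the requested columns from the cached layout.
        columns = self.cache.get_columns(self.source_id, self.read_rows)
        return {name: columns[name] for name in wanted}
```

With this shape, two scan operators over the same source trigger a single conversion, and the caching policy lives in one place instead of in each scanner. Eviction, invalidation when the source changes, and on-disk storage are left out of the sketch.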

On Sun, Sep 16, 2012 at 5:29 PM, moon soo Lee <m...@nflabs.com> wrote:

> Drill wants in-place processing ([1], page 12). Yes, ETL is painful.
> In my understanding, in-place processing means the data is not always
> columnar.
>
> [2], Figure 10, shows the performance difference between columnar and
> record-oriented (MR) storage.
> If Dremel works with record-oriented data, I would guess it will be an
> order of magnitude slower.
>
> If that's true, will this still be interactive?
>
> And can anyone give more detail about "Adaptively convert storage layout
> into more efficient forms" ([1], page 12)?
> Is it a kind of transparent columnar-format caching?
>
> And if non-columnar data is expected in many cases,
> then how about Drill having a common cache for the storage interface,
> instead of each scanner implementing its own caching policy?
>
> Thanks.
>
> [1] Apache Drill, Architecture outlines.
> http://www.slideshare.net/jasonfrantz/drill-architecture-20120913
> [2] Dremel: Interactive Analysis of Web-Scale Datasets
> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36632.pdf
>



-- 
Tomer Shiran
Director of Product Management | MapR Technologies | 650-804-8657
