The plan was to have the scan operator do that kind of caching, but I agree it could make sense to have some common caching framework in case other scan operators want to cache as well.
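
To make the "common caching framework" idea concrete, here is a rough sketch in Python (chosen for brevity only). It assumes the cache is keyed by a source identifier and stores a columnar copy of record-oriented data the first time any scan operator reads that source. Every name in it (ColumnarCache, ScanOperator, to_columnar, get_or_load) is hypothetical and not part of Drill; the LRU eviction policy is likewise just an assumption for illustration.

```python
from collections import OrderedDict
from typing import Callable, Dict, List

Record = Dict[str, object]
Columns = Dict[str, List[object]]


def to_columnar(records: List[Record]) -> Columns:
    """Pivot row-oriented records into one list of values per field."""
    fields = sorted({name for record in records for name in record})
    return {name: [record.get(name) for record in records] for name in fields}


class ColumnarCache:
    """Hypothetical LRU cache shared by every scan operator."""

    def __init__(self, capacity: int = 16) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, Columns]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(
        self, source_id: str, read_records: Callable[[], List[Record]]
    ) -> Columns:
        if source_id in self._entries:
            # Cache hit: mark the entry as most recently used.
            self.hits += 1
            self._entries.move_to_end(source_id)
            return self._entries[source_id]
        # Cache miss: read the raw records once and keep a columnar copy.
        self.misses += 1
        columns = to_columnar(read_records())
        self._entries[source_id] = columns
        if len(self._entries) > self.capacity:
            # Evict the least recently used source.
            self._entries.popitem(last=False)
        return columns


class ScanOperator:
    """Hypothetical scanner that delegates caching to the shared cache."""

    def __init__(
        self,
        source_id: str,
        read_records: Callable[[], List[Record]],
        cache: ColumnarCache,
    ) -> None:
        self.source_id = source_id
        self.read_records = read_records
        self.cache = cache

    def scan(self, wanted_columns: List[str]) -> Columns:
        columns = self.cache.get_or_load(self.source_id, self.read_records)
        return {name: columns[name] for name in wanted_columns}
```

With this shape, each scanner only supplies a function that reads its records; the caching policy lives in one place instead of being reimplemented per scanner.
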

On Sun, Sep 16, 2012 at 5:29 PM, moon soo Lee <m...@nflabs.com> wrote:

> Drill want In-place processing ([1], page 12). yes, ETL is painful.
> In my understanding, In-place processing means the data is not always
> columnar.
>
> [2], Figure 10, shows performance difference between columnar and
> record-oriented (MR)
> if Dremel work with record-oriented data, I can guess that'll be order of
> magnitude slower.
>
> If it's true, will this still interactive?
>
> And can anyone give an more detail about "Adaptively convert storage layout
> into more efficient forms", [1], page 12 ?
> Is it kind of transparent columnar format caching?
>
> And if non-columnar data expected in many cases,
> then how about drill have common cache for storage interface instead of
> each scanner implements their own caching policies?
>
> Thanks.
>
> [1] Apache Drill, Architecture outlines.
> http://www.slideshare.net/jasonfrantz/drill-architecture-20120913
> [2] Dremel: Interactive Analysis of Web-Scale Datasets
> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36632.pdf

--
Tomer Shiran
Director of Product Management | MapR Technologies | 650-804-8657