Hi Camuel,

I think it need not turn into that kind of upfront formalization if we focus on a logical plan model. Since a logical plan model is generally equivalent to an algebra, we don't need to concern ourselves with physical execution methods (i.e., iteration models over nested datasets).
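As an illustration of the point above, here is a minimal sketch in Python. The operator names Scan, Select and Project and the helper functions are hypothetical, not code from Drill or DrQL. It treats a logical plan as a small algebra, applies one rewrite (pushing a selection below a projection), and checks on toy data that the rewritten plan is logically equivalent to the original. The evaluator is only a naive reference used for that check; it says nothing about how a physical engine would iterate over nested data.

```python
from dataclasses import dataclass, replace
from typing import Tuple


# Hypothetical logical operators (assumed names, flat records only).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Select:
    child: object
    column: str
    value: object  # keeps rows where row[column] == value


@dataclass(frozen=True)
class Project:
    child: object
    columns: Tuple[str, ...]


def push_down_selection(plan):
    """Single top-down pass: Select(Project(x)) -> Project(Select(x))."""
    if isinstance(plan, Scan):
        return plan
    if (isinstance(plan, Select) and isinstance(plan.child, Project)
            and plan.column in plan.child.columns):
        project = plan.child
        pushed = push_down_selection(
            Select(project.child, plan.column, plan.value))
        return Project(pushed, project.columns)
    # Otherwise keep the node and rewrite its child.
    return replace(plan, child=push_down_selection(plan.child))


def evaluate(plan, tables):
    """Naive reference evaluator, used only to compare plans."""
    if isinstance(plan, Scan):
        return list(tables[plan.table])
    rows = evaluate(plan.child, tables)
    if isinstance(plan, Select):
        return [r for r in rows if r[plan.column] == plan.value]
    if isinstance(plan, Project):
        return [{c: r[c] for c in plan.columns} for r in rows]
    raise TypeError(f"unknown operator: {plan!r}")


# Toy data, invented for this example.
TABLES = {
    "docs": [
        {"id": 1, "lang": "en", "size": 10},
        {"id": 2, "lang": "fr", "size": 20},
        {"id": 3, "lang": "en", "size": 30},
    ]
}

original = Select(Project(Scan("docs"), ("id", "lang")), "lang", "en")
optimized = push_down_selection(original)

# The two plans differ in shape but produce the same rows.
assert optimized != original
assert evaluate(optimized, TABLES) == evaluate(original, TABLES)
```

The rewrite only inspects the shape of the plan, never the data or the execution strategy, which is why it can be defined before the physical iteration model is settled.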
--
Hyunsik Choi

On Mon, Aug 27, 2012 at 7:07 PM, Camuel Gilyadov <[email protected]> wrote:

> Of course optimizer must work with some intermediate form of query, which I
> think could be an object graph expressed for now by using ordinary
> programming language objects without much fuss (like Java or Python).
> However, I think its upfront formalization is going to develop into
> never-ending story and mainly because iteration over nested datasets is far
> from being clear, different nested dataset languages are not only differs
> in syntax but also in the iteration model itself. I think Rob
> Grzywinski started this discussion in separate thread already...
>
> Also regarding optimizer and DAG, as there is not much index/joins action
> going on what we have now is mostly a chain of transformations
> which of-course is also formally a DAG :) but thinking about it as a chain
> will simplify things at first, there is no much optimizations you can do
> with it. If you go one level down and consider scalar operations then you
> get more elaborate DAG of course.
> -----------------
> Let's separate issues:
>
> 1. Query Plan is distributed workload and that must be formalized and I
> think no one suggests otherwise. Also no one suggest other model than DAG
> except me. I suggest unrestricted graph just to keep backend useful for
> other stuff, for the purposes of DrQL DAG is more than adequate in my
> opinion. However, this is a DAG of identical nodes, and it follows physical
> data partitioning. Let's label it physical DAG in order not to confuse with
> logical query plan.
>
> 2. Open and somewhat confused issue is what actually runs a single node of
> the above mentioned physical DAG? Is this a formalized query plan or just
> arbitrary code?
>
> 3. Query plan formalization: main obstacle here is that the model of
> iterating nested datasets are far from clear. Particularly nor Dremel paper
> neither BigQuery reference describe well the behavior of querying nested
> datasets with all different subcases. There many other languages to query
> nested data but the iteration model varies significantly between them. For
> formalization we miss one another academic paper which
> would rigorously define canonical high-performance iteration model for
> nested datasets.
>
> 4. Another complicating factor is columnar optimization. Drill is going to
> be nested-columnar engine and as such part of query plan must be columnar.
> So full set of column-oriented and record-oriented primitives are needed
> record-construction primitives.
>
>
> On Mon, Aug 27, 2012 at 9:29 AM, Hyunsik Choi <[email protected]> wrote:
>
> > Hi David,
> >
> > I agree with some of your claims. I also think that now DrQL may be
> > enough to Drill project.
> >
> > Even if we don't support various query languages, I think complex query
> > languages (like SQL and DrQL) should have an logical form in order to
> > deal a given query without considering actual physical information. It
> > provides an easy way to modify the query to be more optimized one (e.g.,
> > pushing down projection, selection, and finding the best operator order)
> > while the optimized one is logically equivalent to the original query.
> >
> > Also, It would not hurt performance. For example, OLTP that processes a
> > query within a few milliseconds already employs such a logical plan
> > model. Although a logical plan is generic, it is not hugely different to
> > existing logical plan models.
> >
> > --
> > Hyunsik Choi
> >
> > On Mon, Aug 27, 2012 at 2:34 PM, David Gruzman <[email protected]> wrote:
> >
> > > Hi,
> > > Dremel is high performance system. I think building something generic
> > > "inter-languages" will hurt performance.
> > > Having generic executor service we can add several different paradigms
> > > of the local computation (and even not local). But I think
> > > SQL like query language should be done in most efficient way.
> > > David
> > >
> > > On Mon, Aug 27, 2012 at 3:20 AM, Hyunsik Choi <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > How about having a generic logical plan described as a DAG, where
> > > > each vertex indicates a logical operator including various
> > > > annotations and each edge represents a data flow. A DAG has much
> > > > expressive power. Many literatures have shown that most logical
> > > > plans of various data manipulation languages can be described as
> > > > such a DAG.
> > > >
> > > > Additional languages have different ASTs, and they can be
> > > > transformed into the generic logical plan. In this case, we can
> > > > reuse logical plan, logical plan optimization, and physical
> > > > execution plan. Besides, Drill may consider a global plan that
> > > > represents the distributed execution plan. Since the global plan
> > > > generally depends on the logical plan, we can also reuse all code
> > > > related to the global plan.
> > > >
> > > > --
> > > > Hyunsik Choi
> > > >
> > > >
> > > > On Mon, Aug 27, 2012 at 6:22 AM, Ted Dunning <[email protected]> wrote:
> > > >
> > > > > Camuel,
> > > > >
> > > > > Do you have a grammar test suite that demonstrates the range of
> > > > > expressions?
> > > > >
> > > > > Also, I believe that some have a goal to use additional languages
> > > > > besides SQL like languages. A limited version of pig, for
> > > > > instance, would be very interesting. To do this, it will be
> > > > > important to have a logical plan structure that is common for
> > > > > different syntaxes and is not limited to the idiosyncracies of any
> > > > > particular syntax.
> > > > >
> > > > > How do you think that should be handled? Do you have an idea for
> > > > > a logical plan structure?
> > > > >
> > > > > On Sun, Aug 26, 2012 at 4:11 PM, Camuel Gilyadov <[email protected]> wrote:
> > > > >
> > > > > > I've written and attached ANTLR grammar for DrQL which I assume
> > > > > > is same as BigQuery language described in Query Reference on
> > > > > > BigQuery website. This grammar includes AST production rules.
