Also, if something is not supported (possibly your example), the optimizer will simply say so by rejecting the expression. But if it accepts it, then I am pretty sure it will do the right thing (or at least there's a unit test for that case, asserted on a trivial example).
Here, by "trivial" I mean local pipelines for 2-split inputs; that's the general rule I used.

On Wed, Jun 18, 2014 at 6:26 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> A little bit of additional information: for the rewriting-rules stage, the
> optimizer does 3 passes over the semantic tree, each pass matching a tree
> fragment using Scala case class matching and rewriting it. This allows it
> to match and rewrite fairly elaborate tree fragments, although at the
> moment I don't think we dig farther than immediate children (and perhaps
> some of their known attributes) in most cases.
>
> A more detailed description than that, I think, exists only in the source.
>
>
> On Wed, Jun 18, 2014 at 6:19 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
>> E.g. I know for sure A %.% B is legal where A is string-keyed and B is
>> int-keyed.
>>
>> This is kind of not the point. The point is that you can easily modify
>> rewriting rules and operators to cover misses (there shouldn't be many,
>> since we've already written quite a few expressions out there).
>>
>>
>> On Wed, Jun 18, 2014 at 6:15 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>> wrote:
>>
>>> I am not sure. There are more rewriting rules than I can remember, and I
>>> did not write an algorithm (I think) that would involve this combination.
>>> I guess the best thing is to try it in a shell or a unit test. If it falls
>>> through, perhaps a new plan element needs to be added (although I am not
>>> sure there isn't one already). I know that there are join-based
>>> multiplicative operators there.
>>>
>>>
>>> On Wed, Jun 18, 2014 at 6:11 PM, Ted Dunning <ted.dunn...@gmail.com>
>>> wrote:
>>>
>>>> On Wed, Jun 18, 2014 at 6:07 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>>> wrote:
>>>>
>>>> > In simple terms, if non-integer row keying is used anywhere, it tries
>>>> > to rewrite pipelines so that product orientations never require
>>>> > non-int keys to denote columns. In case the pipeline makes that
>>>> > impossible, the optimizer will refuse to produce a plan.
>>>> >
>>>> > E.g., suppose A is a distributed, string-keyed matrix.
>>>> >
>>>> > (A.t %.% A) collect // ok
>>>> >
>>>>
>>>> What happens with the important case of B.t %.% A where both A and B
>>>> are string-keyed?
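To make the rewrite-then-refuse behaviour described in this thread concrete, here is a toy Scala sketch. It is not Mahout's optimizer: the node classes (Drm, Transpose, Times, AtB, AB), the KeyType model and the ToyOptimizer object are all hypothetical. It assumes a drastically simplified rule set in which one case-class pattern absorbs a transposed left operand of a product into a join-based AtB node, and a key check refuses any plan that would still need non-int keys to denote columns.

```scala
// Toy model only: these classes are hypothetical and are NOT Mahout's
// operator classes. They illustrate case-class-based tree rewriting plus
// a row-key check that refuses to produce a plan.

sealed trait KeyType
case object IntKey extends KeyType
case object StringKey extends KeyType

sealed trait Node
// A distributed matrix leaf whose row keys have type `rowKey`;
// column keys are always int indices in this model.
final case class Drm(name: String, rowKey: KeyType) extends Node
final case class Transpose(a: Node) extends Node
final case class Times(a: Node, b: Node) extends Node
// "Physical" operators produced by rewriting (hypothetical names).
final case class AtB(a: Node, b: Node) extends Node // a.t * b, joined on row keys
final case class AB(a: Node, b: Node) extends Node  // a * b

object ToyOptimizer {
  // One rewriting pass: match tree fragments with case-class patterns and
  // rewrite them; the pattern looks only one level below the product.
  def rewrite(n: Node): Node = n match {
    case Times(Transpose(a), b) => AtB(rewrite(a), rewrite(b))
    case Times(a, b)            => AB(rewrite(a), rewrite(b))
    case Transpose(a)           => Transpose(rewrite(a))
    case other                  => other // leaves and already-rewritten nodes
  }

  private def check(ok: Boolean, msg: String): Either[String, Unit] =
    if (ok) Right(()) else Left(msg)

  // Row-key type of a node's result, or Left(reason) if it cannot be planned.
  def rowKey(n: Node): Either[String, KeyType] = n match {
    case Drm(_, k) => Right(k)
    case AtB(a, b) =>
      for {
        ka <- rowKey(a)
        kb <- rowKey(b)
        _  <- check(ka == kb, "a.t * b needs matching row-key types")
      } yield IntKey // rows of a.t * b are a's columns: always int-keyed
    case AB(a, b) =>
      for {
        ka <- rowKey(a)
        kb <- rowKey(b)
        _  <- check(kb == IntKey, "b's rows must match a's int column indices")
      } yield ka // result keeps a's row keys
    case Transpose(a) =>
      rowKey(a).flatMap { k =>
        check(k == IntKey, "transpose needs non-int column keys").map(_ => IntKey)
      }
    case _ => Left("node was not rewritten into a physical operator")
  }

  // Apply a fixed number of rewriting passes, then accept or refuse the plan.
  def plan(root: Node, passes: Int = 3): Either[String, Node] = {
    val rewritten = (1 to passes).foldLeft(root)((tree, _) => rewrite(tree))
    rowKey(rewritten).map(_ => rewritten)
  }
}
```

In this toy, B.t * A with both operands string-keyed is accepted because the rewrite absorbs the transpose into AtB, A * C with A string-keyed and C int-keyed is accepted, and a bare transpose of a string-keyed matrix is refused. Running the pass three times mirrors the three passes mentioned above, although here a single pass already reaches a fixed point. Whether the real optimizer has a matching rule for B.t * A is exactly the open question in the thread, so trying it in a shell or unit test, as suggested, is still the way to find out.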