I finally got time to watch the video.
In Arrow talks, I've been selling a similar idea of universal and very
efficient UDFs that would use Arrow as an interchange format. One killer
application we talked about with Wes is to make PySpark Arrow-enabled.
I suspect that the Weld representation would not be very different from
Arrow and it's quite possible the efficient operations they built could be
adapted for it. I guess we'll know in January.
Julian, what you describe makes sense to me. I was playing with code
generation some time ago [1], and I'm wondering what vectorized operations
we could add besides expression evaluation.
[1] https://github.com/julienledem/brennus
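
To make the discussion concrete, here is a minimal sketch of the kind of
vectorized Filter+Project kernel that Julian's example (quoted below)
describes, "select x + y ... where x > 5". It is an illustration only, not
code from Calcite, Drill, Arrow, or Weld. The column layout (a list of values
plus a parallel list of validity flags) and the name filter_project are
assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical column layout for this sketch: (values, validity flags).
# validity[i] is False when row i is null.
Column = Tuple[List[int], List[bool]]


def filter_project(x: Column, y: Column) -> Column:
    """Evaluate `select x + y ... where x > 5` over two equal-length columns."""
    x_vals, x_valid = x
    y_vals, y_valid = y

    # Filter pass: build a selection vector of row indexes where x > 5.
    # A null x never satisfies the predicate, so every selected row is
    # known to have a non-null x.
    selected = [i for i in range(len(x_vals)) if x_valid[i] and x_vals[i] > 5]

    # Project pass: x was already proven non-null by the filter, so only
    # y's validity is checked here.
    out_vals: List[int] = []
    out_valid: List[bool] = []
    for i in selected:
        if y_valid[i]:
            out_vals.append(x_vals[i] + y_vals[i])
            out_valid.append(True)
        else:
            out_vals.append(0)  # placeholder value for a null result
            out_valid.append(False)
    return out_vals, out_valid


x = ([1, 7, 9, 10], [True, True, False, True])
y = ([10, 20, 30, 40], [True, True, True, False])
result = filter_project(x, y)
```

The point of the sketch is the second loop: because the filter has already
established that x is not null, the projection skips that check, which is the
sort of optimization a shared code generator could apply automatically.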

On Mon, Nov 21, 2016 at 4:34 PM, Hanifi GUNES <hanifigu...@gmail.com> wrote:

> Looks interesting. I see some commonalities. I hope the original work (in
> progress?) will make references to Arrow so that we will all know the
> distinguishing points better.
>
> 2016-11-20 8:31 GMT-08:00 Donald E. Foss <donald.f...@gmail.com>:
>
> > Thanks Julian. Sounds worth a listen.
> >
> > Donald E. Foss (mobile-US ET)
> >
> > > On Nov 19, 2016, at 1:48 PM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > > Matei Zaharia just spoke at the AMPlab seminar [1], and showed a couple
> > > of slides about Weld. In the video of the day [2], his talk starts at
> > > 4:05:00, and he starts talking about Weld at 4:28:30.
> > >
> > > The essence is an intermediate language for row-level expressions, with
> > > the ability to do limited iteration, with the goal of making it easier to
> > > pass data between UDFs written in different languages. Sounds familiar? I
> > > would presume that an implementation of the language would be strongly
> > > tied to a memory format. Or maybe it allows multiple possible
> > > implementations, one of which would be Arrow in Java.
> > >
> > > The slide listed Pandas as one of the supported front ends, so I
> > > wondered if Wes knew something about the project.
> > >
> > > I have been thinking of doing something similar in the Calcite / Drill /
> > > Arrow world. In Calcite we have RexNodes as an expression language, and
> > > we have a Java code generator that can target data represented as Java
> > > arrays, and another variant that can target data represented as Java
> > > structs. Drill of course has a code generator that can target data in
> > > Arrow. I have been thinking for a while of abstracting the code
> > > generators so that the person implementing, say, the Filter+Project for
> > > “select x + y … where x > 5” doesn’t have to get their hands dirty with
> > > code generation. There are a lot of optimizations to be done, e.g.
> > > remembering that you’ve already made sure that x is not null.
> > >
> > > Julian
> > >
> > > [1] https://amplab.cs.berkeley.edu/endofproject/ <
> > https://amplab.cs.berkeley.edu/endofproject/>
> > >
> > > [2] https://youtu.be/KAacs9jYPHU <https://youtu.be/KAacs9jYPHU>
> > >
> > >
> > >
> > >> On Nov 19, 2016, at 4:31 AM, Donald Foss <donald.f...@gmail.com>
> wrote:
> > >>
> > >> Did you find that at https://cs.stanford.edu/~matei/? <
> > https://cs.stanford.edu/~matei/?>  That’s the only thing I can find via
> > Google about it.  Do you have more detail or a link to the paper
> itself?  I
> > get the feeling that it is not yet fully complete despite 21 November
> > camera-ready CIDR 2017 deadline.
> > >>
> > >> For those who aren’t familiar with CIDR, it is a conference that
> occurs
> > every other year.  This year’s agenda/program may be found at
> > http://cidrdb.org/cidr2017/program.html <http://cidrdb.org/cidr2017/
> > program.html>.  CIDR is not an acronym for network subnet masks—the first
> > thing I thought of, Classless Inter Domain Routing, but Conference on
> > Innovative Data Systems Research, which focuses primarily on systems.  I
> > hate to admit this, but I’m unfamiliar with the conference, however that
> > appears that it is because I’ve been out of academia for far too long,
> and
> > this conference seems to be the presentation of quite a few interesting
> > papers.  Just judging by title, a poor, yet humorous judge indeed, I
> like:
> > >> - “Dependency-Driven Analytics: A Compass for Uncharted Data Oceans”
> > (Donald - Why just data lakes when you can have data oceans?)
> > >> - “My Weak Consistency is Strong” (Donald - Great title, reminds me of
> > Star Wars and the “Force”)
> > >> - “SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale
> > Machine Learning” (Donald - Another brilliant backronym.)
> > >>
> > >> The Weld paper is the last paper to be presented on 10 January 2017
> > between 2:30 and 4:05 (UTC-8).
> > >>
> > >> On a side note, looking down that page a little, I love the title of
> > >> the last paper in 2016, Yggdrasil: An Optimized System for Training Deep
> > >> Decision Trees at Scale
> > >> <https://cs.stanford.edu/~matei/papers/2016/nips_yggdrasil.pdf>.  When I
> > >> see Yggdrasil, the first thing I think of is a really big tree and Norse
> > >> mythology.  It’s a great name.  I’m going to read some of his other
> > >> papers this weekend.
> > >>
> > >> Donald Foss
> > >> donald.f...@gmail.com
> > >> ------ __o
> > >> ----_`\<,_
> > >> ---(_)/ (_)
> > >>
> > >> The information in this email is confidential and may be legally
> > privileged. It is intended solely for the addressee. Access to this
> e-mail
> > by anyone else is unauthorized.
> > >>
> > >>> On Nov 18, 2016, at 4:42 PM, Julian Hyde <jh...@apache.org> wrote:
> > >>>
> > >>> Anyone know anything about Matei Zaharia’s Weld project?
> > >>>
> > >>>    • S. Palkar, J. Thomas, A. Shanbhag, H. Pirk, M. Schwarzkopf, S.
> > Amarasinghe and M. Zaharia. Weld: A Common Runtime for High Performance
> > Data Analytics, to appear at CIDR 2017.
> > >>>
> > >>> It seems to have similar goals to Arrow.
> > >>>
> > >>> Julian
> > >>>
> > >>
> > >
> >
>



-- 
Julien