Thanks, Julian. Sounds worth a listen.

Donald E. Foss (mobile-US ET)
> On Nov 19, 2016, at 1:48 PM, Julian Hyde <jh...@apache.org> wrote:
>
> Matei Zaharia just spoke at the AMPlab seminar [1], and showed a couple of
> slides about Weld. In the video of the day [2], his talk starts at 4:05:00,
> and he starts talking about Weld at 4:28:30.
>
> The essence is an intermediate language for row-level expressions, with the
> ability to do limited iteration, with the goal of making it easier to pass
> data between UDFs written in different languages. Sounds familiar? I would
> presume that an implementation of the language would be strongly tied to a
> memory format. Or maybe it allows multiple possible implementations, one of
> which would be Arrow in Java.
>
> The slide listed Pandas as one of the supported front ends, so I wondered if
> Wes knew something about the project.
>
> I have been thinking of doing something similar in the Calcite / Drill /
> Arrow world. In Calcite we have RexNodes as an expression language, and we
> have a Java code generator that can target data represented as Java arrays,
> and another variant that can target data represented as Java structs. Drill
> of course has a code generator that can target data in Arrow. I have been
> thinking for a while of abstracting the code generators so that the person
> implementing, say, the Filter+Project for “select x + y … where x > 5”
> doesn’t have to get their hands dirty with code generation. There are a lot
> of optimizations to be done, e.g. remembering that you’ve already made sure
> that x is not null.
>
> Julian
>
> [1] https://amplab.cs.berkeley.edu/endofproject/
>
> [2] https://youtu.be/KAacs9jYPHU
>
>> On Nov 19, 2016, at 4:31 AM, Donald Foss <donald.f...@gmail.com> wrote:
>>
>> Did you find that at https://cs.stanford.edu/~matei/? That’s the only
>> thing I can find via Google about it. Do you have more detail or a link
>> to the paper itself? I get the feeling that it is not yet fully complete,
>> despite the 21 November camera-ready deadline for CIDR 2017.
>>
>> For those who aren’t familiar with CIDR, it is a conference that occurs
>> every other year. This year’s agenda/program may be found at
>> http://cidrdb.org/cidr2017/program.html. Here CIDR does not stand for
>> Classless Inter-Domain Routing, the network subnet notation that first
>> came to my mind, but for the Conference on Innovative Data Systems
>> Research, which focuses primarily on systems. I hate to admit this, but
>> I’m unfamiliar with the conference. That appears to be because I’ve been
>> out of academia for far too long, since quite a few interesting papers
>> are being presented there. Judging just by title (a poor, yet humorous,
>> judge indeed), I like:
>> - “Dependency-Driven Analytics: A Compass for Uncharted Data Oceans”
>>   (Donald - Why just data lakes when you can have data oceans?)
>> - “My Weak Consistency is Strong” (Donald - Great title, reminds me of
>>   Star Wars and the “Force”)
>> - “SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale
>>   Machine Learning” (Donald - Another brilliant backronym.)
>>
>> The Weld paper is the last paper to be presented on 10 January 2017 between
>> 2:30 and 4:05 (UTC-8).
>>
>> On a side note, looking down Matei’s page a little, I love the title of
>> the last paper in 2016, Yggdrasil: An Optimized System for Training Deep
>> Decision Trees at Scale
>> <https://cs.stanford.edu/~matei/papers/2016/nips_yggdrasil.pdf>. When I see
>> Yggdrasil, the first thing I think of is a really big tree and Norse
>> mythology. It’s a great name. I’m going to read some of his other papers
>> this weekend.
>>
>> Donald Foss
>> donald.f...@gmail.com
>> ------ __o
>> ----_`\<,_
>> ---(_)/ (_)
>>
>>> On Nov 18, 2016, at 4:42 PM, Julian Hyde <jh...@apache.org> wrote:
>>>
>>> Anyone know anything about Matei Zaharia’s Weld project?
>>>
>>> • S. Palkar, J. Thomas, A. Shanbhag, H. Pirk, M. Schwarzkopf, S.
>>> Amarasinghe and M. Zaharia. Weld: A Common Runtime for High Performance
>>> Data Analytics, to appear at CIDR 2017.
>>>
>>> It seems to have similar goals to Arrow.
>>>
>>> Julian
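
To make the Filter+Project example from the Nov 19 message concrete (“select x + y … where x > 5”), here is a minimal sketch in Python. It is not Calcite's RexNode API, Drill's code generator, or Weld; every name in it (Col, Lit, BinOp, gen, compile_expr, filter_project) is hypothetical, and rows are assumed to be plain dicts with None standing in for SQL NULL. The point is only to show the caller describing expressions as a tree while a small generator emits the code, including the "already made sure that x is not null" optimization: columns used in a strict predicate need no second null check in the projection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object  # assumed non-null in this sketch


@dataclass(frozen=True)
class BinOp:
    op: str  # a Python infix operator such as "+" or ">"
    left: object
    right: object


def columns(expr):
    """All column names referenced by an expression."""
    if isinstance(expr, Col):
        return {expr.name}
    if isinstance(expr, BinOp):
        return columns(expr.left) | columns(expr.right)
    return set()


def gen(expr, not_null):
    """Return (python_source, columns that still need a null check)."""
    if isinstance(expr, Col):
        needs = set() if expr.name in not_null else {expr.name}
        return f"row[{expr.name!r}]", needs
    if isinstance(expr, Lit):
        return repr(expr.value), set()
    left_src, left_needs = gen(expr.left, not_null)
    right_src, right_needs = gen(expr.right, not_null)
    return f"({left_src} {expr.op} {right_src})", left_needs | right_needs


def compile_expr(expr, not_null=frozenset()):
    """Generate and compile a row function; a null input yields None."""
    src, needs = gen(expr, not_null)
    checks = " or ".join(f"row[{c!r}] is None" for c in sorted(needs))
    body = f"None if {checks} else {src}" if checks else src
    source = f"lambda row: {body}"
    return eval(source), source


def filter_project(rows, predicate, projection):
    pred_fn, _ = compile_expr(predicate)
    # Every operator here is strict (null in, null out), so a row that
    # passes the predicate cannot have a null in any predicate column.
    proj_fn, proj_src = compile_expr(projection, frozenset(columns(predicate)))
    return [proj_fn(r) for r in rows if pred_fn(r) is True], proj_src


rows = [
    {"x": 7, "y": 1},
    {"x": 3, "y": 2},
    {"x": None, "y": 4},
    {"x": 9, "y": None},
]
result, proj_src = filter_project(
    rows,
    predicate=BinOp(">", Col("x"), Lit(5)),
    projection=BinOp("+", Col("x"), Col("y")),
)
print(result)    # [8, None]
print(proj_src)  # generated projection checks only y for null
```

The caller never writes the generated lambda by hand; swapping the dict-of-rows target for arrays or Arrow vectors would, under these assumptions, only change gen and compile_expr.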