One of the tricky things to figure out is how to separate statistics from machine learning, as they overlap heavily (completely?) but with different terminology and goals. I think it's really important that JuliaStats and JuliaML/JuliaLearn play nicely together, which probably means that any ML interface should use StatsBase verbs (fit!, predict, etc.) whenever possible. There has been a little tension (from my perspective) and a slight turf war around statistical processes and terminology... is it possible to avoid that?
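To make that concrete, here's a minimal sketch of what I mean. The Perceptron type and its update rule are made up for illustration, but fit! and predict are the generic verbs StatsBase already exports, so an ML package would extend them rather than invent a parallel train/infer vocabulary:

    # Sketch: an ML model extending the StatsBase verbs instead of
    # defining its own vocabulary. The Perceptron type here is
    # hypothetical; fit! and predict are the StatsBase generics.
    using StatsBase, LinearAlgebra
    import StatsBase: fit!, predict

    mutable struct Perceptron
        w::Vector{Float64}
        b::Float64
    end
    Perceptron(nfeatures::Integer) = Perceptron(zeros(nfeatures), 0.0)

    # y is assumed to hold labels in {-1, +1}; X has one sample per row
    function fit!(m::Perceptron, X::AbstractMatrix, y::AbstractVector; epochs::Int = 10)
        for _ in 1:epochs, i in 1:size(X, 1)
            x = vec(X[i, :])
            if y[i] * (dot(m.w, x) + m.b) <= 0   # misclassified sample
                m.w .+= y[i] .* x                # classic perceptron update
                m.b += y[i]
            end
        end
        return m
    end

    predict(m::Perceptron, X::AbstractMatrix) =
        [dot(m.w, vec(X[i, :])) + m.b > 0 ? 1 : -1 for i in 1:size(X, 1)]

Then fit!(Perceptron(2), X, y) composes with anything else that speaks StatsBase, which is the whole point.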
On Wed, Nov 11, 2015 at 9:49 AM, Stefan Karpinski <[email protected]> wrote:

> This is definitely already in progress, but we've a ways to go before
> it's as easy as scikit-learn. I suspect that having an organization will
> be more effective at coordinating the various efforts than people might
> expect.
>
> On Wed, Nov 11, 2015 at 9:46 AM, Tom Breloff <[email protected]> wrote:
>
>> Randy, see LearnBase.jl, MachineLearning.jl, Learn.jl (just a readme
>> for now), Orchestra.jl, and many others. Many people have the same
>> goal, and wrapping TensorFlow isn't going to change the need for a
>> high-level interface. I do agree that a good high-level interface is
>> higher on the priority list, though.
>>
>> On Wed, Nov 11, 2015 at 9:29 AM, Randy Zwitch <[email protected]> wrote:
>>
>>> Sure. I'm not against anyone doing anything; it just seems like Julia
>>> suffers from an "expert/edge case" problem right now. For me, it'd be
>>> awesome if there were a scikit-learn (Python) or caret (R) style
>>> mega-interface that ties together the packages that have already been
>>> written. From my cursory reading, TensorFlow seems more like a
>>> low-level toolkit for expressing/solving equations, whereas what I see
>>> Julia lacking is an easy way to evaluate 3-5 different algorithms on
>>> the same dataset quickly.
>>>
>>> A tweet I just saw sums it up pretty succinctly: "TensorFlow already
>>> has more stars than scikit-learn, and probably more stars than people
>>> actually doing deep learning"
>>>
>>> On Tuesday, November 10, 2015 at 11:28:32 PM UTC-5, Alireza Nejati wrote:
>>>>
>>>> Randy: To answer your question, I'd reckon that the two major gaps in
>>>> Julia that TensorFlow could fill are:
>>>>
>>>> 1. Lack of automatic differentiation on arbitrary graph structures.
>>>> 2. Lack of the ability to map computations across CPUs and clusters.
>>>>
>>>> Funnily enough, I'd been thinking about (1) for the past few weeks,
>>>> and I think I have an idea of how to accomplish it using existing
>>>> JuliaDiff libraries. About (2), I have no idea, and that's probably
>>>> going to be the most important aspect of TensorFlow moving forward
>>>> (and also probably the hardest to implement). So for the time being,
>>>> I think it's definitely worthwhile to just have an interface to
>>>> TensorFlow. There are a few ways this could be done. Some that I can
>>>> think of:
>>>>
>>>> 1. Just tell people to use PyCall directly. Not an elegant solution.
>>>> 2. A more Julia-integrated interface *a la* SymPy.
>>>> 3. Use TensorFlow as the 'backend' of a novel Julia-based machine
>>>> learning library. In this scenario, everything would be in Julia, and
>>>> TensorFlow would only be used to map computations to hardware.
>>>>
>>>> I think 3 is the most attractive option, but also probably the
>>>> hardest to do.
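On option 1 in Alireza's list: it's worth noting that this already works today, it's just not pleasant to build on. A minimal sketch, assuming PyCall is linked against a Python environment where the tensorflow package is importable (shown against the TensorFlow 1.x-style session API):

    using PyCall

    # Assumes the Python `tensorflow` package is installed in the
    # Python environment PyCall is built against.
    tf = pyimport("tensorflow")

    # Build a tiny graph: c = a * b
    a = tf.constant(3.0)
    b = tf.constant(4.0)
    c = tf.multiply(a, b)

    # Evaluate the graph in a session (TensorFlow 1.x-style API)
    sess = tf.Session()
    println(sess.run(c))   # => 12.0
    sess.close()

That's exactly why 2 or 3 look more attractive for anything serious, but it does mean nothing blocks experimentation in the meantime.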
