One of the tricky things to figure out is how to separate
statistics from machine learning, as they overlap heavily
(completely?) but with different terminology and goals. I think
it's really important that JuliaStats and JuliaML/JuliaLearn play
nicely together, and this probably means that any ML interface
should use StatsBase verbs whenever possible. There has been a
little tension (from my perspective) and a slight turf war when
it comes to statistical processes and terminology... is it
possible to avoid that?
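To make that concrete, here's a rough sketch (assuming the
fit/predict generics that StatsBase provides; MeanModel is made
up purely for illustration) of what reusing the StatsBase verbs
could look like:

    using StatsBase
    import StatsBase: fit, predict  # extend the shared generics

    # A toy "model" that just predicts the training mean.
    immutable MeanModel  # Julia 0.4-era syntax
        mu::Float64
    end

    fit(::Type{MeanModel}, y::AbstractVector) = MeanModel(mean(y))
    predict(m::MeanModel, n::Integer) = fill(m.mu, n)

    m = fit(MeanModel, [1.0, 2.0, 3.0])
    predict(m, 4)  # => [2.0, 2.0, 2.0, 2.0]

Any ML package that extends the same verbs would compose with
the rest of JuliaStats instead of fighting it.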
On Wed, Nov 11, 2015 at 9:49 AM, Stefan Karpinski
<[email protected]> wrote:
This is definitely already in progress, but we've a ways to
go before it's as easy as scikit-learn. I suspect that having
an organization will be more effective at coordinating the
various efforts than people might expect.
On Wed, Nov 11, 2015 at 9:46 AM, Tom Breloff
<[email protected]> wrote:
Randy, see LearnBase.jl, MachineLearning.jl, Learn.jl
(just a readme for now), Orchestra.jl, and many others.
Many people have the same goal, and wrapping TensorFlow
isn't going to change the need for a high-level
interface. I do agree that a good high-level interface
is higher on the priority list, though.
On Wed, Nov 11, 2015 at 9:29 AM, Randy Zwitch
<[email protected]> wrote:
Sure. I'm not against anyone doing anything, just
that it seems like Julia suffers from an "expert/edge
case" problem right now. For me, it'd be awesome if
there were just a scikit-learn (Python) or caret (R)
style mega-interface that ties together the packages
that have already been written. From my cursory
reading, TensorFlow seems more like a low-level
toolkit for expressing/solving equations, whereas
what I see Julia lacking is an easy way to quickly
evaluate 3-5 different algorithms on the same dataset.
A tweet I just saw sums it up pretty succinctly:
"TensorFlow already has more stars than scikit-learn,
and probably more stars than people actually doing
deep learning"
On Tuesday, November 10, 2015 at 11:28:32 PM UTC-5,
Alireza Nejati wrote:
Randy: To answer your question, I'd reckon that
the two major gaps in Julia that TensorFlow could
fill are:
1. Lack of automatic differentiation on arbitrary
graph structures.
2. Lack of the ability to map computations across
CPUs and clusters.
Funnily enough, I was thinking about (1) for the
past few weeks, and I think I have an idea of how
to accomplish it using the existing JuliaDiff
libraries. As for (2), I have no idea, and that's
probably going to be the most important aspect of
TensorFlow moving forward (and also probably the
hardest to implement).
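To give a sense of the existing building blocks,
here's a minimal sketch using ForwardDiff.jl (one of
the JuliaDiff packages). Note that this
differentiates an ordinary Julia function at a
point; the graph-structured case in (1) is exactly
the part that's still missing:

    using ForwardDiff

    # A scalar-valued function of a vector input. It has to
    # accept generic number types, since ForwardDiff pushes
    # dual numbers through it.
    f(x) = sum(sin, x) + 0.5 * sum(abs2, x)

    x = rand(3)
    g = ForwardDiff.gradient(f, x)  # forward-mode gradient at x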
So for the time being, I think it's definitely
worthwhile to just have an interface to TensorFlow.
A few ways this could be done:
1. Just tell people to use PyCall directly. Not
an elegant solution (see the sketch at the end of
this message).
2. A more Julia-integrated interface, à la SymPy.
3. Using TensorFlow as the 'backend' of a novel
Julia-based machine learning library. In this
scenario, everything would be in Julia, and
TensorFlow would only be used to map computations
to hardware.
I think 3 is the most attractive option, but also
probably the hardest to do.
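To illustrate option 1: assuming the Python
tensorflow package is installed where PyCall can
find it, something like this already works (tf.mul
is the elementwise multiply in the current
TensorFlow API):

    using PyCall
    @pyimport tensorflow as tf  # needs the Python tensorflow package

    # Build a tiny graph, c = a * b, then run it in a session.
    a = tf.constant(3.0)
    b = tf.constant(4.0)
    c = tf.mul(a, b)

    sess = tf.Session()
    println(sess[:run](c))  # => 12.0

It runs, but it's clunky (the sess[:run] indexing,
Python objects leaking through everywhere), which is
part of why options 2 and 3 look more attractive to me.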