I have the same philosophy: "An end user should never have to type a Unicode character"

On 2015-11-11 17:11, Cedric St-Jean wrote:
scikit-learn uses Greek letters in its implementation, which I'm fine with since domain experts work on those, but I wish that in the visible interface they had consistently used more descriptive names (e.g. regularization_strength instead of alpha).
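To illustrate the point, here is a minimal sketch in Python of a hypothetical estimator (this is not scikit-learn's actual class, just the naming idea): the public keyword is descriptive, while the implementation is free to keep the literature's alpha internally.

```python
# Hypothetical estimator illustrating descriptive names in the public
# interface; internally the code may still use the paper's "alpha".
class RidgeLike:
    def __init__(self, regularization_strength=1.0):
        self.regularization_strength = regularization_strength

    def fit(self, xs, ys):
        # Closed-form 1-D ridge regression: w = (x.y) / (x.x + alpha)
        alpha = self.regularization_strength
        sxy = sum(x * y for x, y in zip(xs, ys))
        sxx = sum(x * x for x in xs)
        self.coef_ = sxy / (sxx + alpha)
        return self

model = RidgeLike(regularization_strength=0.0).fit([1.0, 2.0], [2.0, 4.0])
print(model.coef_)  # -> 2.0 (slope of y = 2x, with no regularization)
```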

On Wednesday, November 11, 2015 at 11:00:56 AM UTC-5, Christof Stocker wrote:

    I understand that. But that would mean that groups of people
    who are used to different notations would need to reach a
    consensus. There would also be an ugliness to it. For example,
    SVMs have a fairly standardized notation for most things; I
    don't think it would help anyone if we started changing that
    just to make the whole codebase more uniform.
    It would also be confusing to newcomers. To me it makes the
    most sense for a domain expert to have an easy time seeing
    what's going on. It's unlikely that someone comes along
    wanting to work on 10 packages at the same time; it seems more
    likely that a newcomer wants to work on something from the
    particular domain he or she is familiar with.

    On 2015-11-11 16:49, Tom Breloff wrote:

         if you implement some algorithm one should use the notation
        from the referenced paper


    This can be easier to implement (essentially just copying
    from the paper), but it makes for a mess and a maintenance
    nightmare.  I don't want to have to read a paper just to
    understand what someone's code is doing.  Not to mention that
    many "unique findings" and algorithms in papers have actually
    already been found/implemented, just with different
    terminology in a different field.  My research has taken me
    down lots of rabbit holes, and I'm always amazed at how very
    different fields/applications share the same underlying math.
    We should do everything we can to unify the algorithms in the
    most Julian way.  It's not always easy, but it should at
    least be the goal.

    This matters most with terminology and the use of Greek
    letters.  I don't want one algorithm to represent a learning
    rate with eta and another to use alpha.  It may match the
    paper, but it makes for mass confusion when you're not using
    the paper as a reference.  (The obvious solution, of course,
    is to never use Greek letters.)
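As a sketch of what unified naming buys you (Python here, with a hypothetical sgd_step helper, not any real package's API): whichever symbol the source paper used (eta, alpha, epsilon), the interface commits to one descriptive name.

```python
# One paper writes theta <- theta - eta * grad, another uses alpha for
# the same quantity. The shared interface picks one descriptive name.
def sgd_step(theta, grad, learning_rate=0.1):
    """Plain gradient step; 'learning_rate' no matter what the paper
    called it."""
    return theta - learning_rate * grad

theta = 1.0
for _ in range(100):
    grad = 2 * theta  # gradient of f(theta) = theta**2
    theta = sgd_step(theta, grad, learning_rate=0.1)
print(abs(theta) < 1e-6)  # -> True (converged toward the minimum at 0)
```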

    On Wed, Nov 11, 2015 at 10:34 AM, Christof Stocker
    <[email protected]> wrote:

        I agree. I personally think the ML efforts should follow the
        StatsBase and Optim conventions where it makes sense.

        The notational differences are inconvenient, but they are
        manageable. I think readability should be the goal there.
        For example, if you implement some algorithm, you should
        use the notation from the referenced paper. A package
        tailored toward use in a statistical context, such as
        GLMs, should probably follow the conventions used in
        statistics (e.g. beta for the coefficients). A package
        for SVMs should follow the conventions for SVMs (e.g. w
        for the coefficients), and so forth. It's nice to
        streamline things, but let's not get carried away with
        this kind of micromanagement.


        On 2015-11-11 16:01, Tom Breloff wrote:
        One of the tricky things to figure out is how to separate
        statistics from machine learning, as they overlap heavily
        (completely?) but with different terminology and goals.  I
        think it's really important that JuliaStats and
        JuliaML/JuliaLearn play nicely together, and this probably
        means that any ML interface uses StatsBase verbs whenever
        possible.  There has been a little tension (from my
        perspective) and a slight turf war when it comes to
        statistical processes and terminology... is it possible to
        avoid?

        On Wed, Nov 11, 2015 at 9:49 AM, Stefan Karpinski
        <[email protected]> wrote:

            This is definitely already in progress, but we've a ways
            to go before it's as easy as scikit-learn. I suspect
            that having an organization will be more effective at
            coordinating the various efforts than people might expect.

            On Wed, Nov 11, 2015 at 9:46 AM, Tom Breloff
                <[email protected]> wrote:

                Randy, see LearnBase.jl, MachineLearning.jl,
                Learn.jl (just a readme for now), Orchestra.jl, and
                many others.  Many people have the same goal, and
                wrapping TensorFlow isn't going to change the need
                for a high level interface.  I do agree that a good
                high level interface is higher on the priority list,
                though.

                On Wed, Nov 11, 2015 at 9:29 AM, Randy Zwitch
                <[email protected]> wrote:

                    Sure. I'm not against anyone doing anything;
                    it just seems like Julia suffers from an
                    "expert/edge case" problem right now. For me,
                    it'd be awesome if there were a scikit-learn
                    (Python) or caret (R) style mega-interface
                    that ties together the packages that have
                    already been written. From my cursory
                    reading, TensorFlow seems more like a
                    low-level toolkit for expressing/solving
                    equations, whereas I see Julia lacking an
                    easy way to evaluate 3-5 different algorithms
                    on the same dataset quickly.

                    A tweet I just saw sums it up pretty succinctly:
                    "TensorFlow already has more stars than
                    scikit-learn, and probably more stars than
                    people actually doing deep learning"



                    On Tuesday, November 10, 2015 at 11:28:32 PM
                    UTC-5, Alireza Nejati wrote:

                        Randy: To answer your question, I'd
                        reckon that the two major gaps in Julia
                        that TensorFlow could fill are:

                        1. Lack of automatic differentiation on
                        arbitrary graph structures.
                        2. Lack of ability to map computations
                        across CPUs and clusters.

                        Funnily enough, I'd been thinking about
                        (1) for the past few weeks, and I think I
                        have an idea about how to accomplish it
                        using existing JuliaDiff libraries. About
                        (2), I have no idea, and that's probably
                        going to be the most important aspect of
                        TensorFlow moving forward (and probably
                        also the hardest to implement). So for
                        the time being, I think it's definitely
                        worthwhile just to have an interface to
                        TensorFlow. There are a few ways this
                        could be done. Some that I can think of:

                        1. Just tell people to use PyCall
                        directly. Not an elegant solution.
                        2. A more Julia-integrated interface à la
                        SymPy.
                        3. Using TensorFlow as the 'backend' of a
                        novel Julia-based machine learning
                        library. In this scenario, everything
                        would be in Julia, and TensorFlow would
                        only be used to map computations to
                        hardware.

                        I think 3 is the most attractive option, but
                        also probably the hardest to do.
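To make gap (1) above concrete, here is a toy reverse-mode automatic differentiation sketch (Python, purely illustrative; TensorFlow and the JuliaDiff packages implement this idea far more generally): each node records its parents with local gradients, and a backward pass accumulates adjoints in reverse topological order.

```python
# Toy reverse-mode autodiff over an explicit expression graph.
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # (parent_node, local_gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])

    def backward(self):
        # Topologically order the graph, then push adjoints backward so
        # each node's gradient is complete before it is propagated.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

x = Node(3.0)
y = Node(4.0)
z = x * y + x  # z = x*y + x, so dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # -> 5.0 3.0
```

Note that x is used twice in the graph (in x*y and in + x), which is exactly why the topological ordering matters: naive propagation would push incomplete adjoints through shared nodes.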
