scikit-learn uses Greek letters in its implementation, which I'm fine with 
since domain experts work on that code, but I wish that in the visible 
interface they had consistently used more descriptive names (e.g. 
regularization_strength instead of alpha).
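
For illustration, a sketch of what I mean (a hypothetical function, not an 
existing package; the Greek can live inside, the descriptive name outside):

    using LinearAlgebra

    # descriptive keyword in the user-facing API...
    function fit_ridge(X, y; regularization_strength::Real = 1.0)
        α = regularization_strength     # ...paper-style Greek internally
        (X' * X + α * I) \ (X' * y)     # closed-form ridge solution
    end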

On Wednesday, November 11, 2015 at 11:00:56 AM UTC-5, Christof Stocker 
wrote:
>
> I understand that. But that would imply that a group of people who are 
> used to different notations would need to reach a consensus. There would 
> also be an ugliness to it. For example, SVMs have a pretty standardized 
> notation for most things; I don't think it would help anyone if we 
> started changing that just to make the whole codebase more uniform. It 
> would also be confusing to newcomers. To me it makes the most sense for a 
> domain expert to have an easy time seeing what's going on. It's unlikely 
> that someone comes along wanting to work on 10 packages at the same time; 
> it seems more likely that a newcomer wants to work on something from the 
> particular domain he or she is familiar with.
>
> On 2015-11-11 16:49, Tom Breloff wrote:
>
>> if you implement some algorithm, you should use the notation from the 
>> referenced paper
>
>
> This can be easier to implement (essentially just copying from the paper), 
> but it makes for a mess and a maintenance nightmare.  I don't want to have to 
> read a paper just to understand what someone's code is doing.  Not to 
> mention there are lots of "unique findings" and algorithms in papers that 
> have actually already been found/implemented, but with different 
> terminology in a different field.  My research has taken me down lots of 
> rabbit holes, and I'm always amazed at how very different 
> fields/applications all have the same underlying math.  We should do 
> everything we can to unify the algorithms in the most Julian way.  It's not 
> always easy, but it should at least be the goal.
>
> This is most important with terminology and the use of Greek letters.  I don't 
> want one algorithm to represent a learning rate with eta and another to 
> use alpha.  It may match the paper, but it makes for mass confusion when 
> you're not using the paper as a reference.  (The obvious solution is to 
> never use Greek letters, of course.)
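>
> To make this concrete (hypothetical one-liners, not from any package), I'd 
> rather both of these accept the same keyword:
>
>     # one shared name for the step size, whatever the papers call it
>     sgd_step(w, g; learnrate = 0.01) = w .- learnrate .* g
>     adagrad_step(w, g, h; learnrate = 0.01) =
>         w .- learnrate .* g ./ sqrt.(h .+ 1e-8)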
>
> On Wed, Nov 11, 2015 at 10:34 AM, Christof Stocker wrote:
>
>> I agree. I personally think the ML efforts should follow the StatsBase 
>> and Optim conventions where it makes sense.
>>
>> The notational differences are inconvenient, but they are manageable. I 
>> think readability should be the goal there. For example, if you implement 
>> some algorithm, you should use the notation from the referenced paper. A 
>> package tailored towards use in a statistical context, such as GLMs, should 
>> probably follow the conventions used in statistics (e.g. beta for the 
>> coefficients). A package for SVMs should follow the conventions for SVMs 
>> (e.g. w for the coefficients), and so forth. It's nice to streamline things, 
>> but let's not get carried away with this kind of micromanagement.
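>>
>> For instance (just a sketch with invented field names), the two packages 
>> would simply expose their own domain's symbols:
>>
>>     # statistics convention: coefficients are beta
>>     struct GLMFit
>>         beta::Vector{Float64}
>>     end
>>
>>     # SVM convention: weights w plus a bias b
>>     struct SVMFit
>>         w::Vector{Float64}
>>         b::Float64
>>     end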
>>
>>
>> On 2015-11-11 16:01, Tom Breloff wrote:
>>
>> One of the tricky things to figure out is how to separate statistics from 
>> machine learning, as they overlap heavily (completely?) but with different 
>> terminology and goals.  I think it's really important that JuliaStats and 
>> JuliaML/JuliaLearn play nicely together, and this probably means that any 
>> ML interface should use StatsBase verbs whenever possible.  There has been 
>> a little tension (from my perspective) and a slight turf war when it comes 
>> to statistical processes and terminology... is it possible to avoid?
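>>
>> Concretely, a sketch (MyModel is invented; fit and predict are the actual 
>> StatsBase generics) of extending the shared verbs instead of inventing 
>> train!/score/etc.:
>>
>>     using StatsBase
>>
>>     struct MyModel
>>         coefs::Vector{Float64}
>>     end
>>
>>     # extend the StatsBase generics for our type
>>     StatsBase.fit(::Type{MyModel}, X, y) = MyModel(X \ y)
>>     StatsBase.predict(m::MyModel, X) = X * m.coefs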
>>
>> On Wed, Nov 11, 2015 at 9:49 AM, Stefan Karpinski wrote:
>>
>>> This is definitely already in progress, but we've got a ways to go before 
>>> it's as easy as scikit-learn. I suspect that having an organization will be 
>>> more effective at coordinating the various efforts than people might expect.
>>>
>>> On Wed, Nov 11, 2015 at 9:46 AM, Tom Breloff wrote:
>>>
>>>> Randy, see LearnBase.jl, MachineLearning.jl, Learn.jl (just a readme 
>>>> for now), Orchestra.jl, and many others.  Many people have the same goal, 
>>>> and wrapping TensorFlow isn't going to change the need for a high level 
>>>> interface.  I do agree that a good high level interface is higher on the 
>>>> priority list, though.
>>>>
>>>> On Wed, Nov 11, 2015 at 9:29 AM, Randy Zwitch wrote:
>>>>
>>>>> Sure. I'm not against anyone doing anything; it just seems like 
>>>>> Julia suffers from an "expert/edge case" problem right now. For me, it'd 
>>>>> be awesome if there were a scikit-learn (Python) or caret (R) style 
>>>>> mega-interface that ties together the packages that are already written. 
>>>>> From my cursory reading, TensorFlow seems more like a low-level toolkit 
>>>>> for expressing/solving equations, whereas what I see Julia lacking is an 
>>>>> easy way to quickly evaluate 3-5 different algorithms on the same 
>>>>> dataset.
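>>>>>
>>>>> Something like this hypothetical loop is what I'm after (the "models" 
>>>>> here are trivial stand-ins, not real learners):
>>>>>
>>>>>     y = [1.0, 2.0, 3.0, 4.0]
>>>>>     models = [
>>>>>         "mean baseline" => z -> fill(sum(z) / length(z), length(z)),
>>>>>         "zero baseline" => z -> zeros(length(z)),
>>>>>     ]
>>>>>     for (name, fitpredict) in models
>>>>>         yhat = fitpredict(y)
>>>>>         println(name, ": rmse = ", sqrt(sum(abs2, yhat .- y) / length(y)))
>>>>>     end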
>>>>>
>>>>> A tweet I just saw sums it up pretty succinctly: "TensorFlow already 
>>>>> has more stars than scikit-learn, and probably more stars than people 
>>>>> actually doing deep learning" 
>>>>>
>>>>> On Tuesday, November 10, 2015 at 11:28:32 PM UTC-5, Alireza Nejati 
>>>>> wrote: 
>>>>>>
>>>>>> Randy: To answer your question, I'd reckon the two major gaps in 
>>>>>> Julia that TensorFlow could fill are: 
>>>>>>
>>>>>> 1. Lack of automatic differentiation on arbitrary graph structures 
>>>>>> (see the sketch just after this list).
>>>>>> 2. Lack of the ability to map computations across CPUs and clusters.
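>>>>>>
>>>>>> (On (1): gradients of plain Julia functions are already possible, e.g. 
>>>>>> with ForwardDiff; the missing piece is doing it over arbitrary graph 
>>>>>> structures. A minimal sketch of what works today:)
>>>>>>
>>>>>>     using ForwardDiff
>>>>>>
>>>>>>     f(x) = sum(x .^ 2)
>>>>>>     ForwardDiff.gradient(f, [1.0, 2.0, 3.0])   # -> [2.0, 4.0, 6.0]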
>>>>>>
>>>>>> Funnily enough, I was thinking about (1) for the past few weeks, and I 
>>>>>> think I have an idea of how to accomplish it using the existing JuliaDiff 
>>>>>> libraries. About (2), I have no idea, and that's probably going to be the 
>>>>>> most important aspect of TensorFlow moving forward (and also probably the 
>>>>>> hardest to implement). So for the time being, I think it's definitely 
>>>>>> worthwhile to just have an interface to TensorFlow. There are a few ways 
>>>>>> this could be done. Some that I can think of:
>>>>>>
>>>>>> 1. Just tell people to use PyCall directly. Not an elegant solution.
>>>>>> 2. A more Julia-integrated interface, *a la* SymPy.
>>>>>> 3. Using TensorFlow as the 'backend' of a novel Julia-based machine 
>>>>>> learning library. In this scenario, everything would be in Julia, and 
>>>>>> TensorFlow would only be used to map computations onto hardware.
>>>>>>
>>>>>> I think 3 is the most attractive option, but also probably the 
>>>>>> hardest to do. (Option 1 would look something like the sketch below.)
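>>>>>>
>>>>>> For a flavor of option 1, a minimal sketch (it assumes the Python 
>>>>>> tensorflow package is installed and uses its 1.x-style graph API):
>>>>>>
>>>>>>     using PyCall
>>>>>>
>>>>>>     tf = pyimport("tensorflow")
>>>>>>     a = tf.constant(2.0)
>>>>>>     b = tf.constant(3.0)
>>>>>>     sess = tf.Session()
>>>>>>     println(sess.run(tf.add(a, b)))   # -> 5.0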
>>>>>>
