I've thought a few times about reimplementing Stan in Julia. I wonder how 
much of Stan's codebase is about parsing/code-generation (which would be 
drastically simpler in Julia) versus fine-tuning their NUTS sampler, and 
how much of the automatic differentiation/code-generation work could be 
shared with the deep-learning libraries.
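
To make that split concrete, here is a minimal sketch of the separation 
between specifying a model and fitting it. It is written in Python purely 
for illustration and is not Stan's API or design. All function names are 
hypothetical, the data are made up, and the sampler is a plain random-walk 
Metropolis standing in for NUTS (no gradients or automatic differentiation 
are involved). The model is only a log-density function; two unrelated 
"fitters" consume it without knowing anything else about it.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def make_log_posterior(data, prior_sigma=10.0, noise_sigma=1.0):
    """Model specification only: mu ~ Normal(0, prior_sigma),
    y_i ~ Normal(mu, noise_sigma). Returns an unnormalized log posterior."""
    def log_posterior(mu):
        log_prior = normal_logpdf(mu, 0.0, prior_sigma)
        log_lik = sum(normal_logpdf(y, mu, noise_sigma) for y in data)
        return log_prior + log_lik
    return log_posterior


def random_walk_metropolis(log_density, init, n_samples, step=0.5, seed=0):
    """First 'fitter': random-walk Metropolis (a stand-in, not NUTS).
    It only needs a callable that returns a log density."""
    rng = random.Random(seed)
    x, lp = init, log_density(init)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_density(proposal)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples


def grid_map(log_density, lo, hi, n_points):
    """Second 'fitter': brute-force MAP estimate over an evenly spaced grid."""
    grid = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    return max(grid, key=log_density)


data = [1.8, 2.4, 2.1, 1.9, 2.3]  # made-up observations
log_post = make_log_posterior(data)

samples = random_walk_metropolis(log_post, init=0.0, n_samples=5000)
kept = samples[500:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
map_estimate = grid_map(log_post, -5.0, 5.0, 10001)
print(posterior_mean, map_estimate)
```

Everything specific to the sampler lives in one function, and everything 
specific to the model lives in another. In a real system the sampler side 
would presumably also need gradients of the log density, which is where 
shared automatic-differentiation tooling would come in.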

On Sunday, December 27, 2015 at 1:28:53 PM UTC-5, Lampkld wrote:
>
> Viral and Simon,
>
> Since you asked, I will write out some rough and probably excessively 
> abstract ideas that have been floating around in my head. I don't have 
> time to polish them, so please forgive the inchoate nature of these 
> thoughts: 
>
> Yes, composability and generality are the names of the game! I would also 
> add expressiveness, scalability and fostering innovation. 
>
> Part of #1 is at least reaching parity with R in terms of data cleaning 
> and manipulation syntax. Part of R's popularity, and its stubborn growth 
> in the face of Python's recent maturation, is its advantage in ease of 
> expressing data manipulation. If Julia is to compete, at the very least 
> the ecosystem should leverage macros to emulate R's non-standard 
> evaluation (NSE) in a more measured manner (similar to DataFramesMeta). 
>
> However, I think we should be greedy and ask: how can we do better? How 
> can we shorten the overhead and feedback loop in exploring and 
> experimenting with ideas, data and models? I don't have many concrete 
> suggestions here, but I suspect the solution would involve something 
> dplyr-like with conservative and targeted use of interactive JavaScript 
> and WebGL. Can we do transforms on the data with the mouse? Fly through 
> it with 3D glasses? I think we should think kind of wild here. Hadley 
> has discussed his dreams regarding a "grammar of modeling". Is this 
> probabilistic programming or something else? 
>
> What about plotting specifically? I think Topological Data Analysis is an 
> excellent general exploration framework. 
>
> Finally, I would look at maximizing diffuse innovation while maintaining 
> uniformity, the strengths of R's and Python's ecosystems respectively. My 
> amateur read of the complex systems research is that the ability of a 
> system to produce new ideas and process information robustly and quickly 
> is correlated with a balance between looseness and diversity on one hand 
> and strength of connection between nodes and some hierarchy on the other. 
> How can we design the Julia ecosystem to leverage this insight while 
> keeping a uniform interface? I'm thinking of an abstract interface with 
> generic functions and types (similar to Distributions.jl) that can be 
> easily composed by researchers to create new models, but can also be 
> plugged back into an API and tooling so end users can leverage them 
> easily. Further, making experimentation easy and fun (a trait that has 
> already received much acclaim from researchers) will encourage grad 
> students to pick up Julia, and the abstract interface will encourage use 
> of these packages, further increasing the incentive to produce them. 
>
> I know this is all very vague, but I just wanted to get my general vision 
> out there. Some potential specifics: passing in types instead of symbols 
> to choose methods, using multiple-inheritance traits to tag new models 
> and solvers, and using functions defined on abstract types to get tests 
> and optimizations for free.
>
> Specifically regarding a PPL, I would say that with recent Lora.jl 
> progress, Distributions.jl, and Julia's much more concise and expressive 
> nature vs. C++, I don't think it would take anywhere near the work of Stan 
> to get something decent. PyMC3 is pretty darn close and exceeds Stan in 
> some areas with much less labor and code volume... and this is just in 
> Python (though also leveraging Theano). 
>
> What does everyone think?  
>
> On Sunday, December 27, 2015 at 12:41:18 PM UTC-5, Simon Byrne wrote:
>>
>> Thanks for the suggestions, these are certainly the main areas 
>> we're looking to address as part of this work.
>>
>> I'd be interested to hear if you have more thoughts about the model 
>> specification/probabilistic programming language. A few other people have 
>> requested things like this, and this would certainly play to Julia's 
>> strengths (as shown by JuMP.jl). That said, a full-scale probabilistic 
>> programming language might be a bit too much to ask as part of this work 
>> (keep in mind that Stan has been a 3+ year project with 2-3 full-time devs + 
>> volunteers), but there might be some low-hanging fruit here we can pick.
>>
>> -simon
>>
>> On Sunday, 27 December 2015 02:32:43 UTC, Lampkld wrote:
>>>
>>> Thanks for the response.
>>>
>>> Since you kindly asked, the following are two main areas in our 
>>> assessment of the general arc of the Julia ecosystem:
>>>
>>> 1. Will the roadmap obviate some of the bottlenecks for the normal 
>>> day-to-day exploratory workflow? These are minimal things that R and 
>>> Python have and whose lack hampers any use of Julia for regular 
>>> analysis: things like a robust dataframe with data I/O into different 
>>> formats, web scraping, worked-out nullable semantics and integration 
>>> with the ecosystem, robust data cleaning and tidy data, modeling with 
>>> basic diagnostic tests, etc.
>>>
>>> 2. Will the roadmap leapfrog into areas and capabilities that are 
>>> currently not covered by other stats and data science ecosystems?
>>>
>>> There are many here, but we are specifically looking at the ability to 
>>> do modeling on medium-sized out-of-core databases. This would include 
>>> an abstract dataframe-like interface to said databases (MySQL and 
>>> SQLite), and some sort of modeling capability on the same. My dream 
>>> would be separation of model specification, as a DAG/probabilistic 
>>> programming framework, from fitting the model. Thus the same model can 
>>> be fit with different sorts of data and optimizers. Streaming black-box 
>>> variational inference can be a means to extend this to out-of-core (OOC) 
>>> work. 
>>>
>>> I realize Julia won't for a while have all the statistical tests and 
>>> random models of Python, much less R. However, a general yet powerful and 
>>> scalable data querying and probabilistic programming framework could 
>>> arguably suffice for most Python and R use cases in data science while 
>>> providing a comparative advantage over other frameworks where it counts. 
>>> To my knowledge, SAS and Stata are right now the only packages that offer 
>>> general modeling with on-disk data sets, but the sort of capability I 
>>> outlined would seem to be in excess of what they offer. 
>>>
>>> A bonus would be filling out Gadfly toward ggplot2 and ggvis capability. 
>>>  
>>>
>>>
>>> On Thursday, December 24, 2015 at 11:50:42 AM UTC-5, Viral Shah wrote:
>>>>
>>>> What would be helpful is to know what kind of decisions you are 
>>>> thinking of and what are the factors. 
>>>>
>>>> I suspect within 2 weeks for sure - but it's really for the Julia stats 
>>>> folks to say. The idea is to get feedback and chart a course.
>>>>
>>>> -viral
>>>> On 24 Dec 2015 10:07 p.m., "Lampkld" <[email protected]> wrote:
>>>>
>>>>> Sorry to bug you, but can we expect something this week or next? 
>>>>> It would be helpful to know how long to push some stuff off. 
>>>>>
>>>>> On Thursday, December 17, 2015 at 6:20:45 PM UTC-5, Viral Shah wrote:
>>>>>>
>>>>>>
>>>>>> The JuliaStats team will be publishing a general plan on stats+df in 
>>>>>> a few days. I doubt we will have settled on all the df issues by then, 
>>>>>> but at least there will be something to start with. 
>>>>>>
>>>>>>
>>>>>> -viral 
>>>>>>
>>>>>>
>>>>>>
>>>>>> > On 17-Dec-2015, at 10:15 PM, Lampkld <[email protected]> wrote: 
>>>>>> > 
>>>>>> > Hi Viral, 
>>>>>> > 
>>>>>> > Any update on this (stats + df) by chance, or any idea when we can 
>>>>>> > get one? Even a roadmap or some sort of vision or other details would 
>>>>>> > help with decision making regarding infrastructure. 
>>>>>> > 
>>>>>> > Thanks! 
>>>>>> > 
>>>>>> > On Wednesday, November 11, 2015 at 3:00:50 AM UTC-5, Viral Shah wrote: 
>>>>>> > Yes, we are really excited. This grant is to focus on core Julia 
>>>>>> > compiler infrastructure and key math libraries. Much of the libraries' 
>>>>>> > focus will be on statistical computing. 
>>>>>> > -viral 
>>>>>> > 
>>>>>>
>>>>>>
