I've thought a few times about reimplementing Stan in Julia. I wonder how much of Stan's codebase is about parsing/code generation (which would be drastically simpler in Julia) versus fine-tuning their NUTS sampler. I also wonder how much of the automatic differentiation and code generation work could be shared with the deep-learning libraries.

On Sunday, December 27, 2015 at 1:28:53 PM UTC-5, Lampkld wrote:
>
> Viral and Simon,
>
> Since you asked, I will write out some rough and probably excessively abstract ideas that have been floating around in my head. I don't have time to polish them, so please forgive the inchoate nature of these thoughts:
>
> Yes, composability and generality are the names of the game! I would also add expressiveness, scalability and fostering innovation.
>
> Part of #1 is at least reaching parity with R in terms of data cleaning and manipulation syntax. Part of R's popularity and its stubborn growth in the face of Python's recent maturation is its advantage in ease of expressing data manipulation. If Julia is to compete, at the very least the ecosystem should leverage macros to emulate R's NSE, in a more measured manner (similar to DataFramesMeta.jl).
>
> However, I think we should be greedy and ask: how can we do better? How can we shorten the overhead and feedback loop in exploring and experimenting with ideas, data and models? I don't have many concrete suggestions here, but I suspect the solution would involve something dplyr-like with conservative and targeted use of interactive JavaScript and WebGL. Can we do transforms on the data with the mouse? Fly through it with 3D glasses? I think we should think kind of wild here. Hadley has discussed his dreams regarding a "grammar of modeling". Is this probabilistic programming or something else?
>
> What about plotting specifically? I think an excellent sort of general exploration framework is Topological Data Analysis.
>
> Finally, I would look at maximizing diffuse innovation while maintaining uniformity, the strengths of R's and Python's ecosystems respectively.
> My amateur read of the complex systems science research is that the ability of a system to produce new ideas and process information robustly and quickly is correlated with a balance between looseness and diversity on one hand and strength of connection between nodes and some hierarchy on the other. How can we design the Julia ecosystem to leverage this insight while keeping uniformity in interface? I'm thinking of an abstract interface with generic functions and types (similar to Distributions.jl) that can be easily composed together by researchers to create new models but can be plugged back in to an API and tooling to be easily leveraged by end users. Further, making experimentation easy and fun (a trait that has already received much acclaim from researchers) will encourage grad students to pick up Julia, and the abstract interface will encourage use of these packages, further increasing incentives to produce.
>
> I know this is all very vague, but I just wanted to get my general vision out there. Things like passing in types instead of symbols for choosing methods, using multi-inheritance traits to tag new models and solvers, and using functions defined on abstract types to get tests and optimizations for free are some potential specifics.
>
> Specifically regarding a PPL, I would say that with recent Lora.jl progress, Distributions.jl, and Julia's much more concise and expressive nature vs C++, I don't think it would take anywhere near the work of Stan to get something decent. PyMC3 is pretty darn close and exceeds Stan in some areas with much less labor and code volume... and this is just in Python (though also leveraging Theano).
>
> What does everyone think?
>
> On Sunday, December 27, 2015 at 12:41:18 PM UTC-5, Simon Byrne wrote:
>>
>> Thanks for the suggestions, these are certainly the main areas we're looking to address as part of this work.
>>
>> I'd be interested to hear if you have more thoughts about the model specification/probabilistic programming language. A few other people have requested things like this, and this would certainly play to Julia's strengths (as shown by JuMP.jl). That said, a full-scale probabilistic programming language might be a bit too much to ask as part of this work (keep in mind that Stan has been a 3+ year project with 2-3 full-time devs + volunteers), but there might be some low-hanging fruit here we can pick.
>>
>> -simon
>>
>> On Sunday, 27 December 2015 02:32:43 UTC, Lampkld wrote:
>>>
>>> Thanks for the response.
>>>
>>> Since you kindly asked, the following are two main areas in our assessment of the general arc of the Julia ecosystem:
>>>
>>> 1. Will the roadmap obviate some of the bottlenecks for the normal day-to-day exploratory workflow? These are minimal things that R and Python have and whose lack hampers any use of Julia for regular analysis: things like a robust dataframe with data I/O into different formats, web scraping, worked-out nullable semantics and integration with the ecosystem, robust data cleaning and tidy data, modeling with basic diagnostic tests, etc.
>>>
>>> 2. Will the roadmap leapfrog into areas and capabilities that are currently not covered by other stats and data science ecosystems?
>>>
>>> There are many here, but we are specifically looking at the ability to do modeling on medium-sized out-of-core databases. This would include an abstract dataframe-like interface to said databases (MySQL and SQLite), and some sort of modeling capability on the same. My dream would be separation of model specification, as a DAG/probabilistic programming framework, from fitting the model. Thus the same model can be fit with different sorts of data and optimizers. Streaming black-box variational inference can be a means to extend this to OOC work.
>>>
>>> I realize Julia won't for a while have all the statistical tests and random models of Python, much less R. However, a general yet powerful and scalable data querying and probabilistic programming framework could arguably suffice for most Python and R use cases in data science while providing a comparative advantage over other frameworks where it counts. To my knowledge, right now SAS and Stata are the only packages that offer general modeling with on-disk data sets, but the sort of capability I outlined would seem to be in excess of what they offer.
>>>
>>> A bonus would be filling out Gadfly towards ggplot and ggvis capability.
>>>
>>> On Thursday, December 24, 2015 at 11:50:42 AM UTC-5, Viral Shah wrote:
>>>>
>>>> What would be helpful is to know what kind of decisions you are thinking of and what the factors are.
>>>>
>>>> I suspect within 2 weeks for sure - but it's really for the Julia stats folks to say. The idea is to get feedback and chart a course.
>>>>
>>>> -viral
>>>> On 24 Dec 2015 10:07 p.m., "Lampkld" wrote:
>>>>
>>>>> Sorry to bug you, but can we expect something this or next week? It would be helpful in knowing until when to push some stuff off.
>>>>>
>>>>> On Thursday, December 17, 2015 at 6:20:45 PM UTC-5, Viral Shah wrote:
>>>>>>
>>>>>> The JuliaStats team will be publishing a general plan on stats+df in a few days. I doubt we will have settled on all the df issues by then, but at least there will be something to start with.
>>>>>>
>>>>>> -viral
>>>>>>
>>>>>> > On 17-Dec-2015, at 10:15 PM, Lampkld wrote:
>>>>>> >
>>>>>> > Hi Viral,
>>>>>> >
>>>>>> > Any update on this (stats + df) by chance, or any idea when we can get one? Even a roadmap or some sort of vision or other details would help with decision making regarding infrastructure.
>>>>>> >
>>>>>> > Thanks!
>>>>>> >
>>>>>> > On Wednesday, November 11, 2015 at 3:00:50 AM UTC-5, Viral Shah wrote:
>>>>>> > Yes, we are really excited. This grant is to focus on core Julia compiler infrastructure and key math libraries. Much of the library focus will be on statistical computing.
>>>>>> > -viral
