Well, at least we have learned that people are looking for a good IDE for
Julia :)
On Friday, September 18, 2015 at 3:45:41 PM UTC-4, Daniel Carrera wrote:
>
> Hello,
>
> Just for fun, does anyone want to help me model the distribution of posts
> per thread in the julia-users list?
>
> The attached file (posts-per-thread.tsv) contains the number of posts and
> views (1st and 2nd columns) for the last 160 threads in julia-users. I
> tried to fit a Poisson distribution using maximum likelihood, but the true
> distribution appears to have a fat tail. My script is below.
>
> using DataFrames
> using PyPlot
>
> df = readtable("posts-per-thread.tsv")
>
> max_posts = maximum(df[:posts])
>
> #
> # Fit a Poisson distribution using maximum likelihood.
> #
> lambda = mean(df[:posts])
>
>
> #
> # Count the number of threads with 'N' posts.
> #
> posts = Int[]
> threads = Int[]
> poisson_ys = Float64[]
> poisson_xs = collect(1:max_posts)
>
> n_factorial = 1.0  # Float64, because an Int overflows past 20!
>
> for nposts in 1:max_posts
>     nthreads = sum(df[:posts] .== nposts)
>
>     if nthreads > 0
>         push!(posts, nposts)
>         push!(threads, nthreads)
>     end
>
>     #
>     # Poisson pmf: lambda^n * exp(-lambda) / n!
>     #
>     n_factorial *= nposts
>     push!(poisson_ys, lambda^nposts * exp(-lambda) / n_factorial)
> end
>
> #
> # Plot the number of threads vs thread length.
> #
> figure(1)
> xlabel("Number of posts")
> ylabel("Frequency")
>
> ylim(0,25)
> xlim(0,45) # Remove the outlier thread.
>
> #
> # Scale the Poisson probabilities by the number of threads so the curve
> # shows expected thread counts, comparable to the observed counts.
> #
> plot(posts, threads, "bo")
> plot(poisson_xs[1:40], size(df, 1) * poisson_ys[1:40], "r-")
>
>
> Cheers,
> Daniel.
>
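
For what it's worth, here is a minimal sketch of the Poisson fit from the quoted script, written in log space so the factorial cannot overflow. The function name poisson_logpmf and the counts vector are made up for illustration; the numbers are not the julia-users data. It assumes only base Julia, with no packages.

```julia
# Log of the Poisson pmf: log P(k; lambda) = k*log(lambda) - lambda - log(k!)
# (poisson_logpmf is a hypothetical helper name, not a library function.)
function poisson_logpmf(k, lambda)
    log_factorial = 0.0
    for i in 2:k
        log_factorial += log(i)   # accumulate log(k!) term by term
    end
    return k * log(lambda) - lambda - log_factorial
end

# Made-up thread lengths for illustration only.
counts = [1, 2, 2, 3, 5, 8, 40]

# The maximum-likelihood estimate of lambda is the sample mean.
lambda = sum(counts) / length(counts)

# Compare the fitted probability of short, medium, and very long threads.
for k in (1, 5, 40)
    println(k, "  ", exp(poisson_logpmf(k, lambda)))
end
```

With a sample like this, the fitted probability of a 40-post thread comes out vanishingly small even though one is present in the data, which is the fat-tail mismatch described in the original message.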