So, do you want us to change the default so that we don't show column summaries? For tables without a lot of columns, that may work better; it will certainly lower the number of complaints about how we print things.
-- John

On May 22, 2014, at 1:18 PM, Stefan Karpinski <[email protected]> wrote:

> Solid reasons. I was just voicing my reaction.
>
>
> On Thu, May 22, 2014 at 4:16 PM, John Myles White <[email protected]>
> wrote:
> The original change that summarized large DataFrames was introduced by Julia
> Evans and brought us closer into sync with pandas. I've been really happy
> with it.
>
> Regarding the old way of doing things, I think you should revert to the old
> display rules for a while and try them again before making up your mind about
> your preferences. The old display rule was completely illegible for almost
> every data set that is currently being summarized. And I mean completely
> illegible, not just ugly.
>
> One change to formatting that I'd be happy with would be to default to
> showing the output of show(df, true) for all tables and never showing the
> column summaries unless explicitly requested. It seems like this default is
> the thing people most strongly dislike.
>
> We could remove the ASCII chrome, but I think the chrome is a good idea.
> MySQL, Hive and Presto all use the same kind of explicit tabular structure
> when rendering tables. I think making DataFrames behave more like traditional
> databases is a good thing since it encourages people not to think of them as
> if they were matrices.
>
> The padding also makes it much easier to copy-and-paste tables since they're
> valid Markdown tables that any Markdown renderer can easily convert into TeX,
> HTML, etc.
>
> -- John
>
> On May 22, 2014, at 1:02 PM, Stefan Karpinski <[email protected]> wrote:
>
>> For what it's worth, I was much happier when dataframes showed their
>> contents rather than a summary. I must have missed the discussion where that
>> decision was made (ditto for all the extra ASCII chrome when displaying data
>> frames these days).
>>
>>
>> On Thu, May 22, 2014 at 3:01 PM, John Myles White <[email protected]>
>> wrote:
>> Nobody had time to integrate it anywhere. A pull request would help move
>> things forward.
>>
>> -- John
>>
>> On May 22, 2014, at 11:57 AM, Bob Nnamtrop <[email protected]> wrote:
>>
>>> OK. Thanks. That is helpful.
>>>
>>> Any reason why that page is not shown in the documentation given in the
>>> link on the front page?
>>>
>>>
>>> On Thu, May 22, 2014 at 11:46 AM, John Myles White
>>> <[email protected]> wrote:
>>> head and tail don't actually print anything: they just give you a subset of
>>> a DataFrame. So you're seeing the usual show method's output, which can be
>>> overridden by explicitly requesting that you see the whole DataFrame. See
>>>
>>> https://github.com/JuliaStats/DataFrames.jl/blob/master/spec/show.md
>>>
>>> -- John
>>>
>>> On May 22, 2014, at 10:44 AM, Bob Nnamtrop <[email protected]> wrote:
>>>
>>>> An issue I noticed with DataFrames recently is that head(df) and tail(df)
>>>> both list the show(df) summary (like those above) instead of listing the
>>>> top and bottom of the dataframe. I just started using dataframes so I have
>>>> no idea what they did in the past, but it seems they should list the df
>>>> and not the summary.
>>>>
>>>> Also, are there any other handy ways to list the df in the REPL?
>>>>
>>>> Bob
>>>>
>>>>
>>>> On Thu, May 22, 2014 at 11:39 AM, Rob J. Goedman <[email protected]>
>>>> wrote:
>>>> Thanks John.
>>>>
>>>> I should have filed it as an issue on DataFrames.jl but initially thought
>>>> it could be deeper than that.
>>>>
>>>> For now in Stan.jl I've included a 'small' cleanup step. Small for say
>>>> 1000 samples, a bit bigger for 100000 samples.
>>>>
>>>> Like you mentioned earlier, for years I've been using
>>>> file-out-file-in communication (FOFI) for Jags and other programs (Finite
>>>> Elements) and was quite ok with it because sampling and FE iterations
>>>> dominated the time to complete.
>>>>
>>>> FOFI really only became an issue when I had to adjust values in between
>>>> each of hundreds of runs (e.g. a stiffness matrix in FEM when dealing with
>>>> buckling).
>>>>
>>>> Rob J. Goedman
>>>> [email protected]
>>>>
>>>>
>>>> On May 22, 2014, at 10:16 AM, John Myles White <[email protected]>
>>>> wrote:
>>>>
>>>>> I need to find time to look into this, but could someone try a git bisect
>>>>> and see if some of the metaprogramming changes we made to readtable
>>>>> caused this? It might be that this file would have never worked, but if
>>>>> it once did, it would be good to point out the problematic code.
>>>>>
>>>>> -- John
>>>>>
>>>>> On May 20, 2014, at 7:53 PM, Rob J. Goedman <[email protected]> wrote:
>>>>>
>>>>>> Actually, another way to make it work is removing the blank line. The
>>>>>> little program below shows that readtable() accepts test_df1 and
>>>>>> test_df2, but fails on test_df3.
>>>>>>
>>>>>> Also, the fact that it started to happen today had nothing to do with
>>>>>> Julia or DataFrames updates. The file is created by Stan and the latest
>>>>>> version inserts that blank line.
>>>>>>
>>>>>> Of course I could clean up the file, but maybe this is an issue in
>>>>>> DataFrames' readtable function?
>>>>>>
>>>>>> Apologies for the earlier incomplete report.
>>>>>>
>>>>>> Rob J. Goedman
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>> <test_df.jl><test_df1.csv>
>>>>>> <test_df2.csv>
>>>>>> <test_df3.csv>
>>>>>>
>>>>>>
>>>>>> julia> include("/Users/rob/.julia/v0.3/MCMCExampleRepository/test/test_df.jl")
>>>>>> 4x10 DataFrame
>>>>>> |-------|---------------|---------|---------|
>>>>>> | Col # | Name          | Eltype  | Missing |
>>>>>> | 1     | lp__          | Float64 | 0       |
>>>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>>>> | 3     | stepsize__    | Float64 | 0       |
>>>>>> | 4     | treedepth__   | Int64   | 0       |
>>>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>>>> | 7     | beta_1        | Float64 | 0       |
>>>>>> | 8     | beta_2        | Float64 | 0       |
>>>>>> | 9     | beta_3        | Float64 | 0       |
>>>>>> | 10    | sigma         | Float64 | 0       |
>>>>>>
>>>>>> 4x10 DataFrame
>>>>>> |-------|---------------|---------|---------|
>>>>>> | Col # | Name          | Eltype  | Missing |
>>>>>> | 1     | lp__          | Float64 | 0       |
>>>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>>>> | 3     | stepsize__    | Float64 | 0       |
>>>>>> | 4     | treedepth__   | Int64   | 0       |
>>>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>>>> | 7     | beta_1        | Float64 | 0       |
>>>>>> | 8     | beta_2        | Float64 | 0       |
>>>>>> | 9     | beta_3        | Float64 | 0       |
>>>>>> | 10    | sigma         | Float64 | 0       |
>>>>>>
>>>>>> ERROR: BoundsError()
>>>>>>  in findcorruption at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:663
>>>>>>  in readtable! at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:731
>>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:812
>>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:879
>>>>>>  in include at boot.jl:244
>>>>>> while loading /Users/rob/.julia/v0.3/MCMCExampleRepository/test/test_df.jl, in expression starting on line 11
>>>>>>
>>>>>> julia>
>>>>>>
>>>>>>
>>>>>> On May 20, 2014, at 6:36 PM, Rob J. Goedman <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Using a freshly updated Version 0.3.0-prerelease+3251 (2014-05-20 23:18
>>>>>>> UTC) of Julia I think I noticed a different behavior of readtable(),
>>>>>>> which I hope is not intended.
>>>>>>>
>>>>>>> I have a small test file with data as shown below (and attached as a
>>>>>>> file at the end of the email):
>>>>>>>
>>>>>>> lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,n_divergent__,mu
>>>>>>> # Adaptation terminated
>>>>>>>
>>>>>>> -19.8871,0.975123,0.303529,4,15,0,4.25051
>>>>>>> -22.1208,0.971631,0.303529,3,7,0,8.55276
>>>>>>> -23.8336,0.857954,0.303529,4,15,0,4.41087
>>>>>>>
>>>>>>> If I remove the commented line ("# Adaptation terminated"), readtable()
>>>>>>> has no problem, but if it's there readtable() seems to ignore the
>>>>>>> 'allowcomments=true'.
>>>>>>>
>>>>>>> I didn't update DataFrames as far as I am aware, but once or twice
>>>>>>> today I did pull Julia's master from GitHub.
>>>>>>>
>>>>>>> I wonder if someone could try this example. Thanks a lot.
>>>>>>>
>>>>>>> Rob J. Goedman
>>>>>>> [email protected]
>>>>>>>
>>>>>>>
>>>>>>> julia> df = readtable("schools8_samples.csv", allowcomments=true)
>>>>>>> ERROR: Saw 4 rows, 5 columns and 22 fields
>>>>>>>  * Line 1 has 3 columns
>>>>>>>
>>>>>>>  in error at error.jl:21
>>>>>>>  in findcorruption at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:680
>>>>>>>  in readtable! at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:731
>>>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:812
>>>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:879
>>>>>>>
>>>>>>> julia> df = readtable("schools8_samples.csv", allowcomments=true)
>>>>>>> 3x7 DataFrame
>>>>>>> |-------|---------------|---------|---------|
>>>>>>> | Col # | Name          | Eltype  | Missing |
>>>>>>> | 1     | lp__          | Float64 | 0       |
>>>>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>>>>> | 3     | stepsize__    | Float64 | 0       |
>>>>>>> | 4     | treedepth__   | Int64   | 0       |
>>>>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>>>>> | 7     | mu            | Float64 | 0       |
>>>>>>>
>>>>>>>
>>>>>>> <schools8_samples.csv>
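
The thread mentions a cleanup step in Stan.jl but does not show it. As a rough sketch of what such a workaround could look like, assuming the goal is only to drop blank lines and "#" comment lines before the file is handed to readtable(), something like the following might do. The function name strip_blank_and_comment_lines is made up for illustration and is not part of DataFrames.jl or Stan.jl. The code uses only Base functions and is written for a recent Julia (1.x), not the 0.3 prerelease discussed in the thread.

```julia
# Hypothetical helper (not from DataFrames.jl or Stan.jl): copy `src` to `dest`,
# skipping blank lines and lines whose first non-space text is the comment marker.
# Returns the number of lines written.
function strip_blank_and_comment_lines(src::AbstractString, dest::AbstractString;
                                       comment::AbstractString = "#")
    kept = 0
    open(dest, "w") do out
        for line in eachline(src)
            stripped = strip(line)
            if isempty(stripped) || startswith(stripped, comment)
                continue  # drop blank lines and comment lines
            end
            println(out, line)
            kept += 1
        end
    end
    return kept
end

# Usage with the sample data quoted in the thread, written to temporary files.
src = tempname()
write(src, """
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,n_divergent__,mu
# Adaptation terminated

-19.8871,0.975123,0.303529,4,15,0,4.25051
-22.1208,0.971631,0.303529,3,7,0,8.55276
-23.8336,0.857954,0.303529,4,15,0,4.41087
""")
dest = tempname()
kept = strip_blank_and_comment_lines(src, dest)  # header plus three sample rows
```

The cleaned file at dest could then be passed to whatever table reader is in use. This only sidesteps the problem on the caller's side; it says nothing about where the bug in readtable() actually is.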
