Re: [julia-users] Dataframe readtable change?

John Myles White Thu, 22 May 2014 13:23:12 -0700

So, do you want us to change the default so that we don't show column 
summaries? For tables without a lot of columns, that may work better; it will 
certainly lower the number of complaints about how we print things.


 -- John

On May 22, 2014, at 1:18 PM, Stefan Karpinski <[email protected]> wrote:

> Solid reasons. I was just voicing my reaction.
> 
> 
> On Thu, May 22, 2014 at 4:16 PM, John Myles White <[email protected]> 
> wrote:
> The original change that summarized large DataFrames was introduced by Julia 
> Evans and brought us into closer sync with pandas. I've been really happy 
> with it.
> 
> Regarding the old way of doing things, I think you should revert to the old 
> display rules for a while and try them again before making up your mind about 
> your preferences. The old display rule was completely illegible for almost 
> every data set that is currently being summarized. And I mean completely 
> illegible, not just ugly.
> 
> One change to formatting that I'd be happy with would be to default to 
> showing the output of show(df, true) for all tables and never showing the 
> column summaries unless explicitly requested. It seems like this default is 
> the thing people most strongly dislike.
> 
> We could remove the ASCII chrome, but I think it's a good idea to keep it. 
> MySQL, Hive and Presto all use the same kind of explicit tabular structure 
> when rendering tables. I think making DataFrames behave more like 
> traditional databases is a good thing, since it encourages people not to 
> think of them as if they were matrices.
> 
> The padding also makes it much easier to copy and paste tables, since 
> they're valid Markdown tables that any Markdown renderer can easily convert 
> into TeX, HTML, etc.
> 
>  -- John
> 
> On May 22, 2014, at 1:02 PM, Stefan Karpinski <[email protected]> wrote:
> 
>> For what it's worth, I was much happier when dataframes showed their 
>> contents rather than a summary. I must have missed the discussion where that 
>> decision was made (ditto for all the extra ASCII chrome when displaying data 
>> frames these days).
>> 
>> 
>> On Thu, May 22, 2014 at 3:01 PM, John Myles White <[email protected]> 
>> wrote:
>> Nobody had time to integrate it anywhere. A pull request would help move 
>> things forward.
>> 
>>  -- John
>> 
>> On May 22, 2014, at 11:57 AM, Bob Nnamtrop <[email protected]> wrote:
>> 
>>> OK. Thanks. That is helpful.
>>> 
>>> Is there any reason why that page is not included in the documentation 
>>> linked from the front page?
>>> 
>>> 
>>> On Thu, May 22, 2014 at 11:46 AM, John Myles White 
>>> <[email protected]> wrote:
>>> head and tail don't actually print anything: they just give you a subset of 
>>> a DataFrame. So you're seeing the usual show method's output, which can be 
>>> overridden by explicitly requesting that you see the whole DataFrame. See
>>> 
>>> https://github.com/JuliaStats/DataFrames.jl/blob/master/spec/show.md
>>> 
>>>  -- John
>>> 
>>> On May 22, 2014, at 10:44 AM, Bob Nnamtrop <[email protected]> wrote:
>>> 
>>>>  An issue I noticed with Dataframes recently is that head(df) and tail(df) 
>>>> both list the show(df) summary (like those above) instead of listing the 
>>>> top and bottom of the dataframe. I just started using dataframes so I have 
>>>> no idea what they did in the past but it seems they should list the df and 
>>>> not the summary.
>>>> 
>>>> Also, are there any other handy ways to list the df in the repl?
>>>> 
>>>> Bob
>>>> 
>>>> 
>>>> On Thu, May 22, 2014 at 11:39 AM, Rob J. Goedman <[email protected]> 
>>>> wrote:
>>>> Thanks John.
>>>> 
>>>> I should have filed it as an issue on DataFrames.jl, but initially I 
>>>> thought it could go deeper than that.
>>>> 
>>>> For now, in Stan.jl I've included a 'small' cleanup step: small for, say, 
>>>> 1000 samples, a bit bigger for 100000 samples.
>>>> 
>>>> As you mentioned earlier, for years I've been using file-out-file-in 
>>>> (FOFI) communication for JAGS and other programs (finite element 
>>>> analysis) and was quite OK with it, because sampling and FE iterations 
>>>> dominated the time to complete.
>>>> 
>>>> FOFI really only became an issue when I had to adjust values in between 
>>>> each of hundreds of runs (e.g. a stiffness matrix in FEM when dealing with 
>>>> buckling).
>>>> 
>>>> Rob J. Goedman
>>>> [email protected]
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On May 22, 2014, at 10:16 AM, John Myles White <[email protected]> 
>>>> wrote:
>>>> 
>>>>> I need to find time to look into this, but could someone try a git bisect 
>>>>> and see whether some of the metaprogramming changes we made to readtable 
>>>>> caused this? It might be that this file never would have worked, but if 
>>>>> it once did, it would be good to pinpoint the problematic code.
>>>>> 
>>>>>  — John
>>>>> 
>>>>> On May 20, 2014, at 7:53 PM, Rob J. Goedman <[email protected]> wrote:
>>>>> 
>>>>>> Actually, another way to make it work is to remove the blank line. The 
>>>>>> little program below shows that readtable() accepts test_df1 and 
>>>>>> test_df2 but fails on test_df3.
>>>>>> 
>>>>>> Also, the fact that it started to happen today had nothing to do with 
>>>>>> Julia or DataFrames updates. The file is created by Stan, and the latest 
>>>>>> version inserts that blank line.
>>>>>> 
>>>>>> Of course I could clean up the file, but maybe this is an issue in 
>>>>>> DataFrames' readtable function?
>>>>>> 
>>>>>> Apologies for the earlier incomplete report.
>>>>>> 
>>>>>> Rob J. Goedman
>>>>>> [email protected]
>>>>>> 
>>>>>> 
>>>>>> <test_df.jl><test_df1.csv>
>>>>>> <test_df2.csv>
>>>>>> <test_df3.csv>
>>>>>> 
>>>>>> 
>>>>>> julia> include("/Users/rob/.julia/v0.3/MCMCExampleRepository/test/test_df.jl")
>>>>>> 4x10 DataFrame
>>>>>> |-------|---------------|---------|---------|
>>>>>> | Col # | Name          | Eltype  | Missing |
>>>>>> | 1     | lp__          | Float64 | 0       |
>>>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>>>> | 3     | stepsize__    | Float64 | 0       |
>>>>>> | 4     | treedepth__   | Int64   | 0       |
>>>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>>>> | 7     | beta_1        | Float64 | 0       |
>>>>>> | 8     | beta_2        | Float64 | 0       |
>>>>>> | 9     | beta_3        | Float64 | 0       |
>>>>>> | 10    | sigma         | Float64 | 0       |
>>>>>> 
>>>>>> 4x10 DataFrame
>>>>>> |-------|---------------|---------|---------|
>>>>>> | Col # | Name          | Eltype  | Missing |
>>>>>> | 1     | lp__          | Float64 | 0       |
>>>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>>>> | 3     | stepsize__    | Float64 | 0       |
>>>>>> | 4     | treedepth__   | Int64   | 0       |
>>>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>>>> | 7     | beta_1        | Float64 | 0       |
>>>>>> | 8     | beta_2        | Float64 | 0       |
>>>>>> | 9     | beta_3        | Float64 | 0       |
>>>>>> | 10    | sigma         | Float64 | 0       |
>>>>>> 
>>>>>> ERROR: BoundsError()
>>>>>>  in findcorruption at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:663
>>>>>>  in readtable! at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:731
>>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:812
>>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:879
>>>>>>  in include at boot.jl:244
>>>>>> while loading /Users/rob/.julia/v0.3/MCMCExampleRepository/test/test_df.jl, in expression starting on line 11
>>>>>> 
>>>>>> julia> 
>>>>>> 
>>>>>> 
>>>>>> On May 20, 2014, at 6:36 PM, Rob J. Goedman <[email protected]> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Using a freshly updated Julia Version 0.3.0-prerelease+3251 (2014-05-20 
>>>>>>> 23:18 UTC), I think I noticed a change in the behavior of readtable(), 
>>>>>>> which I hope is not intended.
>>>>>>> 
>>>>>>> I have a small test file with data as shown below (and attached as a 
>>>>>>> file at the end of the email):
>>>>>>> 
>>>>>>> lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,n_divergent__,mu
>>>>>>> # Adaptation terminated
>>>>>>> 
>>>>>>> -19.8871,0.975123,0.303529,4,15,0,4.25051
>>>>>>> -22.1208,0.971631,0.303529,3,7,0,8.55276
>>>>>>> -23.8336,0.857954,0.303529,4,15,0,4.41087
>>>>>>> 
>>>>>>> If I remove the commented line ("# Adaptation terminated"), readtable() 
>>>>>>> has no problem, but if it's there, readtable() seems to ignore 
>>>>>>> allowcomments=true.
>>>>>>> 
>>>>>>> I didn't update DataFrames as far as I am aware, but once or twice 
>>>>>>> today I did pull Julia's master from github.
>>>>>>> 
>>>>>>> I wonder if someone could try this example. Thanks a lot.
>>>>>>> 
>>>>>>> Rob J. Goedman
>>>>>>> [email protected]
>>>>>>> 
>>>>>>> 
>>>>>>> julia> df = readtable("schools8_samples.csv", allowcomments=true)
>>>>>>> ERROR: Saw 4 rows, 5 columns and 22 fields
>>>>>>>  * Line 1 has 3 columns
>>>>>>> 
>>>>>>>  in error at error.jl:21
>>>>>>>  in findcorruption at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:680
>>>>>>>  in readtable! at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:731
>>>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:812
>>>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:879
>>>>>>> 
>>>>>>> julia> df = readtable("schools8_samples.csv", allowcomments=true)
>>>>>>> 3x7 DataFrame
>>>>>>> |-------|---------------|---------|---------|
>>>>>>> | Col # | Name          | Eltype  | Missing |
>>>>>>> | 1     | lp__          | Float64 | 0       |
>>>>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>>>>> | 3     | stepsize__    | Float64 | 0       |
>>>>>>> | 4     | treedepth__   | Int64   | 0       |
>>>>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>>>>> | 7     | mu            | Float64 | 0       |
>>>>>>> 
>>>>>>> 
>>>>>>> <schools8_samples.csv>
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
>
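
A cleanup step of the kind Rob mentions for Stan.jl (dropping the blank line and '#' comment lines such as "# Adaptation terminated" before the file reaches readtable) might look something like the minimal sketch below. This is an assumption about what such a step does, not the actual Stan.jl code: it assumes Julia 1.x and only Base, and the names is_noise, clean_stan_csv and clean_stan_csv_file are hypothetical.

```julia
# Minimal sketch, assuming Julia 1.x and only Base. The helper names are
# hypothetical and are not part of Stan.jl or DataFrames.jl.

# True for a line that should not reach the CSV reader: a blank line, or a
# line whose first non-whitespace character is '#'.
function is_noise(line::AbstractString)
    s = strip(line)
    return isempty(s) || startswith(s, "#")
end

# Keep only the header and data rows, in their original order.
function clean_stan_csv(lines::AbstractVector{<:AbstractString})
    kept = String[]
    for line in lines
        is_noise(line) || push!(kept, String(line))
    end
    return kept
end

# File-to-file variant: write the cleaned lines of `src` to `dst`.
function clean_stan_csv_file(src::AbstractString, dst::AbstractString)
    cleaned = clean_stan_csv(readlines(src))  # readlines drops the trailing '\n'
    open(dst, "w") do io
        for line in cleaned
            println(io, line)
        end
    end
    return dst
end
```

The cleaned file can then be passed to readtable() as in the thread. Filtering up front costs one extra pass over the file, but it no longer matters how a particular reader treats comment or blank lines.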
