Re: [julia-users] Dataframe readtable change?

Stefan Karpinski Thu, 22 May 2014 14:02:45 -0700

I'm not sure that I have an opinion about that, honestly. If you think it
will reduce complaints, then that's certainly something.



On Thu, May 22, 2014 at 4:22 PM, John Myles White
<[email protected]> wrote:

> So, do you want us to change the default so that we don't show column
> summaries? For tables without a lot of columns, that may work better; it
> will certainly lower the number of complaints about how we print things.
>
>  -- John
>
> On May 22, 2014, at 1:18 PM, Stefan Karpinski <[email protected]>
> wrote:
>
> Solid reasons. I was just voicing my reaction.
>
>
> On Thu, May 22, 2014 at 4:16 PM, John Myles White <
> [email protected]> wrote:
>
>> The original change that summarized large DataFrames was introduced by
>> Julia Evans and brought us closer into sync with pandas. I've been really
>> happy with it.
>>
>> Regarding the old way of doing things, I think you should revert to the
>> old display rules for a while and try them again before making up your mind
>> about your preferences. The old display rule was completely illegible for
>> almost every data set that is currently being summarized. And I mean
>> completely illegible, not just ugly.
>>
>> One change to formatting that I'd be happy with would be to default to
>> showing the output of show(df, true) for all tables and never showing the
>> column summaries unless explicitly requested. It seems like this default is
>> the thing people most strongly dislike.
>>
>> We could remove the ASCII chrome, but I think it's a good idea. MySQL,
>> Hive and Presto all use the same kind of explicit tabular structure when
>> rendering tables. I think making DataFrames behave more like traditional
>> databases is a good thing since it encourages people not to think of them
>> as if they were matrices.
>>
>> The padding also makes it much easier to copy-and-paste tables since
>> they're valid Markdown tables that any Markdown renderer can easily convert
>> into TeX, HTML, etc.
>>
>>  -- John
>>
>> On May 22, 2014, at 1:02 PM, Stefan Karpinski <[email protected]>
>> wrote:
>>
>> For what it's worth, I was much happier when dataframes showed their
>> contents rather than a summary. I must have missed the discussion where
>> that decision was made (ditto for all the extra ASCII chrome when
>> displaying data frames these days).
>>
>>
>> On Thu, May 22, 2014 at 3:01 PM, John Myles White <
>> [email protected]> wrote:
>>
>>> Nobody had time to integrate it anywhere. A pull request would help move
>>> things forward.
>>>
>>>  -- John
>>>
>>> On May 22, 2014, at 11:57 AM, Bob Nnamtrop <[email protected]>
>>> wrote:
>>>
>>> OK. Thanks. That is helpful.
>>>
>>> Is there any reason why that page is not part of the documentation
>>> linked from the front page?
>>>
>>>
>>> On Thu, May 22, 2014 at 11:46 AM, John Myles White <
>>> [email protected]> wrote:
>>>
>>>> head and tail don't actually print anything: they just give you a
>>>> subset of a DataFrame. So you're seeing the usual show method's output,
>>>> which can be overridden by explicitly requesting that you see the whole
>>>> DataFrame. See
>>>>
>>>> https://github.com/JuliaStats/DataFrames.jl/blob/master/spec/show.md
>>>>
>>>>  -- John
>>>>
>>>> On May 22, 2014, at 10:44 AM, Bob Nnamtrop <[email protected]>
>>>> wrote:
>>>>
>>>>  An issue I noticed with Dataframes recently is that head(df) and
>>>> tail(df) both list the show(df) summary (like those above) instead of
>>>> listing the top and bottom of the dataframe. I just started using
>>>> dataframes so I have no idea what they did in the past but it seems they
>>>> should list the df and not the summary.
>>>>
>>>> Also, are there any other handy ways to list the df in the repl?
>>>>
>>>> Bob
>>>>
>>>>
>>>> On Thu, May 22, 2014 at 11:39 AM, Rob J. Goedman <[email protected]> wrote:
>>>>
>>>>> Thanks John.
>>>>>
>>>>> I should have filed it as an issue on DataFrames.jl, but initially I
>>>>> thought it could be deeper than that.
>>>>>
>>>>> For now in Stan.jl I've included a 'small' cleanup step. Small for say
>>>>> 1000 samples, a bit bigger for 100000 samples.
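
A cleanup step of the kind described here could be sketched as follows. This is not Stan.jl's actual code: the function name is hypothetical, the sketch is written in current Julia syntax rather than the 0.3 prerelease used in this thread, and it assumes comments occupy whole lines starting with '#'. The sample data is the file content quoted in the first message of the thread.

```julia
# Hypothetical helper (an assumption, not Stan.jl's real cleanup step):
# copy a CSV file, dropping blank lines and whole-line '#' comments.
function strip_blank_and_comment_lines(src::AbstractString, dest::AbstractString)
    kept = 0
    open(dest, "w") do out
        # Stream line by line so large sample files are never held in memory.
        for line in eachline(src)
            stripped = strip(line)
            # Skip empty lines and lines such as "# Adaptation terminated".
            if isempty(stripped) || startswith(stripped, "#")
                continue
            end
            println(out, line)
            kept += 1
        end
    end
    return kept
end

# Sample contents copied from the first message in this thread.
raw = """
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,n_divergent__,mu
# Adaptation terminated

-19.8871,0.975123,0.303529,4,15,0,4.25051
-22.1208,0.971631,0.303529,3,7,0,8.55276
-23.8336,0.857954,0.303529,4,15,0,4.41087
"""

dir = mktempdir()
src = joinpath(dir, "schools8_samples.csv")
dest = joinpath(dir, "schools8_samples_clean.csv")
write(src, raw)

kept = strip_blank_and_comment_lines(src, dest)  # header plus three data rows
cleaned = readlines(dest)
```

The cleaned file contains only the header and the data rows, so it should no longer depend on how the reader treats the blank line or the comment. The cost is one extra pass over the file, which grows with the number of samples.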
>>>>>
>>>>> Like you mentioned earlier, for years I've been using
>>>>> file-out-file-in-communication for Jags and other programs (Finite
>>>>> Elements) and was quite ok with it because sampling and FE iterations
>>>>> dominated the time to complete.
>>>>>
>>>>> FOFI really only became an issue when I had to adjust values in
>>>>> between each of hundreds of runs (e.g. a stiffness matrix in FEM when
>>>>> dealing with buckling).
>>>>>
>>>>>  Rob J. Goedman
>>>>> [email protected]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On May 22, 2014, at 10:16 AM, John Myles White <
>>>>> [email protected]> wrote:
>>>>>
>>>>> I need to find time to look into this, but could someone try a git
>>>>> bisect and see if some of the metaprogramming changes we made to readtable
>>>>> caused this? It might be that this file would have never worked, but if it
>>>>> once did, it would be good to point out the problematic code.
>>>>>
>>>>>  — John
>>>>>
>>>>> On May 20, 2014, at 7:53 PM, Rob J. Goedman <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Actually, another way to make it work is removing the blank line.
>>>>> The little program below shows that readtable() accepts test_df1 and test_df2,
>>>>> but fails on test_df3.
>>>>>
>>>>> Also, the fact that it started to happen today had nothing to do with
>>>>> Julia or DataFrame updates. The file is created by Stan and the latest
>>>>> version inserts that blank line.
>>>>>
>>>>> Of course I could clean up the file, but maybe this is an issue in
>>>>> DataFrame's readtable function?
>>>>>
>>>>> Apologies for the earlier incomplete report.
>>>>>
>>>>>  Rob J. Goedman
>>>>> [email protected]
>>>>>
>>>>>
>>>>>  <test_df.jl><test_df1.csv>
>>>>> <test_df2.csv>
>>>>> <test_df3.csv>
>>>>>
>>>>>
>>>>> julia> include("/Users/rob/.julia/v0.3/MCMCExampleRepository/test/test_df.jl")
>>>>> 4x10 DataFrame
>>>>> |-------|---------------|---------|---------|
>>>>> | Col # | Name          | Eltype  | Missing |
>>>>> | 1     | lp__          | Float64 | 0       |
>>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>>> | 3     | stepsize__    | Float64 | 0       |
>>>>> | 4     | treedepth__   | Int64   | 0       |
>>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>>> | 7     | beta_1        | Float64 | 0       |
>>>>> | 8     | beta_2        | Float64 | 0       |
>>>>> | 9     | beta_3        | Float64 | 0       |
>>>>> | 10    | sigma         | Float64 | 0       |
>>>>>
>>>>> 4x10 DataFrame
>>>>> |-------|---------------|---------|---------|
>>>>> | Col # | Name          | Eltype  | Missing |
>>>>> | 1     | lp__          | Float64 | 0       |
>>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>>> | 3     | stepsize__    | Float64 | 0       |
>>>>> | 4     | treedepth__   | Int64   | 0       |
>>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>>> | 7     | beta_1        | Float64 | 0       |
>>>>> | 8     | beta_2        | Float64 | 0       |
>>>>> | 9     | beta_3        | Float64 | 0       |
>>>>> | 10    | sigma         | Float64 | 0       |
>>>>>
>>>>> ERROR: BoundsError()
>>>>>  in findcorruption at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:663
>>>>>  in readtable! at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:731
>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:812
>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:879
>>>>>  in include at boot.jl:244
>>>>> while loading /Users/rob/.julia/v0.3/MCMCExampleRepository/test/test_df.jl, in expression starting on line 11
>>>>>
>>>>> julia>
>>>>>
>>>>>
>>>>> On May 20, 2014, at 6:36 PM, Rob J. Goedman <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Using a freshly updated Version 0.3.0-prerelease+3251 (2014-05-20
>>>>> 23:18 UTC) of Julia I think I noticed a different behavior of
>>>>> readtable(), which I hope is not intended.
>>>>>
>>>>> I have a small test file with data as shown below (and attached as a
>>>>> file at the end of the email):
>>>>>
>>>>> lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,n_divergent__,mu
>>>>> # Adaptation terminated
>>>>>
>>>>> -19.8871,0.975123,0.303529,4,15,0,4.25051
>>>>> -22.1208,0.971631,0.303529,3,7,0,8.55276
>>>>> -23.8336,0.857954,0.303529,4,15,0,4.41087
>>>>>
>>>>> If I remove the commented line ("# Adaptation terminated"),
>>>>> readtable() has no problem, but if it's there readtable() seems to ignore
>>>>> the 'allowcomments=true'.
>>>>>
>>>>> I didn't update DataFrames as far as I am aware, but once or twice
>>>>> today I did pull Julia's master from github.
>>>>>
>>>>> I wonder if someone could try this example. Thanks a lot.
>>>>>
>>>>> Rob J. Goedman
>>>>> [email protected]
>>>>>
>>>>>
>>>>> julia> df = readtable("schools8_samples.csv", allowcomments=true)
>>>>> ERROR: Saw 4 rows, 5 columns and 22 fields
>>>>>  * Line 1 has 3 columns
>>>>>
>>>>>  in error at error.jl:21
>>>>>  in findcorruption at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:680
>>>>>  in readtable! at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:731
>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:812
>>>>>  in readtable at /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:879
>>>>>
>>>>> julia> df = readtable("schools8_samples.csv", allowcomments=true)
>>>>> 3x7 DataFrame
>>>>> |-------|---------------|---------|---------|
>>>>> | Col # | Name          | Eltype  | Missing |
>>>>> | 1     | lp__          | Float64 | 0       |
>>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>>> | 3     | stepsize__    | Float64 | 0       |
>>>>> | 4     | treedepth__   | Int64   | 0       |
>>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>>> | 7     | mu            | Float64 | 0       |
>>>>>
>>>>>
>>>>> <schools8_samples.csv>
