Re: [julia-users] Dataframe readtable change?

Stefan Karpinski Thu, 22 May 2014 13:20:05 -0700

Solid reasons. I was just voicing my reaction.


On Thu, May 22, 2014 at 4:16 PM, John Myles White
<[email protected]> wrote:

> The original change that summarized large DataFrames was introduced by
> Julia Evans and brought us closer into sync with pandas. I've been really
> happy with it.
>
> Regarding the old way of doing things, I think you should revert to the
> old display rules for a while and try them again before making up your mind
> about your preferences. The old display rule was completely illegible for
> almost every data set that is currently being summarized. And I mean
> completely illegible, not just ugly.
>
> One change to formatting that I'd be happy with would be to default to
> showing the output of show(df, true) for all tables and never showing the
> column summaries unless explicitly requested. It seems like this default is
> the thing people most strongly dislike.
>
> We could remove the ASCII chrome, but I think it's a good idea. MySQL,
> Hive and Presto all use the same kind of explicit tabular structure when
> rendering tables. I think making DataFrames behave more like traditional
> databases is a good thing since it encourages people not to think of them
> as if they were matrices.
>
> The padding also makes it much easier to copy-and-paste tables since
> they're valid Markdown tables that any Markdown renderer can easily convert
> into TeX, HTML, etc.
>
>  -- John
>
> On May 22, 2014, at 1:02 PM, Stefan Karpinski <[email protected]>
> wrote:
>
> For what it's worth, I was much happier when dataframes showed their
> contents rather than a summary. I must have missed the discussion where
> that decision was made (ditto for all the extra ASCII chrome when
> displaying data frames these days).
>
>
> On Thu, May 22, 2014 at 3:01 PM, John Myles White <
> [email protected]> wrote:
>
>> Nobody had time to integrate it anywhere. A pull request would help move
>> things forward.
>>
>>  -- John
>>
>> On May 22, 2014, at 11:57 AM, Bob Nnamtrop <[email protected]>
>> wrote:
>>
>> OK. Thanks. That is helpful.
>>
>> Any reason why that page is not shown in the documentation given in the
>> link on the front page?
>>
>>
>> On Thu, May 22, 2014 at 11:46 AM, John Myles White <
>> [email protected]> wrote:
>>
>>> head and tail don't actually print anything: they just give you a subset
>>> of a DataFrame. So you're seeing the usual show method's output, which can
>>> be overridden by explicitly requesting that you see the whole DataFrame. See
>>>
>>> https://github.com/JuliaStats/DataFrames.jl/blob/master/spec/show.md
>>>
>>>  -- John
>>>
>>> On May 22, 2014, at 10:44 AM, Bob Nnamtrop <[email protected]>
>>> wrote:
>>>
>>>  An issue I noticed with Dataframes recently is that head(df) and
>>> tail(df) both list the show(df) summary (like those above) instead of
>>> listing the top and bottom of the dataframe. I just started using
>>> dataframes so I have no idea what they did in the past but it seems they
>>> should list the df and not the summary.
>>>
>>> Also, are there any other handy ways to list the df in the repl?
>>>
>>> Bob
>>>
>>>
>>> On Thu, May 22, 2014 at 11:39 AM, Rob J. Goedman <[email protected]> wrote:
>>>
>>>> Thanks John.
>>>>
>>>> I should have filed it as an issue on DataFrames.jl but initially
>>>> thought it could be deeper than that.
>>>>
>>>> For now in Stan.jl I've included a 'small' cleanup step. Small for say
>>>> 1000 samples, a bit bigger for 100000 samples.
>>>>
>>>> Like you mentioned earlier, for years I've been using
>>>> file-out-file-in-communication for Jags and other programs (Finite
>>>> Elements) and was quite ok with it because sampling and FE iterations
>>>> dominated the time to complete.
>>>>
>>>> FOFI really only became an issue when I had to adjust values in between
>>>> each of hundreds of runs (e.g. a stiffness matrix in FEM when dealing with
>>>> buckling).
>>>>
>>>>  Rob J. Goedman
>>>> [email protected]
>>>>
>>>>
>>>>
>>>>
>>>> On May 22, 2014, at 10:16 AM, John Myles White <
>>>> [email protected]> wrote:
>>>>
>>>> I need to find time to look into this, but could someone try a git
>>>> bisect and see if some of the metaprogramming changes we made to readtable
>>>> caused this? It might be that this file would have never worked, but if it
>>>> once did, it would be good to point out the problematic code.
>>>>
>>>>  -- John
>>>>
>>>> On May 20, 2014, at 7:53 PM, Rob J. Goedman <[email protected]> wrote:
>>>>
>>>> Actually, another way to make it work is removing the blank line. The
>>>> little program below shows that readtable() accepts test_df1 and test_df2,
>>>> but fails on test_df3.
>>>>
>>>> Also, the fact that it started to happen today had nothing to do with
>>>> Julia or DataFrame updates. The file is created by Stan and the latest
>>>> version inserts that blank line.
>>>>
>>>> Of course I could clean up the file, but maybe this is an issue in
>>>> DataFrame's readtable function?
>>>>
>>>> Apologies for the earlier incomplete report.
>>>>
>>>>  Rob J. Goedman
>>>> [email protected]
>>>>
>>>>
>>>>  <test_df.jl><test_df1.csv>
>>>> <test_df2.csv>
>>>> <test_df3.csv>
>>>>
>>>>
>>>> julia>
>>>> include("/Users/rob/.julia/v0.3/MCMCExampleRepository/test/test_df.jl")
>>>> 4x10 DataFrame
>>>> |-------|---------------|---------|---------|
>>>> | Col # | Name          | Eltype  | Missing |
>>>> | 1     | lp__          | Float64 | 0       |
>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>> | 3     | stepsize__    | Float64 | 0       |
>>>> | 4     | treedepth__   | Int64   | 0       |
>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>> | 7     | beta_1        | Float64 | 0       |
>>>> | 8     | beta_2        | Float64 | 0       |
>>>> | 9     | beta_3        | Float64 | 0       |
>>>> | 10    | sigma         | Float64 | 0       |
>>>>
>>>> 4x10 DataFrame
>>>> |-------|---------------|---------|---------|
>>>> | Col # | Name          | Eltype  | Missing |
>>>> | 1     | lp__          | Float64 | 0       |
>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>> | 3     | stepsize__    | Float64 | 0       |
>>>> | 4     | treedepth__   | Int64   | 0       |
>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>> | 7     | beta_1        | Float64 | 0       |
>>>> | 8     | beta_2        | Float64 | 0       |
>>>> | 9     | beta_3        | Float64 | 0       |
>>>> | 10    | sigma         | Float64 | 0       |
>>>>
>>>> ERROR: BoundsError()
>>>>  in findcorruption at
>>>> /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:663
>>>>  in readtable! at
>>>> /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:731
>>>>  in readtable at
>>>> /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:812
>>>>  in readtable at
>>>> /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:879
>>>>  in include at boot.jl:244
>>>> while loading
>>>> /Users/rob/.julia/v0.3/MCMCExampleRepository/test/test_df.jl, in expression
>>>> starting on line 11
>>>>
>>>> julia>
>>>>
>>>>
>>>> On May 20, 2014, at 6:36 PM, Rob J. Goedman <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Using a freshly updated Version 0.3.0-prerelease+3251 (2014-05-20
>>>> 23:18 UTC) of Julia I think I noticed a different behavior of
>>>> readtable(), which I hope is not intended.
>>>>
>>>> I have a small test file with data as shown below (and attached as a
>>>> file at the end of the email):
>>>>
>>>> lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,n_divergent__,mu
>>>> # Adaptation terminated
>>>>
>>>> -19.8871,0.975123,0.303529,4,15,0,4.25051
>>>> -22.1208,0.971631,0.303529,3,7,0,8.55276
>>>> -23.8336,0.857954,0.303529,4,15,0,4.41087
>>>>
>>>> If I remove the commented line ("# Adaptation terminated"), readtable()
>>>> has no problem, but if it's there readtable() seems to ignore the
>>>> 'allowcomments=true'.
>>>>
>>>> I didn't update DataFrames as far as I am aware, but once or twice
>>>> today I did pull Julia's master from github.
>>>>
>>>> I wonder if someone could try this example. Thanks a lot.
>>>>
>>>> Rob J. Goedman
>>>> [email protected]
>>>>
>>>>
>>>> julia> df = readtable("schools8_samples.csv", allowcomments=true)
>>>> ERROR: Saw 4 rows, 5 columns and 22 fields
>>>>  * Line 1 has 3 columns
>>>>
>>>>  in error at error.jl:21
>>>>  in findcorruption at
>>>> /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:680
>>>>  in readtable! at
>>>> /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:731
>>>>  in readtable at
>>>> /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:812
>>>>  in readtable at
>>>> /Users/rob/.julia/v0.3/DataFrames/src/dataframe/io.jl:879
>>>>
>>>> julia> df = readtable("schools8_samples.csv", allowcomments=true)
>>>> 3x7 DataFrame
>>>> |-------|---------------|---------|---------|
>>>> | Col # | Name          | Eltype  | Missing |
>>>> | 1     | lp__          | Float64 | 0       |
>>>> | 2     | accept_stat__ | Float64 | 0       |
>>>> | 3     | stepsize__    | Float64 | 0       |
>>>> | 4     | treedepth__   | Int64   | 0       |
>>>> | 5     | n_leapfrog__  | Int64   | 0       |
>>>> | 6     | n_divergent__ | Int64   | 0       |
>>>> | 7     | mu            | Float64 | 0       |
>>>>
>>>>
>>>> <schools8_samples.csv>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
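
The 'small' cleanup step Rob mentions for Stan output (dropping the
"# Adaptation terminated" comment and the blank line that follows it
before the file reaches readtable()) could look roughly like the sketch
below. It is only an illustration under stated assumptions: the function
name strip_comments_and_blanks is hypothetical, it uses only Base Julia,
and it is not the code used in Stan.jl or DataFrames.jl.

```julia
# Hypothetical helper, not part of DataFrames.jl or Stan.jl.
# Removes blank lines and full-line comments from CSV text so that a
# strict parser only sees the header and the data rows.
function strip_comments_and_blanks(text::AbstractString; comment::Char = '#')
    out = IOBuffer()
    # eachline drops the trailing "\n" or "\r\n" from every line.
    for line in eachline(IOBuffer(String(text)))
        stripped = strip(line)
        # Skip empty lines and lines whose first non-blank character
        # is the comment character.
        if isempty(stripped) || first(stripped) == comment
            continue
        end
        println(out, line)
    end
    return String(take!(out))
end

# Sample taken from the first message in this thread.
raw = """
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,n_divergent__,mu
# Adaptation terminated

-19.8871,0.975123,0.303529,4,15,0,4.25051
-22.1208,0.971631,0.303529,3,7,0,8.55276
-23.8336,0.857954,0.303529,4,15,0,4.41087
"""

cleaned = strip_comments_and_blanks(raw)
print(cleaned)  # header line followed by the three data rows
```

Writing the cleaned text to a temporary file and reading that file
instead sidesteps the parser problem, at the cost of one extra pass over
the samples, which matches Rob's remark that the step is small for 1000
samples and a bit bigger for 100000.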
