John, many thanks!
Cheers,
Kaj
On Sunday, October 26, 2014 1:27:03 AM UTC+3, John Myles White wrote:
>
> FWIW, I don’t think overhead is the right concept here: DataFrames and
> Arrays are almost totally dissimilar data structures. (DataFrames
> are arguably much more like Dicts than Arrays.)
>
> If Arrays are appropriate, use those. DataFrames are designed for use in
> cases where Arrays are clearly not a meaningful data structure to apply to
> your problems because Arrays don’t maintain any of the invariants that a
> DataFrame must maintain — column homogeneity coupled with row heterogeneity.
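
To make the invariant above concrete, here is a minimal sketch in plain Python rather than Julia, so that it does not depend on any particular DataFrames version. The names `make_table` and `row` and the sample values are hypothetical, chosen only for illustration; this is not the DataFrames API. It treats a table as a dict of columns, checks that each column holds a single type, and shows that a row can still mix types.

```python
# Illustrative stand-in for a tabular structure: a dict mapping column
# names to lists. All names and values here are hypothetical.

def make_table(columns):
    """Check the table invariants and return the dict unchanged."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    for name, values in columns.items():
        kinds = {type(v) for v in values}
        if len(kinds) > 1:
            raise ValueError(f"column {name!r} mixes types")
    return columns


def row(table, i):
    """Return row i as a tuple; its fields may differ in type."""
    return tuple(values[i] for values in table.values())


table = make_table({
    "id": [1, 2],            # homogeneous column of int
    "label": ["a", "b"],     # homogeneous column of str
    "value": [0.5, 1.5],     # homogeneous column of float
})
first_row = row(table, 0)    # heterogeneous: int, str, float
```

A plain numeric array has no way to express this: every element shares one type, so there is nothing for it to enforce column by column.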
>
> DataArrays are also totally dissimilar from DataFrames — in fact, they’re
> exactly like Arrays with the option of storing a singleton value called NA.
> Right now they have a non-trivial performance overhead relative to Arrays,
> but that will decrease over time. What won’t decrease over time is the
> cognitive overhead of using DataArrays — they impose a lot more complexity
> because of the uncertainty about what you’ll get from a DataArray when you
> index into it. That complexity is only appropriate if you’re working with
> missing values. In many applications there are no missing values, so
> DataArrays are needless complexity.
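
The extra complexity described above can be sketched as follows, again in plain Python and only as an analogy. `NA` here is just `None` standing in for the NA singleton, and `mean_skipping_na` is a hypothetical helper, not a DataArrays function. The point is that once a container may hold a missing value, every computation has to decide what to do with it.

```python
NA = None  # stand-in for a missing-value singleton (an assumption)


def mean_skipping_na(values):
    """Average the present values; return NA if nothing is present."""
    present = [v for v in values if v is not NA]
    if not present:
        return NA
    return sum(present) / len(present)


plain = [1.0, 2.0, 3.0]
with_na = [1.0, NA, 3.0]

# With no missing values, the direct computation is safe.
plain_mean = sum(plain) / len(plain)

# With a possible NA, indexing may return either a float or NA,
# so the caller must handle both cases explicitly.
na_mean = mean_skipping_na(with_na)
```

If the data can never contain a missing value, the first form is all that is needed.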
>
> Here’s how I think of doing data analysis:
>
> (1) Gather data
> (2) Store the data in a tabular data structure
> (3) Apply transformations (like those found in GLM) to turn the tabular
> data structure into a Matrix{Float64}
> (4) Do numerical computations on Matrix{Float64}
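
Steps (2) to (4) can be sketched in plain Python as below. This is an illustration under stated assumptions, not how GLM builds its model matrix: `design_matrix` is a hypothetical helper, the sample table is made up, and the encoding (an intercept column plus 0/1 indicator columns for every category level except the first) is one common convention chosen here for simplicity.

```python
def design_matrix(table, numeric, categorical):
    """Turn one numeric and one categorical column into rows of floats."""
    levels = sorted(set(table[categorical]))
    matrix = []
    for i in range(len(table[numeric])):
        # One indicator per level, dropping the first as the baseline.
        dummies = [1.0 if table[categorical][i] == level else 0.0
                   for level in levels[1:]]
        # Intercept, numeric value, then the indicators.
        matrix.append([1.0, float(table[numeric][i])] + dummies)
    return matrix


# Step (2): tabular data with columns of different types.
table = {"dose": [1, 2, 3], "group": ["a", "b", "a"]}

# Step (3): transform the table into a purely numeric matrix.
X = design_matrix(table, "dose", "group")

# Step (4): numerical work happens on the matrix alone.
column_means = [sum(col) / len(col) for col in zip(*X)]
```

After step (3) the column names and types are gone; what remains is the kind of homogeneous numeric data that plain arrays handle well.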
>
> — John
>
> On Oct 25, 2014, at 3:19 PM, Kaj Wiik <[email protected]> wrote:
>
> A followup from a fellow astronomer: what is the overhead of data frames
> compared to plain arrays? Are there any benchmarks available? When should
> I avoid using data arrays, or should I always use them :-)?
>
> Cheers,
> Kaj
>
> On Saturday, October 25, 2014 3:37:18 PM UTC+3, Daniel Carrera wrote:
>>
>> Hello,
>>
>> This is a fairly naive question. I have observed for the last two years
>> that many people really like data frames. R users obviously like them, and
>> the Python and Julia communities thought it was worth adding that feature
>> to their languages too. However, as an astronomer, I have not yet had a
>> problem that would be solved by data frames. I use Julia to analyze
>> hydrodynamic simulations. I can imagine that data frames could have a role
>> in photographic data where some pixels are missing.
>>
>> Are you a scientist or engineer currently using data frames to solve a
>> problem? I would love to hear about what you do with data frames and why
>> you find them useful.
>>
>> Cheers,
>> Daniel.
>>
>
>