FWIW, I don’t think overhead is the right concept here: DataFrames and Arrays 
are almost totally dissimilar data structures. (DataFrames are arguably much 
more like Dicts than Arrays.)

If Arrays are appropriate, use those. DataFrames are designed for cases where 
Arrays are clearly not a meaningful data structure for your problem, because 
Arrays don’t maintain the invariants that a DataFrame must maintain: column 
homogeneity (every value in a column has the same type) coupled with row 
heterogeneity (a single row can mix types across columns).
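To make the Dict analogy and the two invariants concrete, here is a minimal sketch in Python. It is not the DataFrames.jl API: the `Table` class and its methods are hypothetical, written only to show a table stored as a dict mapping column names to columns, where each column must hold a single type but a row can mix types.

```python
class Table:
    """Hypothetical minimal column store: a dict mapping column name -> list.

    Invariants checked on construction:
      * column homogeneity: every value in a column has the same type
      * equal lengths: every column has the same number of rows
    Rows are heterogeneous: one row can mix str, int, float, ...
    """

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        for name, values in columns.items():
            types = {type(v) for v in values}
            if len(types) > 1:
                raise TypeError(f"column {name!r} mixes types: {types}")
        self.columns = dict(columns)

    def nrow(self):
        # Number of rows; 0 for a table with no columns.
        return len(next(iter(self.columns.values()), []))

    def row(self, i):
        # A row is a heterogeneous record keyed by column name.
        return {name: values[i] for name, values in self.columns.items()}


stars = Table({"name": ["Vega", "Sirius"], "vmag": [0.03, -1.46], "n_obs": [12, 7]})
print(stars.row(1))  # {'name': 'Sirius', 'vmag': -1.46, 'n_obs': 7}
```

By contrast, a plain 2-D array has one element type for the whole block, so it cannot keep the `name` column next to the numeric ones without falling back to a generic element type.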

DataArrays are also totally dissimilar from DataFrames. In fact, they’re 
exactly like Arrays, except that they can also store a singleton value called NA. 
Right now they have a non-trivial performance overhead relative to Arrays, but 
that will decrease over time. What won’t decrease over time is the cognitive 
overhead of using DataArrays: they impose a lot more complexity because you can’t 
be sure what you’ll get back when you index into a DataArray (a value or NA). 
That complexity is only appropriate if you’re working with missing values. In 
many applications there are no missing values, so DataArrays are needless 
complexity.
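A small Python sketch can illustrate where that cognitive overhead comes from. Everything here is hypothetical (`NAType`, `NA`, `MaybeArray`, `mean_skipping_na` stand in for Julia's NA and DataArray), and the values-plus-Boolean-mask layout is just one common way to represent missingness, not a description of DataArrays' internals. The point is that indexing can return either a float or the NA singleton, so every consumer has to branch on that.

```python
class NAType:
    """Hypothetical singleton marking a missing value (stand-in for Julia's NA)."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "NA"


NA = NAType()


class MaybeArray:
    """Hypothetical float array in which any slot may be NA.

    Stored as a list of floats plus a parallel list of "is missing" flags.
    """

    def __init__(self, values):
        self.data = [0.0 if v is NA else float(v) for v in values]
        self.na = [v is NA for v in values]

    def __getitem__(self, i):
        # The result depends on the data: a float OR the NA singleton.
        return NA if self.na[i] else self.data[i]

    def __len__(self):
        return len(self.data)


def mean_skipping_na(xs):
    # Every element has to be checked for NA; a plain list of floats
    # would need no such branch.
    values = [xs[i] for i in range(len(xs))]
    present = [v for v in values if v is not NA]
    if not present:
        return NA
    return sum(present) / len(present)


flux = MaybeArray([1.0, NA, 3.0])
print(flux[1])                 # NA
print(mean_skipping_na(flux))  # 2.0
```

If a dataset has no missing values, all of this branching is pure overhead, which is the argument for sticking with plain arrays in that case.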

Here’s how I think of doing data analysis:

(1) Gather data
(2) Store the data in a tabular data structure
(3) Apply transformations (like those found in GLM) to turn the tabular data 
structure into a Matrix{Float64}
(4) Do numerical computations on the Matrix{Float64}
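The four steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Julia GLM or DataFrames API: `model_matrix`, `gram`, and `xty` are hypothetical helpers, the records are made-up values, and step (3) uses treatment (dummy) coding with an intercept, which is one common way a modeling package turns a categorical column into floats.

```python
# Step 1: gather data (made-up records standing in for a file or instrument).
records = [
    {"band": "V", "exposure": 30.0, "counts": 1200.0},
    {"band": "B", "exposure": 60.0, "counts": 2100.0},
    {"band": "V", "exposure": 45.0, "counts": 1750.0},
]

# Step 2: store them in a tabular structure: a dict of homogeneous columns.
table = {key: [r[key] for r in records] for key in records[0]}


# Step 3: transform the table into a numeric matrix (a list of float rows).
def model_matrix(table, numeric, categorical):
    """Hypothetical helper: intercept + numeric columns + dummy-coded categoricals.

    The first level (in sorted order) of each categorical column is the
    reference level and gets no column, as in treatment coding.
    """
    n = len(next(iter(table.values())))
    levels = {c: sorted(set(table[c]))[1:] for c in categorical}
    rows = []
    for i in range(n):
        row = [1.0]  # intercept
        row += [float(table[c][i]) for c in numeric]
        for c in categorical:
            row += [1.0 if table[c][i] == lvl else 0.0 for lvl in levels[c]]
        rows.append(row)
    return rows


X = model_matrix(table, numeric=["exposure"], categorical=["band"])
y = [float(v) for v in table["counts"]]


# Step 4: purely numerical work on the float matrix, e.g. the
# normal-equation terms X'X and X'y used by ordinary least squares.
def gram(X):
    p = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]


def xty(X, y):
    return [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(len(X[0]))]


print(X)  # [[1.0, 30.0, 1.0], [1.0, 60.0, 0.0], [1.0, 45.0, 1.0]]
```

Only step (3) needs to know about column names and types; once the matrix exists, step (4) is ordinary numerical code on floats.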

 — John

On Oct 25, 2014, at 3:19 PM, Kaj Wiik <[email protected]> wrote:

> A followup from a fellow astronomer: what is the overhead of data frames 
> compared to plain arrays? Are there any benchmarks available? When should I 
> avoid using data arrays, or should I always use them :-)?
> 
> Cheers,
> Kaj
> 
> On Saturday, October 25, 2014 3:37:18 PM UTC+3, Daniel Carrera wrote:
> Hello,
> 
> This is a fairly naive question. I have observed for the last two years that 
> many people really like data frames. R users obviously like them, and the 
> Python and Julia communities thought it was worth adding that feature to 
> their languages too. However, as an astronomer, I have not yet had a problem 
> that would be solved by data frames. I use Julia to analyze hydrodynamic 
> simulations. I can imagine that data frames could have a role in photographic 
> data where some pixels are missing.
> 
> Are you a scientist or engineer currently using data frames to solve a 
> problem? I would love to hear about what you do with data frames and why you 
> find them useful.
> 
> Cheers,
> Daniel.
