John, many thanks!

Cheers,
Kaj


On Sunday, October 26, 2014 1:27:03 AM UTC+3, John Myles White wrote:
>
> FWIW, I don’t think overhead is the right concept here: DataFrames and 
> Arrays are almost totally dissimilar data structures. (DataFrames 
> are arguably much more like Dicts than Arrays.)
>
> If Arrays are appropriate, use those. DataFrames are designed for use in 
> cases where Arrays are clearly not a meaningful data structure to apply to 
> your problems because Arrays don’t maintain any of the invariants that a 
> DataFrame must maintain — column homogeneity coupled with row heterogeneity.
>
> DataArrays are also totally dissimilar from DataFrames — in fact, they’re 
> exactly like Arrays with the option of storing a singleton value called NA. 
> Right now they have a non-trivial performance overhead relative to Arrays, 
> but that will decrease over time. What won’t decrease over time is the 
> cognitive overhead of using DataArrays — they impose a lot more complexity 
> because of the uncertainty about what you’ll get from a DataArray when you 
> index into it. That complexity is only appropriate if you’re working with 
> missing values. In many applications there are no missing values, so 
> DataArrays are needless complexity.
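
To make the missing-value point concrete, here is a minimal sketch. It assumes a current Julia version and uses Base's `missing` as a stand-in for the NA value of DataArrays described above; it does not use the DataArrays package itself, and the variable names are made up for illustration.

```julia
# A plain array: every element is known to be a Float64.
plain = [1.0, 2.0, 3.0]

# An array that may hold gaps (assumption: Base `missing` plays the role of NA).
with_gaps = [1.0, missing, 3.0]   # Vector{Union{Missing, Float64}}

# Indexing no longer tells you in advance what you will get back:
first_type  = typeof(with_gaps[1])   # Float64
second_type = typeof(with_gaps[2])   # Missing

# Reductions propagate the gap unless it is skipped explicitly.
total_plain   = sum(plain)                   # 6.0
total_gaps    = sum(with_gaps)               # missing
total_skipped = sum(skipmissing(with_gaps))  # 4.0
```

Every computation on `with_gaps` has to decide what to do about the gap, which is the extra complexity being described. With `plain` that question never comes up.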
>
> Here’s how I think of doing data analysis:
>
> (1) Gather data
> (2) Store the data in a tabular data structure
> (3) Apply transformations (like those found in GLM) to transform a tabular 
> data structure into a Matrix{Float64}
> (4) Do numerical computations on Matrix{Float64}
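
Steps (2) to (4) above can be sketched roughly as follows. This is only an illustration under stated assumptions: a NamedTuple of column vectors stands in for the tabular structure (it is not the DataFrames API), the column names and values are invented, and an ordinary least-squares fit via `\` stands in for whatever numerical work comes last.

```julia
# Step 2: a tabular structure. Each column holds one type, but the types
# differ from column to column, so a single row mixes String, Float64, Bool.
# (Hypothetical data, for illustration only.)
table = (
    star     = ["A", "B", "C", "D"],
    mass     = [1.0, 2.0, 3.0, 4.0],
    observed = [true, true, false, true],
    flux     = [2.1, 3.9, 6.2, 7.8],
)

# Step 3: turn the columns needed for the model into a Matrix{Float64}.
# Here: an intercept column of ones plus the mass column.
X = hcat(ones(length(table.mass)), table.mass)
y = table.flux

# Step 4: purely numerical work on the matrix (least-squares solve).
beta = X \ y
residuals = y - X * beta
```

Once `X` exists, nothing in step 4 needs to know about column names or mixed types, which is why the tabular structure and the matrix serve different purposes.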
>
>  — John
>
> On Oct 25, 2014, at 3:19 PM, Kaj Wiik wrote:
>
> A followup from a fellow astronomer: what is the overhead of data frames 
> compared to plain arrays? Are there any benchmarks available? When should I 
> avoid using data arrays, or should I always use them :-)?
>
> Cheers,
> Kaj
>
> On Saturday, October 25, 2014 3:37:18 PM UTC+3, Daniel Carrera wrote:
>>
>> Hello,
>>
>> This is a fairly naive question. I have observed for the last two years 
>> that many people really like data frames. R users obviously like them, and 
>> the Python and Julia communities thought it was worth adding that feature 
>> to their languages too. However, as an astronomer, I have not yet had a 
>> problem that would be solved by data frames. I use Julia to analyze 
>> hydrodynamic simulations. I can imagine that data frames could have a role 
>> in photographic data where some pixels are missing.
>>
>> Are you a scientist or engineer currently using data frames to solve a 
>> problem? I would love to hear about what you do with data frames and why 
>> you find them useful.
>>
>> Cheers,
>> Daniel.
>>
>
>
