Re: [julia-users] Are dataframes indexed?

Steven Sagaert Sun, 07 Sep 2014 16:39:30 -0700


On Sunday, September 7, 2014 7:53:15 PM UTC+2, Jason Merrill wrote:
>
> John, if you haven't already, you might want to read up a bit on column 
> oriented databases. It seems like these map more closely to dataframes in 
> their current form, and it'd be good to understand what's been done with 
> them, and what is/isn't possible before deciding to move to a row-oriented 
> approach for dataframes.
>
> See e.g. 
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems 
> which unfortunately is not the greatest article. I believe that one of the 
> proprietary APL derivatives (K?) that's used in finance has a highly 
> regarded integrated column oriented database.
>
> Unfortunately, I've already said more than I know about the topic. But 
> I've heard claims that column oriented databases have at least some 
> interesting advantages over row oriented systems.
>
The advantage is that they can calculate aggregates much more quickly than 
row-storage relational databases, because an aggregate over one column only 
has to read that column's values instead of every full row. So they are 
useful for data warehouses, analytics and scientific work. I guess that's 
why R's data frame is also column oriented. Vertica is one such system. 
MonetDB (www.monetdb.com) is another, open source one.
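
As a rough illustration of the aggregate point, here is a toy sketch in 
Python (not Julia, and not how any real column store is implemented). The 
table contents and the function names sum_row_store and sum_column_store 
are made up for the example.

```python
# Hypothetical toy table; the names and values are illustrative only.
rows = [
    {"id": 1, "city": "Ghent", "amount": 10.0},
    {"id": 2, "city": "Paris", "amount": 20.0},
    {"id": 3, "city": "Ghent", "amount": 12.5},
]

# Column-oriented layout of the same data: one list per column.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def sum_row_store(table, field):
    # Row layout: every record is visited and the field is picked out of it.
    return sum(record[field] for record in table)


def sum_column_store(table, field):
    # Column layout: only the one column being aggregated is touched.
    return sum(table[field])


row_total = sum_row_store(rows, "amount")
column_total = sum_column_store(columns, "amount")
```

Both functions return the same total. The difference is only in how much 
data has to be walked to get it, which is what matters once the table has 
many columns and many rows.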


>
> On Sunday, September 7, 2014 10:32:53 AM UTC-7, John Myles White wrote:
>>
>> FWIW, I think it’s much easier to index structures if every row has an 
>> atomic existence that is independent of the table it is currently part of. 
>> (This is a big part of my interest in moving away from matrix semantics and 
>> towards relational model semantics.)
>>
>> It’s a little harder to index DataFrames because the row indices change 
>> over time, so your index can’t just map values to indices. (Well, it can: 
>> but then it needs to be updated very frequently: potentially the entire 
>> index has to be rewritten if you delete the first row of a DataFrame.)
>>
>>  — John
>>
>> On Sep 7, 2014, at 10:27 AM, Harlan Harris <[email protected]> wrote:
>>
>> This was a feature that sorta existed for a while (see 
>> https://github.com/JuliaStats/DataFrames.jl/issues/24 ), but nobody was 
>> very happy with it, and I think John ripped it out as part of one of his 
>> simplification passes. It's tricky to think about how best to implement 
>> this sort of feature when you aspirationally want to support memory-mapped 
>> and distributed structures too, and where you want a semantics that's 
>> explicitly set-like, cf Pandas or R's data.tables. 
>>
>> Also worth thinking about this in the context of John's just-announced 
>> goals: https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e
>>
>>
>>
>> On Sun, Sep 7, 2014 at 12:54 PM, John Myles White <[email protected]> 
>> wrote:
>>
>>> No, DataFrames are not indexed. For now, you’d need to build a wrapper 
>>> that indexes a DataFrame to get that kind of functionality.
>>>
>>>  — John
>>>
>>> On Sep 7, 2014, at 9:53 AM, Steven Sagaert <[email protected]> wrote:
>>>
>>> > Hi,
>>> > I was wondering if searching in a dataframe is indexed (in the DB 
>>> > sense, not array sense. e.g. a tree index structure) or not? If so can 
>>> > you have multiple indices (on multiple columns) or not?
>>>
>>>
>>
>>
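
To make John's two quoted points concrete (a wrapper that indexes a table, 
and row positions shifting when a row is deleted), here is a minimal sketch. 
It is written in Python, not Julia, and does not reflect the DataFrames.jl 
API. IndexedTable and all of its methods are hypothetical names, and a plain 
hash map stands in for the tree index mentioned in the original question.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical wrapper: column-oriented data plus one index per chosen column."""

    def __init__(self, columns, indexed):
        self.columns = {name: list(values) for name, values in columns.items()}
        self.indexed = list(indexed)
        self.rebuild()

    def rebuild(self):
        # Each index maps a value to the list of row positions that hold it.
        self.indices = {}
        for name in self.indexed:
            index = defaultdict(list)
            for pos, value in enumerate(self.columns[name]):
                index[value].append(pos)
            self.indices[name] = index

    def find(self, name, value):
        # Look up row positions through the index instead of scanning the column.
        return list(self.indices[name].get(value, []))

    def delete_row(self, pos):
        for values in self.columns.values():
            del values[pos]
        # Every row after `pos` has moved up by one, so the stored positions
        # are stale. The simplest (and most expensive) fix is a full rebuild.
        self.rebuild()


table = IndexedTable(
    {"name": ["ann", "bob", "cat"], "dept": ["x", "y", "x"]},
    indexed=["name", "dept"],  # multiple indices, one per column
)
before = table.find("dept", "x")  # rows 0 and 2
table.delete_row(0)
after = table.find("dept", "x")   # "cat" has moved from row 2 to row 1
```

Deleting row 0 forces every stored position to change, which is the update 
cost John describes. An index keyed on stable row identities rather than 
positions would avoid that.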
