I've read up a little bit on column-oriented DBs and will be doing a lot 
more as I start thinking about how to design DataTables.

The trick is that right now DataFrames aren’t databases at all in the 
relational model sense, because they impose a bunch of structure that a DB 
doesn’t have to provide.

 — John

On Sep 7, 2014, at 10:53 AM, Jason Merrill wrote:

> John, if you haven't already, you might want to read up a bit on
> column-oriented databases. It seems like these map more closely to dataframes
> in their current form, and it'd be good to understand what's been done with
> them, and what is or isn't possible, before deciding to move to a
> row-oriented approach for dataframes.
> 
> See e.g.
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
> (unfortunately not the greatest article). I believe that one of the
> proprietary APL derivatives (K?) used in finance has a highly regarded
> integrated column-oriented database.
> 
> Unfortunately, I've already said more than I know about the topic. But I've
> heard claims that column-oriented databases have at least some interesting
> advantages over row-oriented systems.
> 
> On Sunday, September 7, 2014 10:32:53 AM UTC-7, John Myles White wrote:
> FWIW, I think it’s much easier to index structures if every row has an atomic 
> existence that is independent of the table it is currently part of. (This is 
> a big part of my interest in moving away from matrix semantics and towards 
> relational model semantics.)
> 
> It’s a little harder to index DataFrames because the row indices change over
> time, so your index can’t just map values to row positions. (Well, it can, but
> then it needs to be updated very frequently: deleting the first row of a
> DataFrame shifts every later row up by one, so potentially the entire index
> has to be rewritten.)
> 
>  — John
> 
> On Sep 7, 2014, at 10:27 AM, Harlan Harris wrote:
> 
>> This was a feature that sorta existed for a while (see 
>> https://github.com/JuliaStats/DataFrames.jl/issues/24 ), but nobody was very 
>> happy with it, and I think John ripped it out as part of one of his 
>> simplification passes. It's tricky to think about how best to implement this 
>> sort of feature when you aspirationally want to support memory-mapped and 
>> distributed structures too, and where you want a semantics that's explicitly 
>> set-like, cf. Pandas or R's data.table.
>> 
>> Also worth thinking about this in the context of John's just-announced 
>> goals: https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e
>> 
>> On Sun, Sep 7, 2014 at 12:54 PM, John Myles White wrote:
>> No, DataFrames are not indexed. For now, you’d need to build a wrapper that 
>> indexes a DataFrame to get that kind of functionality.
>> 
>>  — John
>> 
>> On Sep 7, 2014, at 9:53 AM, Steven Sagaert wrote:
>> 
>> > Hi,
>> > I was wondering whether searching in a DataFrame is indexed (in the DB
>> > sense, not the array sense, e.g. with a tree index structure). If so, can
>> > you have multiple indices (on multiple columns)?
>> 
>> 
> 
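
John's point about indices that map values to row positions can be made concrete with a minimal sketch. It is written in Python rather than Julia and assumes a table stored column-wise as a dict of equal-length lists; the helpers `build_positional_index`, `delete_row` and `repair_index_after_delete` are hypothetical illustrations, not part of DataFrames.jl. Deleting row 0 forces every index entry to be renumbered, while deleting the last row touches only one.

```python
# Hypothetical sketch (not the DataFrames.jl API): a table stored column-wise
# as a dict of lists, plus an index mapping values to row *positions*.
from collections import defaultdict


def build_positional_index(column):
    """Map each value to the sorted list of row positions holding it."""
    index = defaultdict(list)
    for pos, value in enumerate(column):
        index[value].append(pos)
    return dict(index)


def delete_row(table, pos):
    """Delete row `pos` from every column; all later rows shift up by one."""
    for values in table.values():
        del values[pos]


def repair_index_after_delete(index, deleted_pos):
    """Fix a positional index in place after deleting one row.

    Every stored position greater than `deleted_pos` must be decremented,
    so deleting row 0 touches every entry. Returns how many entries changed.
    """
    changed = 0
    for value in list(index):
        new_positions = []
        for p in index[value]:
            if p == deleted_pos:
                changed += 1  # the deleted row's own entry is dropped
                continue
            if p > deleted_pos:
                p -= 1  # later rows moved up by one position
                changed += 1
            new_positions.append(p)
        if new_positions:
            index[value] = new_positions
        else:
            del index[value]
    return changed


# Usage: deleting the first row rewrites all four entries.
table = {"city": ["Oslo", "Lima", "Oslo", "Pune"], "pop": [1, 2, 3, 4]}
idx = build_positional_index(table["city"])  # {"Oslo": [0, 2], "Lima": [1], "Pune": [3]}
delete_row(table, 0)
changed = repair_index_after_delete(idx, 0)  # idx == {"Oslo": [1], "Lima": [0], "Pune": [2]}, changed == 4
```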

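A wrapper of the kind John suggests, combined with rows that exist independently of their position, might look like the following minimal sketch. It is again Python, and `IndexedTable` with its methods is hypothetical: each row gets a stable ID that never changes, and any number of columns can be indexed, which is what Steven asks about. Dict-based hash indices stand in for the tree indices a database would typically use, so this supports equality lookups but not range queries.

```python
# Hypothetical sketch (not part of DataFrames.jl): rows have stable IDs and
# each indexed column maps values to sets of row IDs (hash indices, not trees).
from collections import defaultdict


class IndexedTable:
    def __init__(self, columns, indexed):
        """`columns`: dict of equal-length lists; `indexed`: column names to index."""
        self._next_id = 0
        self.rows = {}  # row ID -> dict of column values
        self.indices = {name: defaultdict(set) for name in indexed}
        n = len(next(iter(columns.values()))) if columns else 0
        for i in range(n):
            self.insert({name: col[i] for name, col in columns.items()})

    def insert(self, row):
        """Add a row, assign it a fresh ID, and update every index."""
        row_id = self._next_id
        self._next_id += 1
        self.rows[row_id] = dict(row)
        for name, index in self.indices.items():
            index[row[name]].add(row_id)
        return row_id

    def delete(self, row_id):
        """Remove one row; only that row's own index entries are touched."""
        row = self.rows.pop(row_id)
        for name, index in self.indices.items():
            ids = index[row[name]]
            ids.discard(row_id)
            if not ids:
                del index[row[name]]

    def lookup(self, **criteria):
        """Return IDs of rows matching every column == value criterion by
        intersecting per-column indices (each criterion must be indexed)."""
        result = None
        for name, value in criteria.items():
            ids = self.indices[name].get(value, set())
            result = set(ids) if result is None else result & ids
        return set(self.rows) if result is None else result


# Usage: two indexed columns; deleting row 0 leaves IDs 1 and 2 unchanged.
t = IndexedTable({"city": ["Oslo", "Lima", "Oslo"], "year": [2013, 2014, 2014]},
                 indexed=["city", "year"])
match = t.lookup(city="Oslo", year=2014)  # {2}
t.delete(0)
```

Because rows are addressed by ID rather than by position, a deletion only updates the deleted row's own index entries. The trade-off in this sketch is that positional access ("the third row") would need extra bookkeeping.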