I've read up a little bit on column-oriented DBs and will be reading a bunch more as I start thinking about how to design DataTables.
The trick is that right now DataFrames aren’t databases at all in the relational model sense, because they impose a bunch of structure that a DB doesn’t have to provide.

— John

On Sep 7, 2014, at 10:53 AM, Jason Merrill wrote:

> John, if you haven't already, you might want to read up a bit on column
> oriented databases. It seems like these map more closely to dataframes in
> their current form, and it'd be good to understand what's been done with
> them, and what is/isn't possible before deciding to move to a row-oriented
> approach for dataframes.
>
> See e.g.
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems which
> unfortunately is not the greatest article. I believe that one of the
> proprietary APL derivatives (K?) that's used in finance has a highly regarded
> integrated column oriented database.
>
> Unfortunately, I've already said more than I know about the topic. But I've
> heard claims that column oriented databases have at least some interesting
> advantages over row oriented systems.
>
> On Sunday, September 7, 2014 10:32:53 AM UTC-7, John Myles White wrote:
> FWIW, I think it’s much easier to index structures if every row has an atomic
> existence that is independent of the table it is currently part of. (This is
> a big part of my interest in moving away from matrix semantics and towards
> relational model semantics.)
>
> It’s a little harder to index DataFrames because the row indices change over
> time, so your index can’t just map values to indices. (Well, it can: but then
> it needs to be updated very frequently: potentially the entire index has to
> be rewritten if you delete the first row of a DataFrame.)
>
> — John
>
> On Sep 7, 2014, at 10:27 AM, Harlan Harris wrote:
>
>> This was a feature that sorta existed for a while (see
>> https://github.com/JuliaStats/DataFrames.jl/issues/24 ), but nobody was very
>> happy with it, and I think John ripped it out as part of one of his
>> simplification passes. It's tricky to think about how best to implement this
>> sort of feature when you aspirationally want to support memory-mapped and
>> distributed structures too, and where you want a semantics that's explicitly
>> set-like, cf Pandas or R's data.tables.
>>
>> Also worth thinking about this in the context of John's just-announced
>> goals: https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e
>>
>> On Sun, Sep 7, 2014 at 12:54 PM, John Myles White wrote:
>> No, DataFrames are not indexed. For now, you’d need to build a wrapper that
>> indexes a DataFrame to get that kind of functionality.
>>
>> — John
>>
>> On Sep 7, 2014, at 9:53 AM, Steven Sagaert wrote:
>>
>> > Hi,
>> > I was wondering if searching in a dataframe is indexed (in the DB sense,
>> > not array sense. e.g. a tree index structure) or not? If so can you have
>> > multiple indices (on multiple columns) or not?
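To make the indexing trade-off in this thread concrete, here is a minimal sketch in Python (not Julia, and not the DataFrames.jl API). It assumes a table stored column by column; `position_index` and `IndexedTable` are hypothetical names made up for illustration. The first function maps values to row positions, which shows why such an index must be rewritten when the first row is deleted: every surviving position shifts down by one. The class is one possible "wrapper that indexes a DataFrame". It gives each row a stable ID that does not depend on its position in the table, and it keeps one hash index per indexed column, so multiple columns can be indexed and a deletion touches only that row's entries. Hash indices are used here for brevity; a tree index, as asked about in the original question, would also support range queries.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


def position_index(column: List[Any]) -> Dict[Any, List[int]]:
    """Map each value to the row positions (0-based) where it occurs."""
    index: Dict[Any, List[int]] = defaultdict(list)
    for pos, value in enumerate(column):
        index[value].append(pos)
    return dict(index)


class IndexedTable:
    """Hypothetical wrapper: columnar storage plus hash indices on stable row IDs."""

    def __init__(self, columns: Dict[str, List[Any]], indexed: List[str]) -> None:
        # Each column is stored separately, keyed by a row ID that never changes.
        self.columns: Dict[str, Dict[int, Any]] = {
            name: dict(enumerate(values)) for name, values in columns.items()
        }
        # One index per indexed column: value -> set of row IDs holding it.
        self.indices: Dict[str, Dict[Any, Set[int]]] = {}
        for name in indexed:
            idx: Dict[Any, Set[int]] = defaultdict(set)
            for row_id, value in self.columns[name].items():
                idx[value].add(row_id)
            self.indices[name] = idx

    def lookup(self, column: str, value: Any) -> List[Dict[str, Any]]:
        """Return matching rows via the index, without scanning the column."""
        ids = sorted(self.indices[column].get(value, set()))
        return [{name: col[i] for name, col in self.columns.items()} for i in ids]

    def delete(self, row_id: int) -> None:
        """Remove one row; only that row's entries in each index are touched."""
        for name, idx in self.indices.items():
            value = self.columns[name][row_id]
            idx[value].discard(row_id)
            if not idx[value]:
                del idx[value]  # drop empty buckets so lookups stay clean
        for col in self.columns.values():
            del col[row_id]


if __name__ == "__main__":
    # Position index: deleting the first row shifts every remaining entry.
    print(position_index(["a", "b", "a", "c"]))  # {'a': [0, 2], 'b': [1], 'c': [3]}
    print(position_index(["b", "a", "c"]))       # {'b': [0], 'a': [1], 'c': [2]}

    # Stable-ID indices on two columns: deletion leaves other rows' entries alone.
    t = IndexedTable({"name": ["x", "y", "x"], "age": [1, 2, 3]}, indexed=["name", "age"])
    t.delete(0)
    print(t.lookup("name", "x"))  # [{'name': 'x', 'age': 3}]
```

After `t.delete(0)`, rows 1 and 2 keep their IDs, so their index entries are untouched. With `position_index`, the same deletion changes the entry for every remaining value. The cost of the stable-ID design is an extra level of indirection (ID to value) instead of plain contiguous arrays.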
