On Sunday, September 7, 2014 7:53:15 PM UTC+2, Jason Merrill wrote:
>
> John, if you haven't already, you might want to read up a bit on column
> oriented databases. It seems like these map more closely to dataframes in
> their current form, and it'd be good to understand what's been done with
> them, and what is/isn't possible before deciding to move to a row-oriented
> approach for dataframes.
>
> See e.g.
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
> which unfortunately is not the greatest article. I believe that one of the
> proprietary APL derivatives (K?) that's used in finance has a highly
> regarded integrated column oriented database.
>
> Unfortunately, I've already said more than I know about the topic. But
> I've heard claims that column oriented databases have at least some
> interesting advantages over row oriented systems.
>

The advantage is that they can calculate aggregates much more quickly than row-oriented relational databases, because an aggregate over one column only has to read that column's values rather than every full row. That makes them useful for data warehouses, analytics, and scientific work. I guess that's why R's data frame is also column-oriented. Vertica is one such system. MonetDB (www.monetdb.com) is another one, and it is open source.
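
To make the aggregate argument concrete, here is a minimal sketch in Python (not Julia, and not how DataFrames.jl, Vertica, or MonetDB are implemented). The toy table, its column names, and both helper functions are made up for illustration.

```python
# Hypothetical toy table stored row by row: one record per row.
rows = [
    {"id": 1, "city": "Ghent", "price": 10.0},
    {"id": 2, "city": "Paris", "price": 12.5},
    {"id": 3, "city": "Ghent", "price": 7.5},
]

# The same data stored column by column: one list per column.
columns = {
    "id": [1, 2, 3],
    "city": ["Ghent", "Paris", "Ghent"],
    "price": [10.0, 12.5, 7.5],
}


def sum_price_row_store(table):
    # Every full row record is visited, although only one field is needed.
    return sum(row["price"] for row in table)


def sum_price_column_store(table):
    # Only the "price" column is read; the other columns are never touched.
    return sum(table["price"])


row_total = sum_price_row_store(rows)
col_total = sum_price_column_store(columns)
```

Both functions return the same total. The difference is how much data each layout has to walk through to get there, which is where a real column store saves I/O on wide tables.
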
>
> On Sunday, September 7, 2014 10:32:53 AM UTC-7, John Myles White wrote:
>>
>> FWIW, I think it’s much easier to index structures if every row has an
>> atomic existence that is independent of the table it is currently part of.
>> (This is a big part of my interest in moving away from matrix semantics and
>> towards relational model semantics.)
>>
>> It’s a little harder to index DataFrames because the row indices change
>> over time, so your index can’t just map values to indices. (Well, it can:
>> but then it needs to be updated very frequently: potentially the entire
>> index has to be rewritten if you delete the first row of a DataFrame.)
>>
>> -- John
>>
>> On Sep 7, 2014, at 10:27 AM, Harlan Harris <[email protected]> wrote:
>>
>> This was a feature that sorta existed for a while (see
>> https://github.com/JuliaStats/DataFrames.jl/issues/24 ), but nobody was
>> very happy with it, and I think John ripped it out as part of one of his
>> simplification passes. It's tricky to think about how best to implement
>> this sort of feature when you aspirationally want to support memory-mapped
>> and distributed structures too, and where you want a semantics that's
>> explicitly set-like, cf Pandas or R's data.tables.
>>
>> Also worth thinking about this in the context of John's just-announced
>> goals: https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e
>>
>> On Sun, Sep 7, 2014 at 12:54 PM, John Myles White <[email protected]>
>> wrote:
>>
>>> No, DataFrames are not indexed. For now, you’d need to build a wrapper
>>> that indexes a DataFrame to get that kind of functionality.
>>>
>>> -- John
>>>
>>> On Sep 7, 2014, at 9:53 AM, Steven Sagaert <[email protected]> wrote:
>>>
>>> > Hi,
>>> > I was wondering if searching in a dataframe is indexed (in the DB
>>> > sense, not array sense. e.g. a tree index structure) or not? If so can you
>>> > have multiple indices (on multiple columns) or not?
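
John's point about row indices shifting can be sketched in a few lines of Python. This is only an illustration of a value-to-position index over a plain list, not anything DataFrames.jl provides. The `build_index` helper and the sample column are hypothetical.

```python
from collections import defaultdict


def build_index(column):
    """Map each value to the list of row positions where it occurs."""
    index = defaultdict(list)
    for pos, value in enumerate(column):
        index[value].append(pos)
    return dict(index)


# Hypothetical column of a table, with one entry per row.
city = ["Ghent", "Paris", "Ghent", "Oslo"]
index_before = build_index(city)

# Deleting the first row shifts every later row up by one position,
# so every surviving entry in the old index is stale and must be rewritten.
del city[0]
index_after = build_index(city)
```

After the delete, no position stored in `index_before` is still valid. This is the "entire index has to be rewritten" case. If rows had stable identities independent of their position, the index could map values to those identities and a delete would only touch the deleted row's entry.
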
