John, if you haven't already, you might want to read up a bit on
column-oriented databases. These seem to map more closely to dataframes in
their current form, and it'd be good to understand what's been done with
them, and what is and isn't possible, before deciding to move to a
row-oriented approach for dataframes.

See e.g. 
https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems 
which unfortunately is not the greatest article. I believe that one of the 
proprietary APL derivatives (K?) that's used in finance has a highly 
regarded integrated column-oriented database.

Unfortunately, I've already said more than I know about the topic. But I've 
heard claims that column-oriented databases have at least some interesting 
advantages over row-oriented systems.
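
To make the distinction concrete, here is a toy sketch in Python of the two layouts. It is only an illustration: the table contents and the names `rows`, `columns`, `mean_price_rows`, `mean_price_columns`, and `get_row` are made up for this example and are not taken from DataFrames.jl or any database.

```python
# Row-oriented: each record is stored together, so fetching one
# complete row is a single lookup.
rows = [
    {"id": 1, "name": "a", "price": 10.0},
    {"id": 2, "name": "b", "price": 12.5},
    {"id": 3, "name": "c", "price": 9.0},
]

# Column-oriented: each column is stored together, which is roughly
# how a dataframe holds its data.
columns = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "price": [10.0, 12.5, 9.0],
}


def mean_price_rows(rows):
    # Has to visit every record even though only one field is needed.
    return sum(r["price"] for r in rows) / len(rows)


def mean_price_columns(columns):
    # Reads a single contiguous column and ignores the others.
    prices = columns["price"]
    return sum(prices) / len(prices)


def get_row(columns, i):
    # Reassembling one row means touching every column.
    return {name: values[i] for name, values in columns.items()}
```

Aggregating a single column only needs that column in the columnar layout, while pulling out a whole row has to touch every column. That seems to be the basic trade-off between the two.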

On Sunday, September 7, 2014 10:32:53 AM UTC-7, John Myles White wrote:
>
> FWIW, I think it’s much easier to index structures if every row has an 
> atomic existence that is independent of the table it is currently part of. 
> (This is a big part of my interest in moving away from matrix semantics and 
> towards relational model semantics.)
>
> It’s a little harder to index DataFrames because the row indices change 
> over time, so your index can’t just map values to indices. (Well, it can: 
> but then it needs to be updated very frequently: potentially the entire 
> index has to be rewritten if you delete the first row of a DataFrame.)
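
A small Python sketch of the problem described above, assuming the simplest possible index: a dict from each value to the row positions where it occurs. The helper `build_index` and the sample column are hypothetical and do not come from DataFrames.jl.

```python
from collections import defaultdict


def build_index(column):
    # Map each value to the list of row positions where it occurs.
    index = defaultdict(list)
    for pos, value in enumerate(column):
        index[value].append(pos)
    return dict(index)


names = ["a", "b", "c", "b"]
index = build_index(names)

# Deleting the first row shifts every later row up by one, so every
# position stored in the old index is now off by one.
del names[0]

# The simplest repair is a full rebuild of the index.
fresh = build_index(names)
```

After the deletion, `index` still claims that "b" lives at positions 1 and 3, while it actually sits at 0 and 2. An index keyed on a stable row identity rather than on position would not need this rebuild.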
>
>  — John
>
> On Sep 7, 2014, at 10:27 AM, Harlan Harris <[email protected]> wrote:
>
> This was a feature that sorta existed for a while (see 
> https://github.com/JuliaStats/DataFrames.jl/issues/24 ), but nobody was 
> very happy with it, and I think John ripped it out as part of one of his 
> simplification passes. It's tricky to think about how best to implement 
> this sort of feature when you aspirationally want to support memory-mapped 
> and distributed structures too, and where you want a semantics that's 
> explicitly set-like, cf. pandas or R's data.table.
>
> Also worth thinking about this in the context of John's just-announced 
> goals: https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e
>
>
>
> On Sun, Sep 7, 2014 at 12:54 PM, John Myles White <[email protected]> wrote:
>
>> No, DataFrames are not indexed. For now, you’d need to build a wrapper 
>> that indexes a DataFrame to get that kind of functionality.
>>
>>  — John
>>
>> On Sep 7, 2014, at 9:53 AM, Steven Sagaert <[email protected]> wrote:
>>
>> > Hi,
>> > I was wondering if searching in a dataframe is indexed (in the DB
>> > sense, not the array sense, e.g. a tree index structure) or not? If so,
>> > can you have multiple indices (on multiple columns) or not?
>>
>>
>
>
