On Sunday, September 7, 2014 7:28:18 PM UTC+2, Harlan Harris wrote:
>
> This was a feature that sorta existed for a while (see
> https://github.com/JuliaStats/DataFrames.jl/issues/24 ), but nobody was
> very happy with it, and I think John ripped it out as part of one of his
> simplification passes. It's tricky to think about how best to implement
> this sort of feature when you aspirationally want to support memory-mapped
> and distributed structures too,
>

I was thinking more along the lines of a simple in-memory DB. If you want out-of-memory and distributed storage, it's probably best to interface with systems like Spark SQL or SciDB rather than develop that yourselves from scratch. Maybe write something in the spirit of Blaze (blaze.pydata.org)? Right now Blaze supports Spark, and I was just discussing SciDB with them; they are looking into that as well.

> and where you want a semantics that's explicitly set-like, cf Pandas or
> R's data.tables.
>

R's data.table is nice, but unfortunately it supports only one index.

>
> Also worth thinking about this in the context of John's just-announced
> goals: https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e
>
> On Sun, Sep 7, 2014 at 12:54 PM, John Myles White <[email protected]> wrote:
>
>> No, DataFrames are not indexed. For now, you’d need to build a wrapper
>> that indexes a DataFrame to get that kind of functionality.
>>
>> -- John
>>
>> On Sep 7, 2014, at 9:53 AM, Steven Sagaert <[email protected]> wrote:
>>
>> > Hi,
>> > I was wondering if searching in a dataframe is indexed (in the DB
>> > sense, not array sense, e.g. a tree index structure) or not? If so, can
>> > you have multiple indices (on multiple columns) or not?
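
The "wrapper that indexes a DataFrame" idea from the thread can be sketched roughly as follows. This is only an illustration in plain Python and does not use the DataFrames.jl, pandas, or Blaze APIs. The names `IndexedTable`, `add_index`, and `lookup` are hypothetical, and the table is assumed to be a dict of equal-length column lists. It also uses hash indexes rather than the tree index mentioned in the question, so it speeds up equality lookups only, not range queries.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical wrapper: a column-oriented table plus per-column hash indexes."""

    def __init__(self, columns):
        # columns: dict mapping column name -> list of values (assumed layout).
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.indexes = {}  # column name -> {value -> [row numbers]}

    def add_index(self, name):
        # Build a hash index on one column; call once per column to be indexed.
        index = defaultdict(list)
        for row, value in enumerate(self.columns[name]):
            index[value].append(row)
        self.indexes[name] = dict(index)

    def lookup(self, name, value):
        # Use the index when one exists, otherwise fall back to a full scan.
        if name in self.indexes:
            rows = self.indexes[name].get(value, [])
        else:
            rows = [i for i, v in enumerate(self.columns[name]) if v == value]
        return [{col: vals[i] for col, vals in self.columns.items()} for i in rows]


table = IndexedTable({
    "id": [1, 2, 3, 4],
    "city": ["Ghent", "Boston", "Ghent", "Paris"],
})
table.add_index("id")    # several columns can be indexed independently
table.add_index("city")
print(table.lookup("city", "Ghent"))
```

Each index here is a snapshot built when `add_index` is called, so a real wrapper would also have to update or rebuild its indexes whenever rows change.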
