On Monday, September 8, 2014 1:37:50 AM UTC+2, John Myles White wrote:
>
> Well, you can write an interface to SQLite to generate in-memory DBs. The
> only restriction is that you won't get some of the semantics you might want
> relative to DataFrames, which allow entries of all types.
>
> Personally, I'm much less interested in Spark and SciDB and much more
> interested in Hive.
>
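To illustrate the SQLite route, here is a minimal sketch using Python's standard `sqlite3` module rather than a Julia binding. The `people` table, its columns and the index names are made up for the example. It builds an in-memory database with B-tree indexes on two different columns, which is the "multiple indices on multiple columns" capability the original question asks about.

```python
import sqlite3

# In-memory SQLite database: it exists only as long as this connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical example table.
cur.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT)")
cur.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Ann", 34, "Ghent"), ("Bob", 27, "Boston"), ("Cid", 34, "Boston")],
)

# One index per column: SQLite stores each index as a separate B-tree.
cur.execute("CREATE INDEX idx_age ON people (age)")
cur.execute("CREATE INDEX idx_city ON people (city)")

# An equality lookup on an indexed column.
rows = cur.execute(
    "SELECT name FROM people WHERE age = ? ORDER BY name", (34,)
).fetchall()
print(rows)  # [('Ann',), ('Cid',)]

# EXPLAIN QUERY PLAN reports whether SQLite chose an index for a query.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM people WHERE city = ?", ("Boston",)
).fetchall()
print(plan)

conn.close()
```

The caveat John mentions still applies: a SQL table has fixed columns of scalar values, so arbitrary objects stored in a DataFrame column would not map over directly.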

Well, Spark has a Hive clone called Shark, which can run HiveQL queries. It's
just a lot faster ;) But they are moving toward a general SQL system called
Spark SQL and will also reimplement Shark on top of that in the future.

> Blaze's approach is very interesting.
>

I agree.

>
> -- John
>
> On Sep 7, 2014, at 4:32 PM, Steven Sagaert <steven....@gmail.com> wrote:
>
> On Sunday, September 7, 2014 7:28:18 PM UTC+2, Harlan Harris wrote:
>>
>> This was a feature that sorta existed for a while (see
>> https://github.com/JuliaStats/DataFrames.jl/issues/24), but nobody was
>> very happy with it, and I think John ripped it out as part of one of his
>> simplification passes. It's tricky to think about how best to implement
>> this sort of feature when you aspirationally want to support memory-mapped
>> and distributed structures too,
>>
> I was thinking more along the lines of a simple in-memory DB. If you want
> out-of-memory and distributed storage, it's probably best to interface with
> systems like Spark SQL or SciDB rather than develop that yourselves from
> scratch. Maybe write something in the spirit of Blaze (blaze.pydata.org)?
> Right now Blaze supports Spark, but I was just discussing SciDB with them
> and they are also looking into that.
>
>> and where you want a semantics that's explicitly set-like, cf. pandas or
>> R's data.table.
>>
> R's data.table is nice but unfortunately supports only one index.
>
>> Also worth thinking about this in the context of John's just-announced
>> goals: https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e
>>
>> On Sun, Sep 7, 2014 at 12:54 PM, John Myles White <johnmyl...@gmail.com>
>> wrote:
>>
>>> No, DataFrames are not indexed. For now, you'd need to build a wrapper
>>> that indexes a DataFrame to get that kind of functionality.
>>>
>>> -- John
>>>
>>> On Sep 7, 2014, at 9:53 AM, Steven Sagaert <steven....@gmail.com> wrote:
>>>
>>> > Hi,
>>> > I was wondering whether searching in a DataFrame is indexed (in the DB
>>> > sense, not the array sense, e.g. with a tree index structure). If so,
>>> > can you have multiple indices (on multiple columns)?
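John's suggestion of a wrapper that indexes a DataFrame could look roughly like the sketch below. It is written in Python purely for illustration; `IndexedTable`, `create_index`, `lookup`, `range_rows` and `row` are hypothetical names and not part of DataFrames.jl. The table is column-oriented (a dict of equal-length lists), and each index is a sorted array of (value, row) pairs queried by binary search. That stands in for a real tree index: it answers equality and range queries in logarithmic time, but unlike a B-tree it must be rebuilt after the data changes.

```python
from bisect import bisect_left, bisect_right


class IndexedTable:
    """Hypothetical wrapper: column-oriented data plus one sorted index per column."""

    def __init__(self, columns):
        if len({len(v) for v in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.indexes = {}  # column name -> (sorted keys, row ids in the same order)

    def create_index(self, name):
        # Sort (value, row id) pairs once; read-only, so no incremental updates.
        pairs = sorted((v, i) for i, v in enumerate(self.columns[name]))
        self.indexes[name] = ([v for v, _ in pairs], [i for _, i in pairs])

    def range_rows(self, name, lo, hi):
        """Row ids whose value in column `name` satisfies lo <= value <= hi."""
        keys, rows = self.indexes[name]
        return sorted(rows[bisect_left(keys, lo):bisect_right(keys, hi)])

    def lookup(self, name, value):
        """Row ids whose value in column `name` equals `value`."""
        return self.range_rows(name, value, value)

    def row(self, i):
        return {col: vals[i] for col, vals in self.columns.items()}


t = IndexedTable({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "age": [34, 27, 34, 41],
    "city": ["Ghent", "Boston", "Boston", "Ghent"],
})
# Several independent indexes on different columns.
t.create_index("age")
t.create_index("city")

print(t.lookup("city", "Boston"))                        # [1, 2]
print(t.range_rows("age", 30, 40))                       # [0, 2]
print([t.row(i)["name"] for i in t.lookup("age", 34)])   # ['Ann', 'Cid']
```

Because each index is kept separately from the columns, adding an index on another column does not touch the existing ones, which is the property missing from a single-key design.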