I kind of suspect my team (which is the team that invented Hive) isn't likely to stop using Hive anytime soon.
-- John

On Sep 7, 2014, at 4:50 PM, Steven Sagaert <[email protected]> wrote:

>
> On Monday, September 8, 2014 1:37:50 AM UTC+2, John Myles White wrote:
> Well, you can write an interface to sqlite to generate in-memory DB's. The
> only restriction is that you won't get some of the semantics you might want
> relative to DataFrames, which allow entries of all types.
>
> Personally, I'm much less interested in Spark and SciDB and much more
> interested in Hive.
>
> Well Spark has a Hive clone called Shark which can run HiveQL queries. It's
> just a lot faster ;) But they are moving more towards a general SQL system
> called Spark SQL and will reimplement Shark also on top of that in the future.
>
> Blaze's approach is very interesting.
> I agree.
>
> -- John
>
> On Sep 7, 2014, at 4:32 PM, Steven Sagaert <[email protected]> wrote:
>
>>
>> On Sunday, September 7, 2014 7:28:18 PM UTC+2, Harlan Harris wrote:
>> This was a feature that sorta existed for a while (see
>> https://github.com/JuliaStats/DataFrames.jl/issues/24 ), but nobody was very
>> happy with it, and I think John ripped it out as part of one of his
>> simplification passes. It's tricky to think about how best to implement this
>> sort of feature when you aspirationally want to support memory-mapped and
>> distributed structures too,
>> I was more thinking along the lines of a simple in-memory db. If you want
>> out-of-memory & distributed it's probably best to interface systems like
>> Spark SQL or Scidb rather than develop that yourselves from scratch. Maybe
>> write something in the spirit of Blaze (blaze.pydata.org)? Right now Blaze
>> supports Spark but I was just discussing with them about scidb and they are
>> also looking into that.
>>
>> and where you want a semantics that's explicitly set-like, cf Pandas or R's
>> data.tables.
>> R's data.table is nice but unfortunately only supports just one index.
>>
>> Also worth thinking about this in the context of John's just-announced
>> goals: https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e
>>
>> On Sun, Sep 7, 2014 at 12:54 PM, John Myles White <[email protected]>
>> wrote:
>> No, DataFrames are not indexed. For now, you’d need to build a wrapper that
>> indexes a DataFrame to get that kind of functionality.
>>
>> -- John
>>
>> On Sep 7, 2014, at 9:53 AM, Steven Sagaert <[email protected]> wrote:
>>
>> > Hi,
>> > I was wondering if searching in a dataframe is indexed (in the DB sense,
>> > not array sense. e.g. a tree index structure) or not? If so can you have
>> > multiple indices (on multiple columns) or not?
