I kind of suspect my team (which is the team that invented Hive) isn't likely 
to stop using Hive anytime soon.

 -- John

On Sep 7, 2014, at 4:50 PM, Steven Sagaert <[email protected]> wrote:

> 
> 
> On Monday, September 8, 2014 1:37:50 AM UTC+2, John Myles White wrote:
> Well, you can write an interface to sqlite to generate in-memory DBs. The 
> only restriction is that you won't get some of the semantics you might want 
> relative to DataFrames, which allow entries of all types.
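
A minimal sketch of the in-memory sqlite idea mentioned above, written in Python with the standard-library sqlite3 module rather than in Julia. The table name, column names and index names are hypothetical and chosen only for illustration. It shows that an in-memory database can carry several indices on different columns.

```python
import sqlite3

# ":memory:" asks sqlite for a database that lives only in RAM.
conn = sqlite3.connect(":memory:")

# Hypothetical table; names are illustrative only.
conn.execute("CREATE TABLE people (id INTEGER, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Ghent", 34), (2, "Boston", 41), (3, "Ghent", 29)],
)

# Several indices can coexist, one per column of interest.
conn.execute("CREATE INDEX idx_people_city ON people (city)")
conn.execute("CREATE INDEX idx_people_age ON people (age)")

# An equality query on an indexed column.
rows = conn.execute(
    "SELECT id, age FROM people WHERE city = ? ORDER BY id", ("Ghent",)
).fetchall()

# The query planner reports which index, if any, it intends to use.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM people WHERE city = ?", ("Ghent",)
).fetchall()
```

Here `rows` holds the matching records as tuples, and `plan` can be printed to check whether sqlite chose one of the indices for the lookup.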
> 
> Personally, I'm much less interested in Spark and SciDB and much more 
> interested in Hive.
> 
> Well, Spark has a Hive clone called Shark which can run HiveQL queries. It's 
> just a lot faster ;) But they are moving more towards a general SQL system 
> called Spark SQL and will also reimplement Shark on top of that in the future.
>  
> Blaze's approach is very interesting.
> I agree. 
> 
>  -- John
> 
> On Sep 7, 2014, at 4:32 PM, Steven Sagaert <[email protected]> wrote:
> 
>> 
>> 
>> On Sunday, September 7, 2014 7:28:18 PM UTC+2, Harlan Harris wrote:
>> This was a feature that sorta existed for a while (see 
>> https://github.com/JuliaStats/DataFrames.jl/issues/24 ), but nobody was very 
>> happy with it, and I think John ripped it out as part of one of his 
>> simplification passes. It's tricky to think about how best to implement this 
>> sort of feature when you aspirationally want to support memory-mapped and 
>> distributed structures too,
>> I was thinking more along the lines of a simple in-memory db. If you want 
>> out-of-memory & distributed it's probably best to interface with systems like 
>> Spark SQL or SciDB rather than develop that yourselves from scratch. Maybe 
>> write something in the spirit of Blaze (blaze.pydata.org)? Right now Blaze 
>> supports Spark, but I was just discussing SciDB with them and they are 
>> also looking into that.
>>  
>> and where you want a semantics that's explicitly set-like, cf Pandas or R's 
>> data.table. 
>> R's data.table is nice but unfortunately supports only one index. 
>> 
>> Also worth thinking about this in the context of John's just-announced 
>> goals: https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e
>> 
>> 
>> 
>> On Sun, Sep 7, 2014 at 12:54 PM, John Myles White <[email protected]> 
>> wrote:
>> No, DataFrames are not indexed. For now, you’d need to build a wrapper that 
>> indexes a DataFrame to get that kind of functionality.
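
A minimal sketch of the kind of indexing wrapper described above, in Python rather than Julia and not tied to the DataFrames.jl API. The class `IndexedTable` and its methods are hypothetical. It uses hash indices, which only serve equality lookups; a tree index, as asked about in the original question, would additionally support range queries.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical wrapper: a column-oriented table plus per-column hash indices."""

    def __init__(self, columns):
        # columns: dict mapping column name -> list of values (all the same length)
        self.columns = columns
        self.indices = {}

    def add_index(self, name):
        # Map each distinct value in the column to the row positions holding it.
        index = defaultdict(list)
        for row, value in enumerate(self.columns[name]):
            index[value].append(row)
        self.indices[name] = index

    def lookup(self, name, value):
        # Use the index when one exists; otherwise fall back to a linear scan.
        if name in self.indices:
            rows = self.indices[name].get(value, [])
        else:
            rows = [i for i, v in enumerate(self.columns[name]) if v == value]
        # Return each matching row as a dict of column name -> value.
        return [{col: vals[i] for col, vals in self.columns.items()} for i in rows]


table = IndexedTable({"id": [1, 2, 3, 4], "city": ["Ghent", "Boston", "Ghent", "NYC"]})
table.add_index("id")    # indices on multiple columns can coexist
table.add_index("city")
matches = table.lookup("city", "Ghent")
```

The indices are built once and are not updated when the underlying columns change, so a real wrapper would also need to handle inserts and deletes.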
>> 
>>  — John
>> 
>> On Sep 7, 2014, at 9:53 AM, Steven Sagaert <[email protected]> wrote:
>> 
>> > Hi,
>> > I was wondering whether searching in a dataframe is indexed (in the DB sense, 
>> > not the array sense, e.g. a tree index structure)? If so, can you have 
>> > multiple indices (on multiple columns)?
>> 
>> 
> 
