Re: [julia-users] Are dataframes indexed?

Steven Sagaert Sun, 07 Sep 2014 16:51:06 -0700


On Monday, September 8, 2014 1:37:50 AM UTC+2, John Myles White wrote:
>
> Well, you can write an interface to SQLite to generate in-memory DBs. The 
> only restriction is that you won't get some of the semantics you might want 
> relative to DataFrames, which allow entries of all types.
>
> Personally, I'm much less interested in Spark and SciDB and much more 
> interested in Hive.
>
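
For illustration, here is a minimal sketch of the in-memory SQLite idea quoted
above, with more than one index on the same table. It is written in Python
with the standard-library sqlite3 module, not Julia, and it is not an
existing DataFrames interface. The table, column, and index names are
made up for the example. Note that values are limited to SQLite's storage
classes (NULL, INTEGER, REAL, TEXT, BLOB), which is the kind of restriction
mentioned above.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("ann", 31, "Ghent"), ("bob", 45, "Boston"), ("cat", 31, "Boston")],
)

# Two separate indices on two different columns of the same table.
conn.execute("CREATE INDEX idx_people_age ON people (age)")
conn.execute("CREATE INDEX idx_people_city ON people (city)")


def names_by_age(age):
    """Look up names by age; the equality filter can be served by idx_people_age."""
    rows = conn.execute(
        "SELECT name FROM people WHERE age = ? ORDER BY name", (age,)
    )
    return [r[0] for r in rows]


def uses_index(sql, params=()):
    """Return True if SQLite's query plan mentions one of the indices above."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return any("idx_people_" in row[-1] for row in plan)
```

Calling names_by_age(31) returns the matching names, and uses_index(...) is
a quick way to check whether a given query is actually served by an index
rather than a full table scan.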


Well, Spark has a Hive clone called Shark, which can run HiveQL queries. It's 
just a lot faster ;) But they are moving more towards a general SQL system 
called Spark SQL, and will also reimplement Shark on top of that in the 
future.
 

> Blaze's approach is very interesting.
>
I agree. 

>
>  -- John
>
> On Sep 7, 2014, at 4:32 PM, Steven Sagaert <steven....@gmail.com> wrote:
>
>
>
> On Sunday, September 7, 2014 7:28:18 PM UTC+2, Harlan Harris wrote:
>>
>> This was a feature that sorta existed for a while (see 
>> https://github.com/JuliaStats/DataFrames.jl/issues/24 ), but nobody was 
>> very happy with it, and I think John ripped it out as part of one of his 
>> simplification passes. It's tricky to think about how best to implement 
>> this sort of feature when you aspirationally want to support memory-mapped 
>> and distributed structures too,
>>
> I was thinking more along the lines of a simple in-memory DB. If you want 
> out-of-memory and distributed, it's probably best to interface with systems 
> like Spark SQL or SciDB rather than develop that yourselves from scratch. 
> Maybe write something in the spirit of Blaze (blaze.pydata.org)? Right now 
> Blaze supports Spark, but I was just discussing SciDB with them and 
> they are also looking into that.
>  
>
>> and where you want a semantics that's explicitly set-like, cf. Pandas or 
>> R's data.table. 
>>
> R's data.table is nice but unfortunately supports only one index. 
>
>>
>> Also worth thinking about this in the context of John's just-announced 
>> goals: https://gist.github.com/johnmyleswhite/ad5305ecaa9de01e317e
>>
>>
>>
>> On Sun, Sep 7, 2014 at 12:54 PM, John Myles White <johnmyl...@gmail.com> 
>> wrote:
>>
>>> No, DataFrames are not indexed. For now, you’d need to build a wrapper 
>>> that indexes a DataFrame to get that kind of functionality.
>>>
>>>  — John
>>>
>>> On Sep 7, 2014, at 9:53 AM, Steven Sagaert <steven....@gmail.com> wrote:
>>>
>>> > Hi,
>>> > I was wondering whether searching in a DataFrame is indexed (in the DB 
>>> > sense, not the array sense, e.g. a tree index structure). If so, can 
>>> > you have multiple indices (on multiple columns)?
>>>
>>>
>>
>
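
As a rough illustration of the "wrapper that indexes a DataFrame" suggested
in the quoted reply above, here is a minimal sketch. It is in plain Python
rather than Julia and is not the DataFrames.jl API: IndexedTable and its
method names are hypothetical. It assumes the table is a dict of
equal-length columns and that the values in an indexed column are hashable
and mutually comparable. Each indexed column gets a hash index for equality
lookups and a sorted index (standing in for a tree index) for range lookups,
so several columns can be indexed at once.

```python
import bisect
from collections import defaultdict


class IndexedTable:
    """Hypothetical wrapper: a column-oriented table plus per-column indices."""

    def __init__(self, columns):
        self.columns = columns   # dict: column name -> list of values
        self.hash_idx = {}       # column name -> {value: [row numbers]}
        self.sorted_idx = {}     # column name -> sorted list of (value, row number)

    def add_index(self, col):
        """Build both indices for one column; call again after modifying the data."""
        by_value = defaultdict(list)
        for row, value in enumerate(self.columns[col]):
            by_value[value].append(row)
        self.hash_idx[col] = dict(by_value)
        self.sorted_idx[col] = sorted(
            (value, row) for row, value in enumerate(self.columns[col])
        )

    def rows_equal(self, col, value):
        """Row numbers where column == value, via the hash index."""
        return self.hash_idx[col].get(value, [])

    def rows_between(self, col, lo, hi):
        """Row numbers where lo <= column <= hi, via binary search on the sorted index."""
        pairs = self.sorted_idx[col]
        start = bisect.bisect_left(pairs, (lo, -1))
        stop = bisect.bisect_right(pairs, (hi, float("inf")))
        return sorted(row for _, row in pairs[start:stop])


table = IndexedTable({
    "age": [31, 45, 31, 22],
    "city": ["Ghent", "Boston", "Boston", "Ghent"],
})
table.add_index("age")    # one index per column, as many columns as needed
table.add_index("city")
```

The indices here are built once and are not kept in sync with later changes
to the columns. Keeping them in sync is the part a real implementation would
have to design carefully.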
