Doing a convert(OrderedDict, DataFrameRow) seems like it's going to be a much 
worse performance hit than copying everything into a specific OrderedDict 
that's reused, because you're going to allocate memory for a new OrderedDict 
object on every iteration.
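
A minimal sketch of the two strategies being compared, under stated assumptions: `cols` is a hypothetical column-oriented table standing in for a DataFrame, the function names are made up for illustration, and Base's `Dict` stands in for OrderedDict so the example needs no packages. It is not how DataFrames or DBI actually implement rows.

```julia
# Hypothetical column-oriented table standing in for a DataFrame.
cols = (id = [1, 2, 3], score = [0.5, 1.5, 2.5])

# Strategy 1: build a fresh dictionary for every row,
# so each iteration allocates a new container.
function sum_scores_fresh(cols)
    total = 0.0
    for i in eachindex(cols.id)
        row = Dict{Symbol,Any}(k => cols[k][i] for k in keys(cols))
        total += row[:score]
    end
    return total
end

# Strategy 2: allocate one dictionary up front and overwrite
# its values on each iteration, so the container is reused.
function sum_scores_reused(cols)
    row = Dict{Symbol,Any}()
    total = 0.0
    for i in eachindex(cols.id)
        for k in keys(cols)
            row[k] = cols[k][i]
        end
        total += row[:score]
    end
    return total
end
```

Both functions compute the same result; the difference is only in how many containers are allocated, which could be checked with `@allocated` on a larger table.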

 -- John

On Sep 12, 2014, at 2:44 PM, Gray Calhoun <gcalh...@iastate.edu> wrote:

> Probably not in most, you're right.
> 
> Can't you get generic code as long as a method to convert to OrderedDict is 
> supplied, though?
> 
> When you don't need anything more specific, convert the dataframe row to an 
> OrderedDict, then either work with that object or convert it into a more 
> appropriate internal format. But if you want to write specific algorithms for 
> different storage types, that's still an option (e.g. either work with 
> immutable DBI rows, or use a custom convert method to a more appropriate 
> format, skipping the OrderedDict intermediate step).
> 
> On Friday, September 12, 2014 3:26:47 PM UTC-5, John Myles White wrote:
> I'm not sure that losing zero copy semantics is actually a big performance 
> hit in most pipelines.
> 
> I think much more important is that you can't write generic code right now 
> because the abstractions aren't linked in any way. The rows you fetch from a 
> database using DBI aren't mutable, whereas the rows you fetch using 
> eachrow(df) are.
> 
>  -- John
> 
> On Sep 12, 2014, at 1:08 PM, Gray Calhoun <gcal...@iastate.edu> wrote:
> 
>> It seems like standardizing on "convert" would be a natural approach when 
>> one needs to go from one to the other. I don't know the DBI semantics, but
>> 
>>   myrow = convert(Dict, mydataframerow)
>>   myrow2 = convert(OrderedDict, mydataframerow)
>> 
>> etc., is transparent and lets different data storage objects use efficient 
>> representations internally (losing "zero copy semantics" is a huge 
>> sacrifice).
>> 
>> It's also easier to enforce in future packages: much simpler to add convert 
>> methods than to re-represent rows as OrderedDicts (or whatever datatype).
>> 
>> On Friday, September 12, 2014 12:19:47 PM UTC-5, John Myles White wrote:
>> We really need to standardize on a single type that reflects a single row of 
>> a tabular data structure that gets used both by DBI and by DataFrames.
>> 
>> DataFrameRow is really nice because it's a zero-copy operation for 
>> DataFrames, but we can't provide zero-copy semantics when pulling rows out 
>> of a database.
>> 
>> I tend to think we should have all tabular data systems use an OrderedDict 
>> to represent a single row of data.
>> [...] 
> 