Doing a convert(OrderedDict, DataFrameRow) seems like it's going to be a much worse performance hit than copying everything into a specific OrderedDict that's reused, because you're going to allocate memory for a new OrderedDict object on every iteration.
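
To make the allocation difference concrete, here is a rough sketch of the two strategies. It is written in Python with the standard-library OrderedDict rather than Julia, so it only illustrates the pattern and says nothing about the actual DataFrames or DBI APIs. The helper names `rows_fresh` and `rows_reused` and the toy table are hypothetical.

```python
from collections import OrderedDict


def rows_fresh(columns, table):
    """Allocate a brand-new OrderedDict for every row (the convert-per-row pattern)."""
    for values in table:
        yield OrderedDict(zip(columns, values))


def rows_reused(columns, table):
    """Allocate one OrderedDict up front and overwrite its values for each row."""
    row = OrderedDict((name, None) for name in columns)
    for values in table:
        for name, value in zip(columns, values):
            row[name] = value
        # The caller receives the same object on every iteration.
        yield row


# Hypothetical toy table standing in for a data frame or a database cursor.
columns = ["id", "name"]
table = [(1, "a"), (2, "b"), (3, "c")]

fresh = list(rows_fresh(columns, table))
reused = list(rows_reused(columns, table))

# One object per row versus a single shared object.
distinct_fresh = len({id(r) for r in fresh})
distinct_reused = len({id(r) for r in reused})
```

The reused version avoids the per-iteration allocation, but every reference it hands out points at the same object, so a row has to be consumed (or copied) before the next iteration overwrites it.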
-- John

On Sep 12, 2014, at 2:44 PM, Gray Calhoun <gcalh...@iastate.edu> wrote:

> Probably not in most, you're right.
>
> Can't you get generic code as long as a method to convert to OrderedDict is
> supplied, though?
>
> When you don't need anything more specific, convert the dataframe row to an
> OrderedDict, then either work with that object or convert it into a more
> appropriate internal format. But if you want to write specific algorithms for
> different storage types, that's still an option (e.g. either work with
> immutable DBI rows, or use a custom convert method to a more appropriate
> format, skipping the OrderedDict intermediate step).
>
> On Friday, September 12, 2014 3:26:47 PM UTC-5, John Myles White wrote:
> I'm not sure that losing zero copy semantics is actually a big performance
> hit in most pipelines.
>
> I think much more important is that you can't write generic code right now
> because the abstractions aren't linked in any way. The rows you fetch from a
> database using DBI aren't mutable, whereas the rows you fetch using
> eachrow(df) are.
>
> -- John
>
> On Sep 12, 2014, at 1:08 PM, Gray Calhoun <gcal...@iastate.edu> wrote:
>
>> It seems like standardizing on "convert" would be a natural approach when
>> one needs to go from one to the other. I don't know the DBI semantics, but
>>
>> myrow = convert(Dict, mydataframerow)
>> myrow2 = convert(OrderedDict, mydataframerow),
>>
>> etc is transparent and lets different data storage objects use efficient
>> representations internally (losing "zero copy semantics" is a huge
>> sacrifice.)
>>
>> It's also easier to enforce in future packages: much simpler to add convert
>> methods than to re-represent rows as OrderedDicts (or whatever datatype).
>>
>> On Friday, September 12, 2014 12:19:47 PM UTC-5, John Myles White wrote:
>> We really need to standardize on a single type that reflects a single row of
>> a tabular data structure that gets used both by DBI and by DataFrames.
>>
>> DataFrameRow is really nice because it's a zero-copy operation for
>> DataFrames, but we can't provide zero-copy semantics when pulling rows out
>> of a database.
>>
>> I tend to think we should have all tabular data systems use an OrderedDict
>> to represent a single row of data.
>> [...]
>