On Sun, Feb 2, 2014 at 9:16 PM, Gustaf Neumann <neum...@wu.ac.at> wrote:
> On 02.02.14 19:36, Stephen wrote:
> >
> > The original concern was that a generic utility function which wraps
> > dbi_rows and uses the typical foreach pattern might have its private
> > variables clobbered by column names it can't predict. Returning a list
> > of dicts from dbi_rows is one way around that.
> >
> > Looks like there are a couple of different situations where wrappers
> > are wanted:
> >
> > 1) Legacy ACS db_* functions
> >
> > Looks like a lot of that code unpacks row ns_sets into local vars
> > anyway. db_multirow even has an -unclobber switch which restores the
> > original values of any vars which clash with column names.
> >
> > It seems like a bit of a backwards step to add legacy sets to the dbi
> > api only to have the legacy db_* stuff unpack it again.
>
> The best approach for OpenACS would be to rewrite the db-interface based
> on dbi_*, but this currently has at least two show-stoppers: dbi has no
> Oracle support, and time (too many ns_db calls scattered all over the
> code, and ns_db has a very wide interface).
>
> i guess that by "legacy sets" you refer to the added option "-result
> sets". i've done this to cope with legacy code of the following form:
>
>     while { [::db_getrow $db $answers] } {
>         ...
>         ... large body doing something with the ns_set $answers
>         ...
>     }
>
> To keep the code working with the old ns_db interface and to also
> support dbi, having "dbi_rows" return a list of ns_sets comes in very
> handy. One can replace the ns_set collecting code with dbi, but one does
> not have to rewrite the code extracting values from the ns_sets.
>
> One could certainly also use the dicts option for dbi_* and convert the
> list of dicts to a list of ns_sets in Tcl, which is certainly much
> slower (same with the flat result list).
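The Tcl-side conversion you describe could also live in a small one-off helper rather than in C. A sketch, assuming NaviServer's ns_set create / ns_set put API; dicts_to_sets is a hypothetical name, not part of nsdbi:

```tcl
# Hypothetical helper: convert the list of dicts returned by
# [dbi_rows -result dicts ...] into a list of ns_sets, so legacy
# code that reads rows from ns_sets keeps working unchanged.
# Requires a NaviServer interp for the ns_set command.
proc dicts_to_sets {dicts} {
    set sets {}
    foreach d $dicts {
        set s [ns_set create row]
        dict for {col value} $d {
            ns_set put $s $col $value
        }
        lappend sets $s
    }
    return $sets
}
```

That keeps the set-producing code out of the dbi api itself, at the cost of the extra Tcl loop you mention.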
How about introducing a _db_foreach helper to replace the stereotypical
while loop outlined above -- something with an interface more amenable to
talking with both dbi and nsdb? Looking briefly at db_foreach and
db_multirow, it looks like some kind of helper could clean up turning the
sets into local variables without touching the bulk of the code around
it. Then sets could be avoided entirely when using dbi.

By legacy I mean that before dicts, sets were a useful alternative to the
weird by-name arrays, but not any more. And it looks like you're focused
on some kind of xo ORM, so the bits under the hood which connect db_*
with dbi are there to keep legacy code working as more stuff is moved to
xo. With that in mind, it seems a shame to re-surface sets in the dbi
api, to be used in code no one's going to see, especially when there are
alternatives.

> > 2) New generic xo functions
> >
> > I'm not following how the implementation as it is now, which returns
> > a list of lists with duplicated column names, is faster than just
> > returning a list of dicts. You have to feed the lists to dict create
> > before you use them, which is extra work and always going to be
> > slower.
>
> i am not following you here. The print string of a tcl dict is a tcl
> list. there is no need to use "dict create". try:
>
>     dict get {a 1 b 2} b
>
> what i said in the comment was that adding elements to a list is faster
> than adding entries to the dict in the loop iterating over the columns.
> Furthermore, the list without the internal representation of the dict
> (essentially the hash table) requires less memory.

'dict get' is implemented by DictGetCommand:

    http://core.tcl.tk/tcl/artifact/cd2f2a05bbb75127

The first substantial thing it does is call TclTraceDictPath to resolve
the keys, and the first thing that does is call SetDictFromAny, which
uses the elements of the list to create the dict internal representation.
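A quick tclsh demonstration of both points -- a plain list does work directly with dict get, but the dict conversion still happens, just lazily:

```tcl
# A plain list is a valid dict print string; no [dict create] needed.
set row {a 1 b 2}
puts [dict get $row b]    ;# -> 2

# However, that first [dict get] shimmers $row's internal
# representation from a list to a dict (hash table), so the
# construction cost is deferred to first access, not avoided.
```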
So although it is faster to create nested lists than dicts, you create
the dicts lazily on first access anyway.

> > For any other purpose nested lists are no different from the original
> > flat list with column list, and don't address the problem of variable
> > clobbering.
> >
> > How about this:
> >
> >     dbi_dicts cols rows
> >
> > Returns a list of dicts by zipping up the cols and rows lists
> > returned by dbi_rows. It's faster than the current implementation as
> > dbi_rows only has to construct a flat list rather than the nested
> > list with duplicated keys, and that means less data to cache in
> > ns_cache, and parse again when deserialising.
>
> you say that
>
>     set results [dbi_rows -columns cols -- *SQL*]
>     set dicts [dbi_dicts $results $cols]
>
> is faster than
>
>     set dicts [dbi_rows -result dicts -- *SQL*]

No, this:

    set rows [dbi_rows -columns cols -- $sql]
    foreach d [dbi_dicts $cols $rows] {
        puts [dict get $d k]
    }

...is faster and uses less memory than this:

    foreach row [dbi_rows -result dicts -- $sql] {
        puts [dict get $row k]
    }

...the way you have it currently implemented with nested lists, because
$row shimmers to a dict on first access.

> hard to believe for me. the first form has to create two more
> variables and has to iterate over the solutions twice.
>
> btw, the code in the tip behaves as before when the option "-result" is
> not used.

> > Generalising, it could be something like:
> >
> >     dbi_foreach ?-dict name | -array name | -set name? cols rows ?body?
> >
> > I avoided adding the obvious implementation of this because it
> > encourages stuff like nested queries. With dbi_rows you either get a
> > list result or a template filled with subst, but no eval. There's
> > only ever one query active and no visible handle management.
>
> what is the problem with a dbi_foreach in the dbi infrastructure of the
> form:
>
>     dbi_foreach ?-db name? ?-autonull? ?-timeout t? ?-bind bindSource?
>         ?-max nrows? ?--? query ?body?
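(As an aside, the dbi_dicts proposed above could be prototyped in plain Tcl while the C version is debated. A sketch, assuming dbi_rows returns a flat list of values when -columns is given, as described earlier:)

```tcl
# Pure-Tcl sketch of the proposed dbi_dicts: zip the $cols list
# together with the flat $rows value list, producing one dict-shaped
# list per row. The real implementation would presumably be C inside
# nsdbi.
proc dbi_dicts {cols rows} {
    set ncols [llength $cols]
    set dicts {}
    for {set i 0} {$i < [llength $rows]} {incr i $ncols} {
        set d {}
        foreach col $cols value [lrange $rows $i [expr {$i + $ncols - 1}]] {
            lappend d $col $value
        }
        lappend dicts $d
    }
    return $dicts
}

# e.g. dbi_dicts {id name} {1 alice 2 bob}
#   -> {id 1 name alice} {id 2 name bob}
```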
>
> i've prototyped such a thing, and it seems to work fine with nested
> queries.

Because client-side nested queries are a bad idea in a web server
environment, and it was an explicit design goal of nsdbi to prevent them.

One problem is that sql newbies tend to fall back on procedural code in
the client rather than figure out how to write the correct sql. Another
problem is the poor performance of n sub-queries, one for each row in the
driver loop. Another is that two db backends and two handles are tied up
for as long as it takes the last (slow) sub-query to complete.

> i would not be surprised if there would be problems with transactions,
> but i see no reason these can't be addressed as well.
>
> dbi_foreach allows break-able loops, where one can step over a query
> with a potentially huge number of results and stop depending on the
> result of a tcl call.

This is almost always a bad idea because:

- it encourages people to write the wrong sql and fix it up in tcl

- with reasonably sized result sets the query runs to completion in the
  database anyway (unless you explicitly use cursors) and transfers to
  the client in batches, not row by row, so an early break saves you
  nothing

- you need to run code to break, and by allowing code to run you also
  allow sub-queries, or even just slow code which keeps the handle busy

nsdbi even has a -max option and a default config of 1000 rows to guard
against faulty queries blowing up the web server (which is the usual
situation with large result sets) and to force people to think about what
sort of result they're expecting. dbi_rows is the n-rows counterpart to
dbi_1row and dbi_0or1row.

The remaining use case is bulk loading and extraction, and that's
probably not best handled by a front-end web server. Or at least, there's
probably a solution which doesn't undo the nice properties of the api
which are helpful day to day.

> There is no need to get all the results first:
>
>     dbi_foreach *SQL* {
>         ....
>         if {[some-tcl-call $a $b $c]} { break }
>         ...
>     }
>
> How can this be handled with the existing dbi driver? i've no problem
> dropping the code in favor of some better implementation.

Another downside of foreach is that it clobbers local variables, which
was the original motivation for adding dicts. (dbi_rows with a template
doesn't have this problem.)

Due to the above points, 'foreach' is probably not the right word, but
maybe something like dbi_zip (like the functional zip), working as
outlined originally with $cols and $rows, would be a tool you could use
to implement an xo ORM.

> PS: using the OpenACS based xowiki with dbi (and the functions in
> mercurial head) led to a 20% performance improvement with very few
> changes.

Neat. I wonder if you'll notice improvements under load, in real-life
conditions. The original micro benchmarks were impressive, but there are
also the effects of handle management. You have a choice between fewer
back-ends, due to shorter handle lifetimes, or more back-ends with the
1-1 conn-thread-to-db-backend mode, if you discover that on average every
conn thread uses a db backend.

_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel