I checked: Rucksack writes dirty objects to disk on each commit, so it
should have the same performance profile as our existing backends, of
course with different constants...
Ian
On May 16, 2008, at 11:26 AM, Ian Eslick wrote:
I just wanted to clarify a few points in this interesting discussion.
> but if we are using a high-level language like Lisp, it's much more
> likely that CPU overhead will be large compared to memory latency.
> and since compression, even the simplest/fastest kind, will only add
> to CPU overhead, i'm very skeptical about its benefits.
I doubt that using a Lisp with a modern compiler affects CPU time in
any interesting way; I've observed that algorithm implementations
with reasonable data structure choices run within tens of percent of
the same algorithm in C. I don't care how fast your CPU is, it's
the data access patterns that dominate time unless you're working on
small or highly local datasets, in which case the inner loop speed
_might_ matter.
> by the way, an interesting fact: some time ago i found that even
> with the cache enabled in Elephant/Postmodern, slot reads are
> relatively slow -- on the scale of ten thousand per second or so.
> it was quite surprising that the culprit is the function
> form-slot-key (which is used in CL-SQL too):
> (defun form-slot-key (oid name)
>   (format nil "~A ~A" oid name))

At least in Allegro, the profiler only counts thread time, and much
of the delay shown by 'time' vs. the profiler is spent waiting on
cache-miss IO. It may be that thread time is only a handful of
percentage points of the total time?
Format is notoriously slow. Still, I find it hard to believe that
it matters that much. (See above)
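
For concreteness, a minimal sketch of a FORMAT-free key builder (the
name fast-slot-key and the exact idiom are my own illustration, not
Elephant's actual code):

;; Sketch: build the "OID NAME" key without going through FORMAT's
;; directive interpreter. PRINC-TO-STRING renders each part directly
;; and CONCATENATE assembles the result in one pass.
(defun fast-slot-key (oid name)
  (concatenate 'string
               (princ-to-string oid)
               " "
               (princ-to-string name)))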
> and even more surprising was to find out that there is no portable
> and fast way to convert an integer to a string. so i had to use
> SBCL's internal function sb-impl::quick-integer-to-string --
> apparently the SBCL developers knew about this problem, so they
> made this function for themselves.
For fun, I did a test with princ: for integers only it's 2x faster,
and for the above idiom it's only about 50% faster, but some of that
is the string-stream overhead. If you do it manually as they did and
write the digits into an array one character at a time, it can get
pretty fast, but probably only 3x-4x.
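
Roughly, the comparison looks like this (a sketch, not the exact
test I ran; int-to-string is an illustrative hand-rolled digit
writer, not SBCL's internal one):

;; Illustrative comparison: FORMAT vs. PRINC-TO-STRING vs. a manual
;; digit-at-a-time writer for non-negative integers.
(defun int-to-string (n)
  (if (zerop n)
      "0"
      (let* ((len (loop for m = n then (floor m 10)
                        while (plusp m)
                        count t))
             (s (make-string len)))
        ;; Fill the string from the rightmost digit backwards.
        (loop for i from (1- len) downto 0
              for m = n then (floor m 10)
              do (setf (char s i)
                       (code-char (+ (char-code #\0) (mod m 10)))))
        s)))

;; Each TIME report shows how much of the cost is FORMAT itself.
(dolist (fn (list (lambda (n) (format nil "~D" n))
                  #'princ-to-string
                  #'int-to-string))
  (time (dotimes (i 1000000) (funcall fn i))))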
> so i think that subtle effects like I/O bandwidth will only be seen
> if we hardcore-optimize Elephant and the applications themselves.
> but as it is now, the storage backend won't have a big influence on
> overall performance and just needs to be moderately good.
I agree with this in general. However, I feel like this whole
discussion is premature optimization being done while blindfolded.
I do think that over time it would be nice to give people tools like
compression to enable/disable for their particular applications, but
it's probably not worth all the debate until we solve some of the
bigger problems in our copious free time.
By the way, if you are caching your data in memory and using the
most conservative transaction model of BDB (blocking commits), you
are write-bandwidth limited. At the end of each little transaction,
you have to write a log entry to disk, then flush a 16KB page to
disk, then wait for the controller to say OK, then write the log
again, then return to your regularly scheduled programming. In
prevalence, you only write the log and occasionally dump to disk. I
suspect that the SQL backends function similarly: when you finish a
transaction, you have to wait for all of this to happen over a socket...
Prevalence is much, much faster because you don't have to flush data
structures on each commit, so cl-prevalence performance with
Elephant's data and transaction abstractions would be a really nice
design point. I wonder if we would get some of this benefit from a
Rucksack adaptation?
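
To make the contrast concrete, here is a minimal sketch of a
prevalence-style commit path (my own illustration, not
cl-prevalence's actual API): the only per-commit disk wait is
forcing one log record out, and data structures are only dumped at
snapshot time.

(defvar *tx-log* nil)

(defun open-tx-log (path)
  (setf *tx-log* (open path :direction :output
                            :if-exists :append
                            :if-does-not-exist :create)))

;; Commit = append one readable transaction record and force it out.
;; No data pages are flushed; recovery replays the log over the last
;; snapshot. (FINISH-OUTPUT flushes the stream buffer; a real system
;; would also fsync via implementation-specific code.)
(defun commit-tx (tx-form)
  (prin1 tx-form *tx-log*)
  (terpri *tx-log*)
  (finish-output *tx-log*))

So something like (commit-tx '(set-slot 42 name "foo")) returns as
soon as one log line is out, rather than after a page flush.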
Ian
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel