[REBOL] Find speed Re:

Galt_Barber Wed, 05 Jul 2000 13:16:07 -0700



Ladislav,

your binary search code (bfind) runs faster than find,
but find has to work on unsorted data, so it cannot assume any ordering
and ends up scanning linearly.

Hashes also don't help with ordered access.  They are great
for finding a person given a social security number,
but they are no good for walking a list of people
in order by SSN or name.

So I am not surprised that your bfind runs faster than find.

Do you think RT should add /ordered or something to 'find
when the data is pre-sorted so that it could use binary search internally?
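
To make the difference concrete, here is a minimal sketch in Python (not REBOL, and not Ladislav's actual bfind) of a linear scan next to a binary search on pre-sorted data.  The names linear_find and ordered_find are made up for illustration; a hypothetical find/ordered would presumably behave like the second one.

```python
from bisect import bisect_left

def linear_find(items, key):
    """Scan from the front; works on unsorted data, O(n) comparisons."""
    for i, item in enumerate(items):
        if item == key:
            return i
    return None

def ordered_find(items, key):
    """Binary search; only valid when items is sorted, O(log n) comparisons."""
    i = bisect_left(items, key)
    if i < len(items) and items[i] == key:
        return i
    return None

data = list(range(0, 1000, 2))   # sorted even numbers, purely demo data
print(linear_find(data, 500), ordered_find(data, 500))
```

Both calls return the same index; the second one just gets there in far fewer comparisons, and it silently gives wrong answers if the data is not actually sorted.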

If you want a really fast, large-capacity data structure that is
quick to search and to insert/delete/modify, and that you can also
walk in key order (e.g. by last name), then you will probably end up
using B+ trees with nodes made of blocks, with your binary search or
something like it to find the nearest key match inside each
block.
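
A rough sketch of the idea, again in Python rather than REBOL.  This is deliberately not a full B+ tree: it has only one index level over the leaf blocks and no deletes, and BlockIndex and LEAF_CAPACITY are names invented for the example.  It does show the two parts that matter here: binary search inside small sorted blocks, and walking the keys in order.

```python
from bisect import bisect_left, bisect_right, insort

LEAF_CAPACITY = 4   # tiny on purpose; an assumption for the demo

class BlockIndex:
    """One index level over small sorted blocks (a simplified B+ tree leaf layer)."""

    def __init__(self):
        self.leaves = [[]]   # each leaf is a small sorted list of keys
        self.seps = []       # seps[j] is the first key of leaves[j + 1]

    def _leaf_no(self, key):
        # Binary search over the separators picks the leaf that may hold key.
        return bisect_right(self.seps, key)

    def insert(self, key):
        i = self._leaf_no(key)
        leaf = self.leaves[i]
        insort(leaf, key)                     # keep the block sorted
        if len(leaf) > LEAF_CAPACITY:         # split an overfull block in two
            mid = len(leaf) // 2
            left, right = leaf[:mid], leaf[mid:]
            self.leaves[i:i + 1] = [left, right]
            self.seps.insert(i, right[0])

    def contains(self, key):
        leaf = self.leaves[self._leaf_no(key)]
        j = bisect_left(leaf, key)            # binary search inside the block
        return j < len(leaf) and leaf[j] == key

    def walk(self):
        # In-order traversal is just "visit the leaves left to right".
        for leaf in self.leaves:
            yield from leaf

idx = BlockIndex()
for k in [50, 10, 40, 20, 30, 60, 5, 45, 15]:
    idx.insert(k)
```

Because no single block ever grows past LEAF_CAPACITY + 1 entries, an insert only shifts a handful of items instead of half of one giant block.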

When you try to stick everything in one block, you will eventually
hit a point with larger data where Rebol has a hard time with
a single block (unless Rebol is more clever than I think, and more clever
in this regard than any language I have ever seen).

I mentioned something about this a while ago but there seemed to
be little interest in it then.  I suppose many people would say that
for large systems you need a multi-user, transaction protected,
database system like Oracle$$$ or MS-SQLServer, etc.

-----------

By the way, Tim mentioned ODBC access to his database C/Mix (?).
Well, the C++ source is going to have access to ODBC DLLs and such, which
Rebol/Core and /View will not be able to use.  And /Command will probably
have ODBC built right in, so ...

-----

Also, people are recommending re-writing the database code in Rebol,
which is fine I suppose.  One might want to choose new data structures
that suit Rebol best, but if his goal is file-format compatibility, then
he will not be free to change the data structures on disk, which may
make certain Rebol optimizations impossible.

If Tim just wants to make a single-user database that his web-app uses
to store data for itself, then he probably could implement something
decent that works without having to port the entire functionality of this
commercial product which is probably loaded with extra stuff he doesn't need.

For most purposes, you can get away with a minimal system employing
fixed-length records and an index of some kind for speed.
Given that, you can do most basic things you would need.  Even writing a little
db engine that did that much would take you a while, but it would be a lot
easier than porting hundreds or thousands of pages of C code.
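
As a sketch of how small such a minimal system can be, here is a fixed-length record store with a key index, written in Python rather than Rebol.  The record layout (a 9-byte SSN plus a 20-byte last name), the function names, and the in-memory io.BytesIO standing in for a real file are all assumptions made for the example.

```python
import io
import struct

# Hypothetical layout: 9-byte SSN followed by a 20-byte last name.
RECORD = struct.Struct("<9s20s")

def append_record(f, index, ssn, name):
    """Write one fixed-length record at the end and remember its record number."""
    f.seek(0, io.SEEK_END)
    recno = f.tell() // RECORD.size
    f.write(RECORD.pack(ssn.encode(), name.encode()))   # short fields are NUL-padded
    index[ssn] = recno
    return recno

def read_record(f, index, ssn):
    """Look the key up in the index, then jump straight to recno * record size."""
    recno = index.get(ssn)
    if recno is None:
        return None
    f.seek(recno * RECORD.size)
    raw_ssn, raw_name = RECORD.unpack(f.read(RECORD.size))
    return raw_ssn.decode().rstrip("\x00"), raw_name.decode().rstrip("\x00")

db = io.BytesIO()    # stands in for a file opened for random access
index = {}           # key -> record number; a sorted index would also allow ordered walks
append_record(db, index, "123456789", "Barber")
append_record(db, index, "987654321", "Mecir")
print(read_record(db, index, "987654321"))
```

Fixed-length records are what make the offset arithmetic trivial: record n always starts at n times the record size, so the index only has to store small integers.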

----

Concerning the copyright of the commercial C code:
as far as I know, even if it is a translation to another language, the copyright
of the original is maintained.  This is certainly true of books.  If you translate
The Little Prince to Cherokee or some other language, the original author's
copyright still exists and you won't be able to sell and distribute the translated
version without permission.  Your work as a translator is itself still protected, too,
I suppose, but that doesn't negate the rights of the original author.

----------

So, if you decide to make your own db engine, what features do you
absolutely have to support?  Remember each one is going to cost you!

By the way, can I really open a very large file in Rebol without buffering and
jump right to anywhere in the file and change some bytes?  It seemed like it
didn't work for me the last time I was checking that out.  I either got buffering
I didn't want or else I couldn't skip to anyplace in the file.  It seems like most
OSes must certainly support random file access.  Does Rebol have a problem with this?
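
For reference, this is the kind of operation I mean, sketched in Python.  It says nothing about how Rebol's ports behave; it only shows the seek-and-overwrite access that the OS itself offers.  The helper name patch_bytes and the 1000-byte temp file are made up for the example.

```python
import os
import tempfile

def patch_bytes(path, offset, data):
    """Overwrite len(data) bytes at offset without reading the rest of the file."""
    with open(path, "r+b", buffering=0) as f:   # unbuffered binary read/write
        f.seek(offset)
        f.write(data)

# Build a small scratch file, patch five bytes in the middle, read them back.
fd, path = tempfile.mkstemp()
os.write(fd, b"." * 1000)
os.close(fd)

patch_bytes(path, 500, b"REBOL")

with open(path, "rb") as f:
    f.seek(498)
    patched = f.read(9)
os.remove(path)
print(patched)
```

The file length does not change and nothing outside the five patched bytes is touched, which is what a record-oriented db engine needs for in-place updates.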

---------

-galt