Peter,
Which part of your changes fixes the problem with updatable shredding of
large XML documents as reported in
[ 1811229 ] [ADT] Adding large document, with update support
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
?
The new string hash function in gdk_atoms.mx, or the file descriptor fixes in
gdk_posix.mx?
The former looks like a performance fix to me: too many collisions
should only slow the system down, not compromise its
functionality/correctness, right?
Also, even with the new string hash function, ("too") many collisions can still
occur with certain datasets ...
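To make the point about collisions concrete, here is a minimal C sketch. All names are hypothetical and this is not GDK's string heap code; djb2 is only a stand-in for "a reasonable hash". The table's membership test compares the full key with strcmp(), so even a hash that sends every key to the same bucket gives correct answers and only pays with longer chains.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration, not GDK code: a tiny chained hash table
 * whose hash function can be swapped out to compare behaviour. */
#define NBUCKETS 64

typedef unsigned int (*hash_fn)(const char *);

struct entry {
    const char *key;
    struct entry *next;
};

struct table {
    struct entry *bucket[NBUCKETS];
    hash_fn hash;
    unsigned long probes;   /* key comparisons performed by table_lookup() */
};

/* Degenerate hash: every key lands in bucket 0 (worst-case collisions). */
static unsigned int const_hash(const char *s)
{
    (void) s;
    return 0;
}

/* A reasonable hash for comparison (djb2, assumed here for illustration). */
static unsigned int djb2_hash(const char *s)
{
    unsigned int h = 5381;
    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

static void table_insert(struct table *t, const char *key)
{
    assert(t->hash != NULL);
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    unsigned int b = t->hash(key) % NBUCKETS;
    e->key = key;
    e->next = t->bucket[b];
    t->bucket[b] = e;
}

/* Returns 1 if key is present, 0 otherwise. The answer is decided by
 * strcmp() on the full key, so it is correct for any hash function;
 * the hash only determines how long the scanned chain is. */
static int table_lookup(struct table *t, const char *key)
{
    assert(t->hash != NULL);
    unsigned int b = t->hash(key) % NBUCKETS;
    for (struct entry *e = t->bucket[b]; e != NULL; e = e->next) {
        t->probes++;
        if (strcmp(e->key, key) == 0)
            return 1;
    }
    return 0;
}

static void table_free(struct table *t)
{
    for (int i = 0; i < NBUCKETS; i++) {
        struct entry *e = t->bucket[i];
        while (e != NULL) {
            struct entry *next = e->next;
            free(e);
            e = next;
        }
        t->bucket[i] = NULL;
    }
}
```

Inserting the same keys into a table with const_hash and one with djb2_hash and looking them up gives identical answers from both; only the probes counter of the degenerate table grows much faster.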
Stefan
On Sun, Oct 14, 2007 at 08:31:36PM +0000, Stefan Manegold wrote:
> Update of /cvsroot/monetdb/MonetDB/src/gdk
> In directory sc8-pr-cvs16.sourceforge.net:/tmp/cvs-serv15103
>
> Modified Files:
> Tag: MonetDB_1-20
> gdk_atoms.mx gdk_posix.mx
> Log Message:
>
> [checkin on behalf of Peter]
>
> fixing XQuery bug
> [ 1811229 ] [ADT] Adding large document, with update support
> http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
>
> gdk_atoms.mx:
> - hash collisions in strings that consist of digits only (a common case!):
> we now use a fast derivative of the Bob Jenkins hash function instead
>
> The collisions were really bad: for the 20GB document of the bug report,
> shredding took 8 hours before this change and 1 hour after it.
>
> NOTE: this change affects the binary format (string heaps) and all product
> families, as the hash function is a compiled-in macro!
> In particular, lookup operations and joins on SQL (Monet4/5) columns
> consisting of digits only, but stored in a VARCHAR, should be faster
> after this check-in.
>
> gdk_posix.mx
> - we lost track of the file descriptor for large heaps (the file descriptor is
> given to the mmap-monitoring thread to close later), so the remap function
> could fail (when it was given the invalid file descriptor 0)
>
> NOTE: this change only affects XQuery, as only XQuery uses remap()
>
>
> Index: gdk_posix.mx
> ===================================================================
> RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_posix.mx,v
> retrieving revision 1.143
> retrieving revision 1.143.2.1
> diff -u -d -r1.143 -r1.143.2.1
> --- gdk_posix.mx 4 Sep 2007 17:55:20 -0000 1.143
> +++ gdk_posix.mx 14 Oct 2007 20:31:33 -0000 1.143.2.1
> @@ -615,7 +615,7 @@
> MT_mmap_tab[i].writable = writable;
> MT_mmap_tab[i].fd = fd;
> MT_mmap_tab[i].pincnt = 0;
> - fd = -1;
> + fd = -fd;
> }
> (void) pthread_mutex_unlock(&MT_mmap_lock);
> return fd;
> @@ -1051,9 +1051,7 @@
> }
> if (ret != (void *) -1L) {
> hdl->fixed = ret;
> - fd = MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE));
> - if (fd <= 0)
> - hdl->hdl = (void *) 0; /* MT_mmap_new keeps the fd */
> + hdl->hdl = (void*) (ssize_t) MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE));
> }
> return ret;
> }
> @@ -1061,13 +1059,12 @@
> void *
> MT_mmap_remap(MT_mmap_hdl *hdl, off_t off, size_t len)
> {
> - void *ret;
> -
> - ret = mmap(hdl->fixed,
> + int fd = (int) (ssize_t) hdl->hdl;
> + void *ret = mmap(hdl->fixed,
> len,
> ((hdl->mode & MMAP_WRITABLE) ? PROT_WRITE : 0) | PROT_READ,
> ((hdl->mode & MMAP_COPY) ? (MAP_PRIVATE | MAP_NORESERVE) :
> MAP_SHARED) | (hdl->fixed ? MAP_FIXED : 0),
> - (int) (ssize_t) hdl->hdl,
> + (fd < 0)?-fd:fd,
> off);
>
> if (ret != (void *) -1L) {
> @@ -1083,9 +1080,7 @@
> MT_mmap_close(MT_mmap_hdl *hdl)
> {
> int fd = (int) (ssize_t) hdl->hdl;
> -
> - if (fd)
> - close(fd);
> + if (fd > 0) close(fd);
> hdl->hdl = NULL;
> }
>
>
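The gdk_posix.mx part encodes ownership in the sign of the stored descriptor. Before the patch, MT_mmap_new() returned -1 once the monitor thread had taken the descriptor, the caller then stored 0 in hdl->hdl, and MT_mmap_remap() passed that 0 to mmap(). Now MT_mmap_new() returns -fd, MT_mmap_remap() uses the absolute value, and MT_mmap_close() only closes positive values. The sketch below mirrors that convention with hypothetical names (handle, handle_set_fd, ...) and uses intptr_t rather than the patch's ssize_t cast; it assumes descriptors are > 0, since 0 cannot carry a sign.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MT_mmap_hdl: the descriptor is stored in a
 * pointer-sized field, as in the patch (which casts via ssize_t). */
typedef struct {
    void *hdl;
} handle;

/* Store fd; a negative value records that the mmap monitor thread now
 * owns the descriptor (it will close it), while remap may still use it.
 * Assumes fd > 0: a descriptor of 0 cannot carry the sign. */
static void handle_set_fd(handle *h, int fd, int owned_by_monitor)
{
    assert(fd > 0);
    h->hdl = (void *) (intptr_t) (owned_by_monitor ? -fd : fd);
}

/* Descriptor to pass to mmap() when remapping: the absolute value,
 * mirroring "(fd < 0) ? -fd : fd" in MT_mmap_remap(). */
static int handle_remap_fd(const handle *h)
{
    int fd = (int) (intptr_t) h->hdl;
    return fd < 0 ? -fd : fd;
}

/* Whether the handle itself should close() the descriptor, mirroring
 * "if (fd > 0) close(fd)" in MT_mmap_close(). An empty handle (0) and a
 * monitor-owned descriptor (negative) are both left alone. */
static int handle_should_close(const handle *h)
{
    int fd = (int) (intptr_t) h->hdl;
    return fd > 0;
}
```

A monitor-owned descriptor thus stays usable for remapping without ever being closed twice; a handle holding 0 is exactly the state that used to reach mmap() before the fix.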
> Index: gdk_atoms.mx
> ===================================================================
> RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_atoms.mx,v
> retrieving revision 1.134
> retrieving revision 1.134.6.1
> diff -u -d -r1.134 -r1.134.6.1
> --- gdk_atoms.mx 2 May 2007 16:16:58 -0000 1.134
> +++ gdk_atoms.mx 14 Oct 2007 20:31:32 -0000 1.134.6.1
> @@ -1878,13 +1878,19 @@
> rotates all characters together. It is optimized to process 2 characters
> at a time (adding 16-bits to the hash value each iteration).
> @h
> -#define GDK_STRHASH(x,y) { \
> - str _c = (str) (x); \
> - for((y)=0; _c[0] && _c[1]; _c+=2) { \
> - (y) = ((y) << 3) ^ ((y) >> 11) ^ ((y) >> 17) ^ (_c[1] << 8) ^ _c[0];\
> - } \
> - (y) ^= _c[0]; \
> +#define GDK_STRHASH(x,y) {\
> + str _key = (str) (x);\
> + int _i;\
> + for (_i = y = 0; _key[_i]; _i++) {\
> + y += _key[_i];\
> + y += (y << 10);\
> + y ^= (y >> 6);\
> + }\
> + y += (y << 3);\
> + y ^= (y >> 11);\
> + y += (y << 15);\
> }
> +
> @c
> hash_t
> strHash(str s)
>
>
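To see why digit-only strings collided, the two GDK_STRHASH versions from the diff can be rewritten as plain functions and fed the decimal strings "0" to "9999". Assumptions for this sketch: the hash is a 32-bit unsigned int, characters are read as unsigned char, and a bucket is picked by masking the low 10 bits (1024 buckets); GDK's real hash_t width and bucket selection may differ. The new macro is, step for step, Bob Jenkins' one-at-a-time hash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NBUCKETS 1024u   /* power of two: bucket = hash & (NBUCKETS - 1) */

/* Old GDK_STRHASH as a function: two characters per step, shift/XOR mix. */
static unsigned int old_strhash(const char *s)
{
    const unsigned char *c = (const unsigned char *) s;
    unsigned int y = 0;

    for (; c[0] && c[1]; c += 2)
        y = (y << 3) ^ (y >> 11) ^ (y >> 17) ^ ((unsigned int) c[1] << 8) ^ c[0];
    y ^= c[0];   /* odd-length tail; c[0] is '\0' for even lengths */
    return y;
}

/* New GDK_STRHASH as a function (Jenkins one-at-a-time). */
static unsigned int new_strhash(const char *s)
{
    unsigned int y = 0;

    for (size_t i = 0; s[i]; i++) {
        y += (unsigned char) s[i];
        y += y << 10;
        y ^= y >> 6;
    }
    y += y << 3;
    y ^= y >> 11;
    y += y << 15;
    return y;
}

/* Hash the decimal strings "0" .. "n-1" and count how many of the
 * NBUCKETS buckets receive at least one key. */
static unsigned int used_buckets(unsigned int (*hash)(const char *), int n)
{
    static unsigned char used[NBUCKETS];
    unsigned int count = 0;
    char buf[16];

    assert(n >= 0);
    memset(used, 0, sizeof used);
    for (int i = 0; i < n; i++) {
        snprintf(buf, sizeof buf, "%d", i);
        unsigned int b = hash(buf) & (NBUCKETS - 1);
        if (!used[b]) {
            used[b] = 1;
            count++;
        }
    }
    return count;
}
```

Digits differ only in their low nibble, and the old function combines character pairs with shifts and XORs that keep those few varying bits apart, so under these assumptions the 10000 keys occupy well under half of the 1024 buckets; the one-at-a-time mix spreads them far more evenly.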
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems? Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> Monetdb-checkins mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/monetdb-checkins
--
| Dr. Stefan Manegold | mailto:[EMAIL PROTECTED] |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4312 |
_______________________________________________
Monetdb-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-developers