pat wrote:
> From the 'jstat', this file has 45,883 frames that are 'full',
> requiring a 'linked' frame:
>
> Total Frames = 175074
>
> Total Frames - original modulo:
> 175074 - 129191 = 45,883 linked frames
>
> So the jBASE 4.1.4.19 'recommendation' of '132087' is too small, even
> for the existing records in this file.
>
> Even with a perfect spread of items within the 132087 groups, giving
> 21 items per group, each 4096-byte frame has a header of 52 bytes and
> each item has a header of 16 bytes, leaving 3708 bytes in each frame
> for the 'data' [ this calculation also ( naively ) assumes that every
> item is of identical length ].
>
> So for the 484,741,937 bytes of data currently in this file, with a
> perfect spread of identically sized items, a minimum of 130729 frames
> will be required.
>
> Unfortunately, unless your item ids are purely numeric and range
> sequentially from 1 thru 2761824, Hash Method 3 will NOT give a
> perfect spread of items throughout these groups.
>
> So the jBASE 5.0 recommendation is a more realistic value for resizing
> this file.
>
> If the item ids are the usual combination of alphanumeric characters,
> then Hash Method 5 will give a more even spread of items throughout
> the groups, without any empty buckets, unless the modulo exceeds the
> number of records within the file.
>
> The correction to the simplistic and original jrf has been available
> in all jBASE releases since November 2006.
> [ jBASE 4.1.4.19 was released in February 2006 ]

I see where you are going with that (though I don't think you have finished it by explaining HOW jrf now calculates its new modulo), but I think this is still WAY more buckets than are required. Without using standard deviation, the assumptions above are just as naive as the original jrf's; in fact they also make naive assumptions about disk access and virtual memory, so they are worse really.
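The quoted arithmetic can be sanity-checked in a few lines. This is only a sketch: the frame size, header sizes, and record count are taken from the post above, not from jBASE internals, which may differ.

```python
# Check of the quoted minimum-frame calculation (figures from the post above).
FRAME_SIZE = 4096        # bytes per frame
FRAME_HEADER = 52        # frame header, per the post
ITEM_HEADER = 16         # per-item header, per the post

total_data = 484_741_937     # bytes of data currently in the file
modulo = 132_087             # jBASE 4.1.4.19 recommended modulo
total_items = 2_761_824      # record count implied by the post

# Ceiling division via negation: with a perfect spread this gives 21 items/group.
items_per_group = -(-total_items // modulo)

# Usable data bytes per frame after the frame header and 21 item headers.
data_per_frame = FRAME_SIZE - FRAME_HEADER - items_per_group * ITEM_HEADER
print(data_per_frame)        # 3708

# Minimum frames needed for all the data under these (naive) assumptions.
min_frames = -(-total_data // data_per_frame)
print(min_frames)            # 130729
```

Both results match the post: 3708 usable bytes per frame and a floor of 130,729 frames.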
130729 is a reasonable size to start with, and if you change HASHMETHOD you will get closer to a perfect spread than with the crummy HASHMETHOD=3. However, the difference between 4.1's recommended 132,087 and j5's 453,949 is 321,862 buckets, or nearly 3 times as many extra buckets as there were to start with in the original, which is about 1.2GB more space!! As the file before resizing occupied only 0.67GB (regardless of how it was internally organized), how can allocating about 1.8GB of disk space make any sense?

Sure, I can see upping the buckets by more than the original jrf used to, but at most I can only see adding say 50K frames (and that is just a guess). What argument are you using to justify that jBASE 5's jrf is correct here? The file has now increased in on-disk size by 1.2GB, which the disk controllers and memory cache must also accommodate. It is undoubtedly more efficient to have overflow frames than 1.2GB of empty space, and jrf needs to take this kind of thing into account if it is going to try to be 'clever' - this is one of the reasons it did not used to try to be clever.

So, while the arguments for changing jrf are good (after all, jrf was only a convenient way to avoid back-of-the-envelope calculations that anyone can do), I think it has gone way too far the other way (assuming that this is a typical calculation), and massively increasing the load on virtual memory just doesn't make sense.

Jim
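The disk-space figures in the argument above can be verified with the same back-of-the-envelope style. This assumes every bucket and overflow frame is a full 4096 bytes; the actual jBASE on-disk layout may add further overhead.

```python
# Check of the disk-space figures cited in the reply (4096-byte frames assumed).
FRAME_SIZE = 4096

original_frames = 175_074     # total frames before resize (from jstat)
j4_modulo = 132_087           # jBASE 4.1.4.19 recommendation
j5_modulo = 453_949           # jBASE 5.0 recommendation

extra_buckets = j5_modulo - j4_modulo
print(extra_buckets)                          # 321862

def gb(frames):
    """Frames to gibibytes."""
    return frames * FRAME_SIZE / 2**30

print(round(gb(original_frames), 2))          # 0.67  (file before resizing)
print(round(gb(extra_buckets), 2))            # 1.23  (the "about 1.2GB more space")
print(round(gb(j5_modulo), 2))                # 1.73  (j5 base buckets alone)
```

The j5 modulo alone accounts for about 1.73GB before any overflow frames, which is where the "about 1.8GB" figure for the resized file comes from.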
