pat wrote:
> From the 'jstat' output, this file has 45,883 frames that are 'full',
> requiring a 'linked' frame:
>
>    Total Frames = 175074
>
> Total Frames - Original Modulo = Linked Frames:
>
>   175074 - 129191 = 45,883 linked frames
>
> So the jBASE 4.1.4.19 'recommendation' of '132087' is too small, even
> for the existing records in this file
>
> Even with a perfect spread of items within the 132087 groups, giving
> about 21 items per group (2,761,824 records / 132,087 groups), each
> 4096-byte frame has a header of 52 bytes and each item has a header of
> 16 bytes, leaving 3708 bytes in each frame for the 'data'
> [ this calculation also ( naively ) assumes that every item is of
> identical length ]
>
> So for the 484,741,937 bytes of data currently in this file, with a
> perfect spread of identically sized items, this will require a minimum
> of 130,729 frames
>
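
A quick check of that arithmetic, as a Python sketch (the 4096-byte
frame, 52-byte frame header, 16-byte item header and 21-items-per-group
figures are the ones quoted above, not values I have measured myself):

    # Sanity-check of the frame arithmetic quoted above
    FRAME_SIZE, FRAME_HEADER, ITEM_HEADER = 4096, 52, 16
    ITEMS_PER_GROUP = 21

    linked_frames = 175074 - 129191        # 'linked' frames, per jstat
    usable = FRAME_SIZE - FRAME_HEADER - ITEMS_PER_GROUP * ITEM_HEADER
    data_bytes = 484741937
    min_frames = -(-data_bytes // usable)  # ceiling division

    print(linked_frames, usable, min_frames)   # 45883 3708 130729
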
> Unfortunately, unless your item ids are purely numeric and run
> sequentially from 1 thru 2,761,824, Hash Method 3 will NOT give a
> perfect spread of items throughout these groups
>
> So the jBASE 5.0 recommendation is a more realistic value for resizing
> this file
>
> If the item ids are the usual combination of alphanumeric characters,
> then Hash Method 5 will give a more even spread of items throughout
> the groups, without any empty buckets, unless the modulo exceeds the
> number of records within the file
>
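
The point about spread is easy to demonstrate. I don't have the actual
HASHMETHOD 3 and 5 algorithms to hand, so the two functions below are
only stand-ins with the same general character (division-style hashing
versus a byte-mixing string hash), but they show the effect pat
describes:

    from collections import Counter

    MODULO = 1000

    def mod_style(key):      # stand-in for a method-3 flavour of hash
        n = int(key) if key.isdigit() else sum(ord(c) for c in key)
        return n % MODULO

    def string_style(key):   # stand-in for a method-5 flavour of hash
        h = 0
        for c in key:
            h = (h * 31 + ord(c)) & 0xFFFFFFFF
        return h % MODULO

    numeric = [str(i) for i in range(1, 20001)]
    alnum   = ['CUST%06d' % i for i in range(1, 20001)]

    for label, ids in (('numeric', numeric), ('alnum', alnum)):
        for fn in (mod_style, string_style):
            buckets = Counter(fn(k) for k in ids)
            print(label, fn.__name__,
                  'max per bucket:', max(buckets.values()),
                  'empty buckets:', MODULO - len(buckets))

Sequential numeric ids hash perfectly under the mod-style stand-in (20
per bucket, no empties), but the alphanumeric ids collapse into a few
dozen buckets because their character sums barely vary; the string hash
should stay roughly even in both cases.
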
> The correction to the simplistic and original jrf has been available
> in all jBASE Releases since November 2006
> [ jBASE 4.1.4.19 was released in February 2006 ]
>   
I see where you are going with that (though I don't think you have 
finished it: you haven't explained HOW jrf now calculates its new 
modulo), but I think that this is still WAY more buckets than are 
required. Without using standard deviation, the assumptions above are 
just as naive as the ones in the original jrf. In fact they suffer from 
naive assumptions about disk access and virtual memory, so they are 
actually worse.

130,729 is a reasonable size to start with, and if you change HASHMETHOD 
you will get closer to a perfect spread than crummy HASHMETHOD=3. 
However, the difference between 4.1's recommended 132,087 and j5's 
453,949 is 321,862 buckets, or roughly two and a half times as many 
extra buckets as the 129,191 groups the file started with, which is 
about 1.2GB more space!! As the file before resizing occupied only 
0.67GB (regardless of how it was internally organized), how can 
allocating disk space of about 1.8GB make any sense? Sure, I can see 
upping the buckets by more than the original jrf used to do, but at most 
I can only see adding, say, 50K frames (and that is just a guess).
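
For anyone who wants to check those space figures, the 
back-of-the-envelope version in Python (4096-byte frames, per the 
figures quoted above):

    FRAME = 4096
    extra  = (453949 - 132087) * FRAME   # 321862 extra buckets
    before = 175074 * FRAME              # the file as it stood
    after  = 453949 * FRAME              # j5's modulo alone, no overflow
    for label, n in (('extra', extra), ('before', before), ('after', after)):
        print(label, round(n / 2**30, 2), 'GB')   # ~1.23, ~0.67, ~1.73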

What argument are you using to justify that the jBASE 5 jrf is correct 
here? The file has now increased in on-disk size by 1.2GB, which means 
the disk controllers and memory cache must also accommodate this. It is 
undoubtedly more efficient to have overflow frames than 1.2GB of empty 
space, and jrf needs to take this kind of thing into account if it is 
going to try to be 'clever' - this is one of the reasons it did not use 
to try to be clever.

So, I think that while the arguments for changing jrf are good (after 
all, jrf was only a convenient way to avoid back-of-the-envelope 
calculations that anyone can do), it has gone way too far the other way 
(assuming that this is a typical calculation), and I think that 
massively increasing the load on virtual memory just doesn't make sense.

Jim

