Re: jBASE unefficient? - distributed files

Jim Idle Fri, 20 Mar 2009 10:12:51 -0700

Pawel (privately) wrote:
> Hi,
>
> I would like to ask group members about jBASE performance in relation to 
> distributed files.
>
> We have noticed that SELECTs with criteria on distributed files may be 
> 4 times slower than on regular files.
> I think the reason is quite clear: jBASE is querying each part file and 
> then reading the record from it to qualify it or throw it away from the 
> select list.
>
> Reading the record is, however, not a clever process. The distribution 
> routine is run unnecessarily for each record, which causes a lot of overhead.
> Does jBASE need to confirm that key taken from part file #1 is still in 
> part file #1? ;)
>
> We have raised it as a performance bug. I would expect that jBASE reads 
> part files one by one and does not need to invoke the distribution routine. 
> What do you think?
> Our suggestion also relates to part file scanning - it could be done in 
> separate processes too (just to speed up the selection process).
>
> PS. I know that it is not clever to run queries against distributed 
> files, but why not optimize jBASE? :)
>
>   
The problem isn't so much that distributed files are inefficient but 
probably that the algorithm you are using is not optimal. What is the 
SELECT statement you are using? If you create a list then read through 
that list, you will read in the list order, which may not be optimal for 
that distribution. Note that many moons ago I modified the distributed 
files so that you could change the key on the fly to guarantee that the 
distribution is good (not that anyone has ever used it except the people I 
wrote it for). Otherwise the key order you get is not necessarily the 
order that is best for reading through the part files, and you will create 
millions of random reads instead of lots of sequential reads.
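
To illustrate the point about read order, here is a minimal Python sketch 
(not jBC, and not how jBASE works internally). The part_number() rule is a 
made-up stand-in for whatever distribution routine your file really uses, 
and the keys are invented. It only shows that grouping a select list by 
part file before reading keeps the number of part-file switches to a minimum:

```python
from collections import defaultdict


def part_number(key: str, n_parts: int) -> int:
    """Hypothetical distribution rule (an assumption, not jBASE's):
    sum of the key's bytes modulo the number of part files."""
    return sum(key.encode()) % n_parts


def count_part_switches(keys, n_parts):
    """Count how often two consecutive reads land in different part files."""
    parts = [part_number(k, n_parts) for k in keys]
    return sum(1 for a, b in zip(parts, parts[1:]) if a != b)


def group_by_part(keys, n_parts):
    """Reorder keys so that all keys of one part file are read together,
    keeping the original relative order inside each part."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[part_number(key, n_parts)].append(key)
    return [key for part in sorted(buckets) for key in buckets[part]]


# Invented select list, in the order the list happened to be built.
keys = [f"CUST{i}" for i in range(1, 13)]
ordered = group_by_part(keys, 4)

print("switches in list order:   ", count_part_switches(keys, 4))
print("switches grouped by part: ", count_part_switches(ordered, 4))
```

With the keys grouped, the reader moves to each part file at most once, 
which is the "lots of sequential reads" case rather than the random one.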


A select will read the ID, then it will read the record - 
non-distributed files use some neat tricks for bulk reads, whereas 
distributed files probably cannot. Have you considered SELECTing each 
part file individually, then merging the results? I doubt that the 
distributed files can be generically optimized, but by changing the key 
on read and write (computationally of course) you can probably get much 
better performance.
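
As a rough picture of "select each part file, then merge", here is a small 
Python sketch. The part files are simulated as dictionaries and the record 
layout ("balance") is invented for the example; in practice each 
select_part() call would be a SELECT against one real part file:

```python
import heapq


def select_part(part, predicate):
    """Scan a single (simulated) part file and return the keys of the
    records that qualify, sorted so the per-part lists can be merged."""
    return sorted(key for key, record in part.items() if predicate(record))


def select_distributed(parts, predicate):
    """Select every part file on its own, then merge the sorted results
    into one list. No distribution routine is consulted per record."""
    per_part = [select_part(part, predicate) for part in parts]
    return list(heapq.merge(*per_part))


# Three simulated part files with invented records.
parts = [
    {"A1": {"balance": 50}, "A2": {"balance": 500}},
    {"B1": {"balance": 700}, "B2": {"balance": 10}},
    {"C1": {"balance": 900}},
]

result = select_distributed(parts, lambda rec: rec["balance"] > 100)
print(result)
```

Because each part is scanned independently, the per-part selects could 
also be run in separate processes and merged afterwards, which is the 
other half of what Pawel suggested.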

Of course, once you can use my new file system, you won't need 
distributed files and won't have this problem :-)

Jim



