Hi,

I was trying to troubleshoot some performance issues and I found some 
surprising results, which may be related to the fact that I'm not very familiar 
with FastBit.

I have a partition containing 2,749,086 rows. I'm selecting a category column 
with a fairly selective where clause (430,906 hits out of 2,749,086 rows, 
roughly 16%).

After digging a bit, I found that most of the time is spent in 
ibis::relic::keys, which is not what I expected (so I added a timer to 
measure it):

relic::keys -- loop to generate ii took 8.250746 CPU seconds, 8.257573 elapsed 
seconds
doQuery:: evaluate(<query>) produced 430906 hits, took 8.71168 CPU seconds, 
8.72055 elapsed seconds

I was wondering what could be done to make this function faster.

My understanding is that it currently goes through the bitmap of each 
distinct value (~37,000 for this column), bitwise-ANDs it with the hit vector, 
and collects the value for each matching position.
What would you think about some sort of index, similar to an actual UINT 
column, where the integer key of each string is stored at its row position?
This way, I think it would be possible to collect keys much faster, at a speed 
similar to ibis::column::selectUInts.
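
To make the comparison concrete, here is a minimal sketch of the two 
strategies as I understand them. It does not use FastBit code: Bitmap, 
keysFromBitmaps and keysFromCodes are hypothetical names, the bitmaps are 
plain uncompressed std::vector<bool> (FastBit's are compressed, so the real 
cost differs), and every row is assumed to hold exactly one value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bitmap: one bit per row, uncompressed.
using Bitmap = std::vector<bool>;

// Strategy 1: one bitmap per distinct value. Each bitmap is combined with the
// hit vector, so the work grows with the number of distinct values.
std::vector<std::uint32_t> keysFromBitmaps(const std::vector<Bitmap>& valueBitmaps,
                                           const Bitmap& hits) {
    std::vector<std::uint32_t> perRow(hits.size(), 0);
    for (std::uint32_t key = 0; key < valueBitmaps.size(); ++key) {
        const Bitmap& bm = valueBitmaps[key];
        assert(bm.size() == hits.size());
        for (std::size_t row = 0; row < hits.size(); ++row) {
            if (bm[row] && hits[row]) perRow[row] = key;  // bitwise AND, bit by bit
        }
    }
    // Compact the per-row keys into one entry per hit, in row order.
    std::vector<std::uint32_t> out;
    for (std::size_t row = 0; row < hits.size(); ++row) {
        if (hits[row]) out.push_back(perRow[row]);
    }
    return out;
}

// Strategy 2: the key of each row is stored at its row position, so a single
// pass over the hit vector is enough, whatever the number of distinct values.
std::vector<std::uint32_t> keysFromCodes(const std::vector<std::uint32_t>& codes,
                                         const Bitmap& hits) {
    assert(codes.size() == hits.size());
    std::vector<std::uint32_t> out;
    for (std::size_t row = 0; row < hits.size(); ++row) {
        if (hits[row]) out.push_back(codes[row]);
    }
    return out;
}
```

Both functions return the same keys for the same hit vector; the second one 
simply trades extra storage (one integer per row) for a single pass.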

To evaluate the potential speedup, I built a UINT column from the CATEGORY 
column; selecting the UINT column instead of the CATEGORY one is MUCH 
faster (>20x):

doQuery:: evaluate(<query>) produced 430906 hits, took 0.388941 CPU seconds, 
0.38984 elapsed seconds

Do you think this would be a big change to implement? Would it make sense for 
you to evaluate it?

I look forward to your comments on this.

Thanks,

Dominique Prunier
 APG Lead Developer
 4388, rue Saint-Denis
 Bureau 309
 Montreal (Quebec)  H2J 2L1
 Tel. +1 514-842-6767  x310
 Fax +1 514-842-3989
 [email protected]<mailto:[email protected]>
 www.watch4net.com<http://www.watch4net.com/>

