Hi,

I was trying to troubleshoot some performance issues and found some surprising results, possibly because I'm not very familiar with FastBit.
I have a partition containing 2,749,086 rows. I'm selecting a category column with a fairly selective where clause (430,906 hits out of 2,749,086 rows, roughly 15%). After digging a bit, I found that most of the time is spent in ibis::relic::keys, which is not what I expected, so I added a timer to measure it:

    relic::keys -- loop to generate ii took 8.250746 CPU seconds, 8.257573 elapsed seconds
    doQuery:: evaluate(<query>) produced 430906 hits, took 8.71168 CPU seconds, 8.72055 elapsed seconds

I was wondering what could be done to make this function faster. My understanding is that it currently goes through the bitmaps of all distinct values (~37,000 for this column), bitwise-ANDs each one with the hit vector, and collects the value for each matching position.

What would you think about some sort of index, similar to an actual uint column, where the value corresponding to each string is stored at its row position? That way, I think it would be possible to collect keys much faster, at a speed similar to ibis::column::selectUInts.

To evaluate the potential speedup, I built a UINT column from the CATEGORY column. Selecting the UINT column instead of the CATEGORY one is MUCH faster (>20x):

    doQuery:: evaluate(<query>) produced 430906 hits, took 0.388941 CPU seconds, 0.38984 elapsed seconds

Do you think this would be a big change to implement? Would it make sense for you to evaluate it? I look forward to your comments.

Thanks,

Dominique Prunier
APG Lead Developer
www.watch4net.com
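P.S. To make the comparison concrete, here is a minimal sketch of the two strategies as I understand them. It is not FastBit code: Bitmap is a plain std::vector<bool> standing in for FastBit's compressed bitvectors, and keysViaBitmaps and keysViaCodes are hypothetical names I made up for illustration. The first function mimics my reading of the current behaviour (one pass per distinct value); the second mimics the proposed per-row code array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bitmap: one bool per row.
// (FastBit itself uses compressed bitvectors; this is only an illustration.)
using Bitmap = std::vector<bool>;

// Strategy 1: one bitmap per distinct value. Each bitmap is ANDed with the
// hit vector, so the work is proportional to (distinct values) x (rows).
std::vector<std::uint32_t> keysViaBitmaps(const std::vector<Bitmap>& valueBitmaps,
                                          const Bitmap& hits) {
    const std::size_t nRows = hits.size();
    std::vector<std::uint32_t> keyAtRow(nRows, 0);
    for (std::size_t k = 0; k < valueBitmaps.size(); ++k) {
        assert(valueBitmaps[k].size() == nRows);
        for (std::size_t r = 0; r < nRows; ++r) {
            if (valueBitmaps[k][r] && hits[r]) {
                keyAtRow[r] = static_cast<std::uint32_t>(k);
            }
        }
    }
    // Compact the per-row keys down to the hit rows, in row order.
    std::vector<std::uint32_t> keys;
    for (std::size_t r = 0; r < nRows; ++r) {
        if (hits[r]) keys.push_back(keyAtRow[r]);
    }
    return keys;
}

// Strategy 2: the integer code of each row is stored at its row position,
// so collecting the keys is a single pass over the rows.
std::vector<std::uint32_t> keysViaCodes(const std::vector<std::uint32_t>& codes,
                                        const Bitmap& hits) {
    assert(codes.size() == hits.size());
    std::vector<std::uint32_t> keys;
    for (std::size_t r = 0; r < hits.size(); ++r) {
        if (hits[r]) keys.push_back(codes[r]);
    }
    return keys;
}
```

Both functions return the same keys for the same hit vector. The difference is only in how much data has to be scanned, which is what I think explains the gap I measured between the CATEGORY and UINT columns.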
_______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
