Hi, Jon,

If you have easy access to the corresponding ibis::column objects, you can create a set of ibis::column::indexLock objects. As long as you hold on to those indexLock objects, the indexes will remain in memory.
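For example, if you can get at the ibis::part object for a recent day, something along these lines should do it. (This is an untested sketch: the column names are made up, and the exact indexLock constructor arguments should be double-checked against column.h.)

    #include "ibis.h"   // top-level FastBit header

    // Keep the indexes of a few columns of one day's partition in memory
    // while a batch of queries runs.  The locks are released (and the
    // indexes become evictable again) when they go out of scope.
    void queryRecentDay(ibis::part& day) {
        ibis::column* c1 = day.getColumn("price");    // hypothetical column names
        ibis::column* c2 = day.getColumn("volume");
        if (c1 == 0 || c2 == 0) return;

        // the 2nd constructor argument is only a message string used in
        // logging (signature assumed -- please check column.h)
        ibis::column::indexLock lk1(c1, "queryRecentDay");
        ibis::column::indexLock lk2(c2, "queryRecentDay");

        // ... run the day's queries against this partition here ...
    }   // lk1/lk2 destroyed here; the indexes may be unloaded afterwards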
John

PS: If you really want, you can go down to the ibis::fileManager level: explicitly call ibis::fileManager::getFile(const char*, storage**, ACCESS_PREFERENCE) and then call ibis::storage::beginUse on the storage object returned from getFile. (A rough sketch is appended at the very end of this message, below the quoted thread.)

On 4/5/11 1:14 PM, Jon Strabala wrote:
> John
>
> Can you selectively load (and pin) sets of indices in memory
> programmatically? For example, if I partition by day (subdirectory-wise), I
> might want to keep the most recent six days pinned in RAM but release
> the indices that are seven days or older and let the normal underlying logic
> manage old days, i.e. day 8 to day 1200.
>
> In the same vein, I might run a series of queries on the same week from
> 38 days ago, and for the entire series I might want to pin those indices
> in memory.
>
> Hopefully it is not an all-or-nothing mechanism.
>
> Thanks,
> Jon Strabala
>
> Sent from my iPhone
>
> On Apr 5, 2011, at 12:49 PM, "K. John Wu" <[email protected]> wrote:
>
>> Hi, Mike,
>>
>> Thanks for your interest in FastBit.
>>
>> If you have enough memory on your machine, then you can load all
>> indexes into memory by calling ibis::part::loadIndexes with the second
>> argument, readall, set to a value greater than 0. This should prevent
>> FastBit from ever attempting to read the index files again.
>>
>> It sounds like you have quite a lot of data, so you might not want
>> to force all the indexes to be loaded into memory. In that case, you
>> are probably looking for an option that will memory-map the whole
>> index file. This can be accomplished by setting the parameter
>>
>> preferMMapIndex = true
>>
>> This parameter can be set in a parameter file, or can be added in a
>> program by calling
>>
>> ibis::gParameters().add("preferMMapIndex", "true");
>>
>> before performing queries.
>>
>> It might be helpful to make memory mapping the default option in
>> FastBit. I will do some experiments to see how best to handle the
>> default option.
>>
>> I am not sure I understand your queries well enough to give you any
>> useful advice. Typically, if your small queries can be combined into
>> larger ones, FastBit needs to do less work, such as reading files and
>> parsing queries. However, this is not always the case. To be more
>> specific, I would need to understand your queries a little better.
>>
>> John
>>
>>
>> On 4/5/11 9:18 AM, Chong, Michael wrote:
>>>
>>> Dear Dr. Wu,
>>>
>>> I have been using your FastBit program for a few months now and
>>> have finally come to understand it a bit better. I use it in
>>> economics research, looking through large real-time historical
>>> datasets. I like it a lot, and I have a Java JNI interface which we
>>> put together. The speed is really impressive :).
>>>
>>> I have structured the data into three tables, and further divided
>>> the data into a partition for each day; so 10 days of data make
>>> 30 partitions. The partition sizes are 12 GB, 2.6 MB, and 3.6 GB
>>> for a day's worth, and some days have more data. Each day is
>>> represented by a set of three partitions.
>>>
>>> Right now the calculations for one day take about 120
>>> minutes. I am trying to speed this up and have some questions. The
>>> code runs through the data, performing calculations for a set of N
>>> events. For each event I run a "select" on the three partitions for
>>> that day.
>>>
>>> 1) I noticed that the fileManager caches files, but it only
>>> seems to do so for the "select clause" variables and not for the
>>> "where clause" variables.
>>> When I run strace on Linux, I see it
>>> opening and then mmap'ing the "where clause" files again and again
>>> for each select. I have also tried to GetFile the "where clause"
>>> files, but noticed that the nacc, nref and last-used times are
>>> never incremented. Do the where-clause variables ever hit the
>>> fileManager? I also never see a hit on an XXX.idx (index) file
>>> either, but I guess it must be using the indexes somehow. I suspect
>>> my slow speed is due to the repeated opening and closing of the same
>>> files. I can open a roFile in the fileManager, but it never gets any
>>> hits for some reason, so it makes no difference to the speed.
>>>
>>> 2) In general, is it better to (A) run one big query and pull
>>> out a large chunk of data, or (B) run lots of little queries? (B)
>>> is easier to code, but would (A) be faster? I was thinking of
>>> selecting, say, 100 events (an "Event-Chunk") into an in-memory
>>> three-table set, then doing another select on this "Event-Chunk".
>>> Is there a better way to do this using FastBit?
>>>
>>> Many thanks for your kind help and advice.
>>>
>>> Warmest regards,
>>> Mike.
>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
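To make the PS near the top concrete, the fileManager route might look roughly like this. (Again an untested sketch: the use of ibis::fileManager::instance(), the PREFER_MMAP value, the integer return code, and the endUse call are my best guesses and should be verified against fileManager.h; depending on the FastBit version the storage class may be spelled ibis::storage or ibis::fileManager::storage.)

    #include "ibis.h"   // top-level FastBit header

    // Explicitly load one file through the file manager and mark it as
    // in use so that it is not evicted until the caller releases it.
    ibis::fileManager::storage* pinFile(const char* fname) {
        ibis::fileManager::storage* st = 0;
        // getFile presumably returns 0 on success
        int ierr = ibis::fileManager::instance().getFile
            (fname, &st, ibis::fileManager::PREFER_MMAP);
        if (ierr == 0 && st != 0) {
            st->beginUse();   // pin: keeps the storage object in memory
            return st;        // the caller should call st->endUse() to unpin
        }
        return 0;
    }

The two whole-partition options quoted earlier in the thread are simpler one-liners (the first argument of loadIndexes is assumed here to be an optional index specification):

    day.loadIndexes((const char*)0, 1);    // on an ibis::part "day"; readall > 0
                                           // forces all indexes into memory
    ibis::gParameters().add("preferMMapIndex", "true");   // or set it in a
                                                          // parameter file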
