Hi, Jon,

If you have easy access to the corresponding ibis::column objects, 
then you can create a set of ibis::column::indexLock objects.  As long 
as you hold on to these indexLock objects, the indexes will remain in 
memory.
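
For example, here is a minimal sketch (the helper function and the column 
names are hypothetical; it assumes indexLock's constructor takes the column 
pointer plus a caller-name string, and that ibis::part::getColumn returns 
the named column):

#include <ibis.h>
#include <vector>

// Create one indexLock per named column; the corresponding indexes stay
// in memory as long as the locks are alive.
std::vector<ibis::column::indexLock*>
pinIndexes(ibis::part& tbl, const std::vector<const char*>& names) {
    std::vector<ibis::column::indexLock*> locks;
    for (size_t i = 0; i < names.size(); ++ i) {
        ibis::column* col = tbl.getColumn(names[i]);
        if (col != 0)
            locks.push_back(new ibis::column::indexLock(col, "pinIndexes"));
    }
    return locks; // delete each entry to release the corresponding index
}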

John

PS: If you really want, you can go down to the ibis::fileManager level 
and explicitly call

ibis::fileManager::getFile(const char*, storage**, ACCESS_PREFERENCE)

and then call ibis::storage::beginUse on the storage object returned 
from getFile.
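
A minimal sketch of that route (the file name is hypothetical; it assumes 
getFile is reached through the singleton ibis::fileManager::instance(), 
that the access preference can be left at its default, and that the 
storage class pairs beginUse with endUse):

#include <ibis.h>

void pinFile(const char* fname) { // e.g. a specific .idx file of interest
    ibis::fileManager::storage* st = 0;
    if (0 == ibis::fileManager::instance().getFile(fname, &st) && st != 0) {
        st->beginUse();  // marked in-use: the file manager will not unload it
        // ... run the queries that need this file ...
        st->endUse();    // allow the file manager to free it again
    }
}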


On 4/5/11 1:14 PM, Jon Strabala wrote:
> John
>
> Can you selectively load (and pin) sets of indices in memory 
> programmatically? For example, if I partition by day (subdirectory-wise) I 
> might want to keep the most recent six days pinned in RAM but release 
> the indices that are seven days or older and let the normal underlying logic 
> manage the old days, i.e. day 8 to day 1200.
>
> In the same vein, I might run a series of queries on data from 38 days ago, 
> all on the same week, and for the entire series I might want to pin the 
> indices in memory.
>
> Hopefully it is not an all or nothing mechanism.
>
> Thanks,
> Jon Strabala
>
> Sent from my iPhone
>
> On Apr 5, 2011, at 12:49 PM, "K. John Wu"<[email protected]>  wrote:
>
>> Hi, Mike,
>>
>> Thanks for your interest in FastBit.
>>
>> If you have enough memory on your machine, then you can load all
>> indexes into memory by calling ibis::part::loadIndexes with the second
>> argument readall set to a value greater than 0.  This should prevent
>> FastBit from ever attempting to read the index files again.
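>>
>> For example (a minimal sketch; "parts" is a hypothetical list of
>> ibis::part pointers, and the first argument is assumed to be the
>> index specification, 0 for the default):
>>
>> // read the indexes of every partition fully into memory
>> for (size_t i = 0; i < parts.size(); ++ i)
>>     parts[i]->loadIndexes(0, 1);  // readall > 0 keeps the .idx content in RAM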
>>
>> It seems you have quite a lot of data, so you might not want
>> to force all the indexes to be loaded into memory.  In this case, you
>> are probably looking for an option that will memory map the whole
>> index file.  This can be accomplished by setting parameter
>>
>> preferMMapIndex = true
>>
>> This parameter can be set in a parameter file or can be added in a
>> program by calling
>>
>> ibis::gParameters().add("preferMMapIndex", "true");
>>
>> before performing queries.
>>
>> It might be helpful to set the default option in FastBit to use memory
>> mapping.  I will do some experiments and see how best to handle the
>> default option.
>>
>> I am not sure I understand your queries well enough to give you any useful
>> advice.  Typically, if your small queries can be combined into larger
>> ones, FastBit needs to do less work, such as reading files and parsing
>> queries.  However, this is not always the case.  To be more specific,
>> I would need to understand your queries a little better.
>>
>> John
>>
>>
>> On 4/5/11 9:18 AM, Chong, Michael wrote:
>>>
>>>
>>> Dear Dr. Wu,
>>>
>>> I have been using your FastBit program for a few months now and
>>> have finally got to understand it a bit better. I use it in
>>> economics research looking through large real-time historical
>>> datasets. I like it a lot, and I have a Java JNI interface which we
>>> put together. The speed is really impressive :).
>>>
>>> I have structured the data into three tables, and further divided
>>> the data into a partition for each day; so 10 days of data will have
>>> 30 partitions. The partition sizes are about 12 GB, 2.6 MB, and 3.6 GB
>>> for a day's worth of data. Some days have more data. Each day is
>>> represented by a three-partition set.
>>>
>>> Right now, running the calculations for a day takes about 120
>>> minutes. I am trying to speed this up and have some questions. The
>>> code runs through the data, performing calculations for a set of N
>>> events. For each event I run a "select" on the three partitions for
>>> a day.
>>>
>>> 1)      I noticed that the fileManager caches files. But it only
>>> seems to do so for the "select clause variables" and not for the
>>> "where clause variables". When I run a strace on Linux, I see it
>>> opening and then mmapping the "where clause files" again and again
>>> for each select. I have also tried to GetFile the "where clause
>>> files", but noticed that the nacc, nref and last used times are
>>> never incremented. Do the where clause variables ever hit the
>>> fileManager? I also never see a hit on a XXX.idx (index file)
>>> either. But I guess it must be using indexes somehow. I suspect my
>>> slow speed is due to the opening and closing of the same file. I
>>> can open a roFile in the fileManager, but this never gets any hits
>>> for some reason, so it makes no difference to the speed.
>>>
>>> 2)      In general, is it better to (A) run one big query and suck
>>> out a large chunk of data, or (B) run lots of little queries? (B)
>>> is easier to code, but would (A) be faster? I was thinking of
>>> selecting, say, 100 events (an "Event-Chunk") into an in-memory
>>> three-table set, then doing another select on this "Event-Chunk". Is
>>> there a better way to do this using FastBit?
>>>
>>> Many thanks for your kind help and advice.
>>>
>>> Warmest regards, Mike.
>>>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
