Hi John,

I found the issue in the source code for buildIndex.cpp. It appears that
the maxBytes parameter is hard-coded in this program (line 116 of
buildIndex.cpp):

ibis::gParameters().add("fileManager.maxBytes", "2GB");


I increased maxBytes to "100GB":

ibis::gParameters().add("fileManager.maxBytes", "100GB");


This change seems to have allowed the program to proceed past the error
that I encountered previously. However, the program is using a very large
amount of memory to build the index (currently 203 GB of RAM and growing).

The bigger issue is that the file that I submitted previously is only a
sample of the data that I am attempting to index. The full dataset has
values ranging from 0 to 2^32-1 (the maximum value of a 32-bit unsigned
integer). When I try to index the full dataset, I encounter a new error:

terminate called after throwing an instance of 'char const*'


Using gdb, I tracked this exception to line 1469 of array_t.cpp, which is
part of the array_t::resize function:

*array_t.cpp*

template<class T>
void ibis::array_t<T>::resize(size_t n) {
    if (n > 0x7FFFFFFFU) {
        throw "array_t must have less than 2^31 elements";
    }
    // ... (remainder of resize() not shown)
}



There seems to be a requirement that array_t objects have fewer than 2^31
elements, and some of the data in my full dataset violates this
restriction.

This leads me to two questions which will determine if I can use
FastBit/FastQuery in my project:

1) I'm concerned about the amount of RAM used to build the index. The
source code mentions memory mapping within the ibis::fileManager class.
Can I force FastQuery's IndexBuilder class to use this memory-map
functionality to reduce the amount of RAM used when building an index? If
so, how can I do this? Alternatively, is there some other method for
reducing the amount of memory needed to build a large index?

2) Is there a practical reason why array_t objects cannot contain more than
2^31 elements?

Thank you for your help.

Best,
Darryl




On Tue, Jul 29, 2014 at 12:15 PM, Darryl Reeves <[email protected]> wrote:

> Hi John,
>
> Thank you for the guidance. Unfortunately, it seems as though the program
> is not recognizing my value for maxBytes. The configuration file appears
> to be read, but the maxBytes value is not the one that I provided (it
> seems to still be the default value):
>
> darryl@hippocampus data $ ~/downloads/fastquery-0.8.2.8/examples/buildIndex -c ibis.rc -f sample.h5 -n k -v 2
>
> FastBit ibis1.3.8
> Log started on Tue Jul 29 12:07:28 2014
> resource::read -- parsing configuration file "ibis.rc"
> /home/darryl/downloads/fastquery-0.8.2.8/examples/buildIndex data file "sample.h5"      with 1 variable name ...
> fileManager initialization complete -- maxBytes=2147485648, maxOpenFiles=768
> FastQuery constructor invoked with datafileName=sample.h5, fileFormat=0, readOnly=0
> /home/darryl/downloads/fastquery-0.8.2.8/examples/buildIndex initiate the IndexBuilder object for file "sample.h5"
> Warning -- HDF5::__getDatasetId(/sample/k.bitmapKeys): dataset does not exist
> Warning -- HDF5::__getDatasetDimension(/sample/k.bitmapKeys): failed to open the dataset
> Warning -- HDF5::getBitmapKeyLength(/sample/k): failed to get bitmap keys length
> FQ_IndexUnbinned[/sample/k]::readOld: no existing indexes to read in file "sample.h5"
> FQ_Variable::getValuesArray the nElements size is 1643204076
> Warning -- fileManager::storage::ctor unable to find 6,572,816,304 bytes of space in memory
> terminate called after throwing an instance of 'ibis::bad_alloc'
>   what():  storage::ctor(memory) failed
> Aborted
>
> My configuration file is attached. Is there something that I am doing
> incorrectly?
>
> Thanks,
> Darryl
>
>
> On Tue, Jul 29, 2014 at 12:38 AM, K. John Wu <[email protected]> wrote:
>
>> Hi, Darryl,
>>
>> Ahh, the usage note from that program happens to be missing the -c
>> option.  You should be able to use '-c ibis.rc'.  One way to check
>> whether you have set maxBytes to the value you want is to specify
>> '-v 2' on the same command line and then look in the output for the
>> line
>>
>> fileManager initialization complete -- maxBytes=dddd
>>
>> Hope this helps.
>>
>> John
>>
>>
>>
>> On 7/28/14 8:53 PM, Darryl Reeves wrote:
>> > Hi John,
>> >
>> > I am using a program that is included with fastquery in the "examples"
>> > directory called "buildIndex." It doesn't appear to have a -c option,
>> > however:
>> >
>> >
>> >
>> > fastquery-0.8.2.8/examples/buildIndex -f data-file-name [-i index-file-name] [-g log-file] [-n variable-name] [-p variable-path] [-b '<binning nbins=1000 />' (default unbinned)] [-r (force-rebuild-index)] [-v verboseness] [-m fileFormat [HDF5(default), H5PART, NETCDF, PNETCDF]] [-l mpi_subarray_size(default=100000)]
>> >         It builds index for a set of variables whose dataset location has the prefix
>> >         variable-path and postfix variable-name.
>> >
>> >         Use option "-i" to specify the output file for storing indexes.
>> >         Otherwise, the indexes are written back to data file "data-file-name".
>> >
>> >         Use option "-r" to enforce rebuild and replace the existing index.
>> >
>> >         Under parallel mode, use "-l" to set the subarray size for spitting dataset.
>> >
>> >         Use option "-b" to specify the binning option to build the index.
>> >         The available binning option is defined and provided by the FastBit.
>> >         More information can be found at http://crd.lbl.gov/~kewu/fastbit/doc/indexSpec.html.
>> >         Binning option is suggested to be used with large dataset to reduce the size of built index.
>> >         Precision option is suggested to be used when the query involves floating point numbers.
>> >
>> >
>> > On Mon, Jul 28, 2014 at 11:48 PM, K. John Wu <[email protected]> wrote:
>> >
>> >> Hi, Darryl,
>> >>
>> >> Thanks for your patience.
>> >>
>> >> What program are you using?  One of our programs or your own
>> >> program?  With most of the example programs we provide, there is an
>> >> option -c, which is for you to specify the configuration file.
>> >>
>> >> John
>> >>
>> >>
>> >>
>> >>
>> >> On 7/28/14 3:50 PM, Darryl Reeves wrote:
>> >>> My apologies. I should have continued reading the documentation. I
>> >>> found a sample configuration file here:
>> >>>
>> >>> http://crd-legacy.lbl.gov/~kewu/fastbit/doc/dataLoading.html#samplerc
>> >>>
>> >>> Unfortunately, even when I set the value to 1000000Mb
>> >>> (fileManager.maxBytes=1000000Mb), I still get the same error
>> >>> originally reported.
>> >>>
>> >>>
>> >>> On Mon, Jul 28, 2014 at 1:29 PM, Darryl Reeves <[email protected]> wrote:
>> >>>
>> >>>> Hi John,
>> >>>>
>> >>>> After reading your response on a different thread ("Out of memory
>> >>>> without MMAP"), I'm thinking that increasing the fileManager cache
>> >>>> might help me with this problem as well. From the online
>> >>>> documentation, I was able to determine that I can set the cache size
>> >>>> in a file named "ibis.rc" that is located in the same directory where
>> >>>> I am running "buildIndex." What should the format of the
>> >>>> configuration parameter be?
>> >>>>
>> >>>> I tried:
>> >>>>
>> >>>> fileManager.maxBytes = 5GB
>> >>>>
>> >>>> But this doesn't seem to make a difference. Can you provide an
>> >>>> example configuration file?
>> >>>>
>> >>>> Thanks,
>> >>>> Darryl
>> >>>>
>> >>>>
>> >>>> On Fri, Jul 25, 2014 at 7:02 PM, Darryl Reeves <[email protected]> wrote:
>> >>>>
>> >>>>> Hi John,
>> >>>>>
>> >>>>> I just shared the file with you through my Google Drive account. Let
>> >>>>> me know if you have any problems accessing the file.
>> >>>>>
>> >>>>> The command that I have tried to run is:
>> >>>>>
>> >>>>> fastquery-0.8.2.8/examples/buildIndex -f sample.2.h5 -n k
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Darryl
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Jul 24, 2014 at 10:59 PM, John Wu <[email protected]> wrote:
>> >>>>>
>> >>>>>> Hi, Darryl,
>> >>>>>>
>> >>>>>> Are you able to share the test data with us?  It would be useful
>> >>>>>> for us to reproduce the problem.
>> >>>>>>
>> >>>>>> If yes, please provide instructions on reproducing the problem.
>> >>>>>>
>> >>>>>> Thanks.
>> >>>>>>
>> >>>>>> K Wu
>> >>>>>> On Jul 24, 2014 1:25 AM, "Darryl Reeves" <[email protected]> wrote:
>> >>>>>>
>> >>>>>>> Hello,
>> >>>>>>>
>> >>>>>>> I am attempting to index a file consisting of unsigned integer
>> >>>>>>> data using the example program "buildIndex" included with
>> >>>>>>> fastquery. This program works fine for the test data included with
>> >>>>>>> the source code.
>> >>>>>>>
>> >>>>>>> However, when I try to index my data, I receive the following
>> >>>>>>> memory allocation error:
>> >>>>>>>
>> >>>>>>> Warning -- fileManager::storage::ctor unable to find 6,572,816,304
>> >>>>>>> bytes of space in memory
>> >>>>>>> terminate called after throwing an instance of 'ibis::bad_alloc'
>> >>>>>>>   what():  storage::ctor(memory) failed
>> >>>>>>>
>> >>>>>>> This error appears very early in the execution of the program.
>> >>>>>>> By monitoring the memory being used, I can tell that only 58756 KB
>> >>>>>>> have been allocated by the program. The server where this is
>> >>>>>>> running has no shortage of RAM available. I am willing to share
>> >>>>>>> the dataset that I am using, but it is 3.2 GB, so it cannot be
>> >>>>>>> transferred as part of this message.
>> >>>>>>>
>> >>>>>>> Any help that you can provide to figure out this issue is
>> >>>>>>> appreciated.
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Darryl Reeves
>> >>>>>>> Ph.D. Candidate
>> >>>>>>> Mason Lab
>> >>>>>>> Weill Cornell Medical College of Cornell University
>> >>>>>>> Tri-Institutional Program in Computational Biology and Medicine
>> >>>>>>>
>> >>>>>>> _______________________________________________
>> >>>>>>> FastBit-users mailing list
>> >>>>>>> [email protected]
>> >>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>> >>>>>>>
>> >>>>>>>