Thank you for the information, John. Best, Darryl
On Tue, Aug 12, 2014 at 7:39 PM, K. John Wu <[email protected]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> Hi, Darryl,
>
> FastQuery code works by breaking your data set (in the HDF5 sense of
> the word) into subsets of fewer than 2^31 rows/elements each. This is
> a form of partitioning that is commonly used in most data processing
> software. For example, each MPI process might work on a partition of
> a large data set. Hope this makes sense to you.
>
> John
>
> On 7/29/14 11:01 AM, Darryl Reeves wrote:
> > Hi John,
> >
> > I found the issue in the source code for "buildIndex.cpp." It appears
> > that the maxBytes argument is hard-coded in this program:
> >
> >     ibis::gParameters().add("fileManager.maxBytes", "2GB"); (line 116,
> >     buildIndex.cpp)
> >
> > I increased maxBytes to "100GB":
> >
> >     ibis::gParameters().add("fileManager.maxBytes", "100GB");
> >
> > This change seems to have allowed the program to proceed without the
> > error that I encountered previously. However, the amount of memory
> > consumed by the program to build the index is very large (currently at
> > 203 GB of RAM and growing).
> >
> > The bigger issue is that the file that I submitted previously is only
> > a sample of the data that I am attempting to index. The data that I
> > want to index has a value range from 0 to 2^32-1 (the max value of an
> > unsigned integer). When I try to index my data from the full dataset,
> > I am encountering a new error:
> >
> >     /terminate called after throwing an instance of 'char const*'/
> >
> > Using gdb, I tracked this exception to a line (1469) in "array_t.cpp."
> > This line is part of the array_t::resize function:
> >
> > *array_t.cpp*
> >
> >     template<class T>
> >     void ibis::array_t<T>::resize(size_t n) {
> >         if (n > 0x7FFFFFFFU) {
> >             throw "array_t must have less than 2^31 elements";
> >         }
> >
> > There seems to be a requirement that array_t objects must have fewer
> > than 2^31 elements. Some of the data in my full dataset violates this
> > restriction.
> >
> > This leads me to two questions which will determine if I can use
> > FastBit/FastQuery in my project:
> >
> > 1) I'm concerned about the use of RAM for building the index. I see
> > mention in the source code of using a memory map within the
> > ibis::fileManager class. Can I force FastQuery's IndexBuilder class to
> > use this memory map functionality to reduce the amount of RAM used
> > when building an index? If so, how can I do this? Alternatively, is
> > there some other method for reducing the amount of memory needed to
> > build a large index?
> >
> > 2) Is there a practical reason why array_t objects cannot contain more
> > than 2^31 elements?
> >
> > Thank you for your help.
> >
> > Best,
> > Darryl
> >
> > On Tue, Jul 29, 2014 at 12:15 PM, Darryl Reeves <[email protected]> wrote:
> >
> > Hi John,
> >
> > Thank you for the guidance. Unfortunately, it seems as though the
> > program is not recognizing my value for maxBytes. The configuration
> > file appears to be read, but the maxBytes value is not the one that
> > I provided (it still seems to be the default value):
> >
> > darryl@hippocampus data $ ~/downloads/fastquery-0.8.2.8/examples/buildIndex -c ibis.rc -f sample.h5 -n k -v 2
> >
> > FastBit ibis1.3.8
> > Log started on Tue Jul 29 12:07:28 2014
> > resource::read -- parsing configuration file "ibis.rc"
> > /home/darryl/downloads/fastquery-0.8.2.8/examples/buildIndex data file "sample.h5" with 1 variable name ...
> > fileManager initialization complete -- maxBytes=2147485648, maxOpenFiles=768
> > FastQuery constructor invoked with datafileName=sample.h5, fileFormat=0, readOnly=0
> > /home/darryl/downloads/fastquery-0.8.2.8/examples/buildIndex initiate the IndexBuilder object for file "sample.h5"
> > Warning -- HDF5::__getDatasetId(/sample/k.bitmapKeys): dataset does not exist
> > Warning -- HDF5::__getDatasetDimension(/sample/k.bitmapKeys): failed to open the dataset
> > Warning -- HDF5::getBitmapKeyLength(/sample/k): failed to get bitmap keys length
> > FQ_IndexUnbinned[/sample/k]::readOld: no existing indexes to read in file "sample.h5"
> > FQ_Variable::getValuesArray the nElements size is 1643204076
> > Warning -- fileManager::storage::ctor unable to find 6,572,816,304 bytes of space in memory
> > terminate called after throwing an instance of 'ibis::bad_alloc'
> > what(): storage::ctor(memory) failed
> > Aborted
> >
> > My configuration file is attached. Is there something that I am
> > doing incorrectly?
> >
> > Thanks,
> > Darryl
> >
> > On Tue, Jul 29, 2014 at 12:38 AM, K. John Wu <[email protected]> wrote:
> >
> > Hi, Darryl,
> >
> > Ahh, the usage note from that program happens to be missing the -c
> > option. You should be able to use '-c ibis.rc'. One way to know
> > whether you have set maxBytes to the value you want is to specify
> > '-v 2' on the same command line and then examine the output for the
> > line with
> >
> > fileManager initialization complete -- maxBytes=dddd
> >
> > Hope this helps.
> >
> > John
> >
> > On 7/28/14 8:53 PM, Darryl Reeves wrote:
> > > Hi John,
> > >
> > > I am using a program that is included with fastquery in the "examples"
> > > directory called "buildIndex." It doesn't appear to have a -c option,
> > > however:
> > >
> > > fastquery-0.8.2.8/examples/buildIndex -f data-file-name [-i
> > > index-file-name] [-g log-file] [-n variable-name] [-p variable-path] [-b
> > > '<binning nbins=1000 />' (default unbinned)] [-r (force-rebuild-index)] [-v
> > > verboseness] [-m fileFormat [HDF5(default), H5PART, NETCDF, PNETCDF]] [-l
> > > mpi_subarray_size(default=100000)]
> > > It builds index for a set of variables whose dataset location has
> > > the prefix variable-path and postfix variable-name.
> > >
> > > Use option "-i" to specify the output file for storing indexes.
> > > Otherwise, the indexes are written back to data file "data-file-name".
> > >
> > > Use option "-r" to enforce rebuild and replace the existing index.
> > >
> > > Under parallel mode, use "-l" to set the subarray size for spitting
> > > dataset.
> > >
> > > Use option "-b" to specify the binning option to build the index.
> > > The available binning option is defined and provided by the FastBit.
> > > More information can be found at
> > > http://crd.lbl.gov/~kewu/fastbit/doc/indexSpec.html.
> > > Binning option is suggested to be used with large dataset to reduce
> > > the size of built index.
> > > Precision option is suggested to be used when the query involves
> > > floating point numbers.
> > >
> > > On Mon, Jul 28, 2014 at 11:48 PM, K. John Wu <[email protected]> wrote:
> > >
> > >> Hi, Darryl,
> > >>
> > >> Thanks for your patience.
> > >>
> > >> What program are you using? One of our programs or your own
> > >> program? With most of the example programs we provide, there is an
> > >> option -c, which is for you to specify the configuration file.
> > >>
> > >> John
> > >>
> > >> On 7/28/14 3:50 PM, Darryl Reeves wrote:
> > >>> My apologies. I should have continued reading the documentation.
> > >>> I found a sample configuration file here:
> > >>>
> > >>> http://crd-legacy.lbl.gov/~kewu/fastbit/doc/dataLoading.html#samplerc
> > >>>
> > >>> Unfortunately, even when I set the value to 1000000Mb
> > >>> (fileManager.maxBytes=1000000Mb), I still get the same error
> > >>> originally reported.
> > >>>
> > >>> On Mon, Jul 28, 2014 at 1:29 PM, Darryl Reeves <[email protected]> wrote:
> > >>>
> > >>>> Hi John,
> > >>>>
> > >>>> After reading your response on a different thread ("Out of memory
> > >>>> without MMAP"), I'm thinking that increasing the fileManager cache
> > >>>> might help me with this problem as well. From the online
> > >>>> documentation, I was able to determine that I can set the cache
> > >>>> size in a file named "ibis.rc" that is located in the same
> > >>>> directory where I am running "buildIndex." What should the format
> > >>>> of the configuration parameter be?
> > >>>>
> > >>>> I tried:
> > >>>>
> > >>>> fileManager.maxBytes = 5GB
> > >>>>
> > >>>> But this doesn't seem to make a difference. Can you provide an
> > >>>> example configuration file?
> > >>>>
> > >>>> Thanks,
> > >>>> Darryl
> > >>>>
> > >>>> On Fri, Jul 25, 2014 at 7:02 PM, Darryl Reeves <[email protected]> wrote:
> > >>>>
> > >>>>> Hi John,
> > >>>>>
> > >>>>> I just shared the file with you through my Google Drive account.
> > >>>>> Let me know if you have any problems accessing the file.
> > >>>>>
> > >>>>> The command that I have tried to run is:
> > >>>>>
> > >>>>> fastquery-0.8.2.8/examples/buildIndex -f sample.2.h5 -n k
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Darryl
> > >>>>>
> > >>>>> On Thu, Jul 24, 2014 at 10:59 PM, John Wu <[email protected]> wrote:
> > >>>>>
> > >>>>>> Hi, Darryl,
> > >>>>>>
> > >>>>>> Are you able to share the test data with us? It would be useful
> > >>>>>> for us to reproduce the problem.
> > >>>>>>
> > >>>>>> If yes, please provide instructions on reproducing the problem.
> > >>>>>>
> > >>>>>> Thanks.
> > >>>>>>
> > >>>>>> K Wu
> > >>>>>>
> > >>>>>> On Jul 24, 2014 1:25 AM, "Darryl Reeves" <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Hello,
> > >>>>>>>
> > >>>>>>> I am attempting to index a file consisting of unsigned integer
> > >>>>>>> data using the example program "buildIndex" included with
> > >>>>>>> fastquery. This program works fine for the test data included
> > >>>>>>> with the source code.
> > >>>>>>>
> > >>>>>>> However, when I try to index my data, I receive the following
> > >>>>>>> memory allocation error:
> > >>>>>>>
> > >>>>>>> Warning -- fileManager::storage::ctor unable to find 6,572,816,304 bytes of space in memory
> > >>>>>>> terminate called after throwing an instance of 'ibis::bad_alloc'
> > >>>>>>> what(): storage::ctor(memory) failed
> > >>>>>>>
> > >>>>>>> This error is revealed very early in the execution of the
> > >>>>>>> program. By monitoring the memory being used, I can tell that
> > >>>>>>> only 58756 Kb have been allocated by the program. The server
> > >>>>>>> where this is running has no shortage of RAM available. I am
> > >>>>>>> willing to share the dataset that I am using but it is 3.2 Gb,
> > >>>>>>> so it cannot be transferred as part of this message.
> > >>>>>>>
> > >>>>>>> Any help that you can provide to figure out this issue is
> > >>>>>>> appreciated.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Darryl Reeves
> > >>>>>>> Ph.D. Candidate
> > >>>>>>> Mason Lab
> > >>>>>>> Weill Cornell Medical College of Cornell University
> > >>>>>>> Tri-Institutional Program in Computational Biology and Medicine
> > >>>>>>>
> > >>>>>>> _______________________________________________
> > >>>>>>> FastBit-users mailing list
> > >>>>>>> [email protected]
> > >>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
> Comment: GPGTools - https://gpgtools.org
>
> iF4EAREKAAYFAlPqpbYACgkQ4I69U3+CTfyS/wD/asIHx7idPxicmCJkgRVOWliM
> hQC56fBLclp5a772sIEA/R4FqnESthm1xTM0WCbVhMotPC+M5VTeJL0eBhmiwJCk
> =KSmc
> -----END PGP SIGNATURE-----

--
Darryl Reeves
Ph.D. Candidate
Mason Lab
Weill Cornell Medical College of Cornell University
Tri-Institutional Program in Computational Biology and Medicine
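
For anyone who finds this thread later, here is a rough sketch of the kind of partitioning John describes, plus a check of the numbers in the log. It is only an illustration under stated assumptions, not FastQuery's actual code: kMaxElements, bytesNeeded and partitionRows are made-up names that do not exist in FastBit or FastQuery. The limit is taken from the resize() check quoted above, and the 4-byte element size is inferred from the log (1643204076 elements against a request for 6,572,816,304 bytes).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Largest element count that passes the quoted check `n > 0x7FFFFFFFU`,
// i.e. 2^31 - 1. Hypothetical constant, not a FastBit symbol.
constexpr std::uint64_t kMaxElements = 0x7FFFFFFFULL;

// Bytes needed to hold `n` elements of `elemSize` bytes each.
constexpr std::uint64_t bytesNeeded(std::uint64_t n, std::uint64_t elemSize) {
    return n * elemSize;
}

// The sample file: 1643204076 elements of 4 bytes each is exactly the
// 6,572,816,304 bytes that fileManager reported it could not find.
static_assert(bytesNeeded(1643204076ULL, 4) == 6572816304ULL,
              "log arithmetic should match");

// Hypothetical helper: split the half-open row range [0, total) into
// consecutive [begin, end) chunks of at most `maxChunk` rows each.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
partitionRows(std::uint64_t total, std::uint64_t maxChunk = kMaxElements) {
    assert(maxChunk > 0 && "chunk size must be positive");
    std::vector<std::pair<std::uint64_t, std::uint64_t>> chunks;
    for (std::uint64_t begin = 0; begin < total;) {
        const std::uint64_t remaining = total - begin;
        const std::uint64_t len = remaining < maxChunk ? remaining : maxChunk;
        chunks.emplace_back(begin, begin + len);
        begin += len;
    }
    return chunks;
}
```

With these definitions, partitionRows(1643204076) yields a single chunk, since the sample is already below the limit. An invented row count of 5,000,000,000 (not a figure from this thread) would be split into three chunks, each small enough to pass the resize() check.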
