-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi, Darryl,

FastQuery works by breaking your data set (in the HDF5 sense of the
word) into subsets of fewer than 2^31 rows/elements each.  This kind of
partitioning is common in data processing software; for example, each
MPI process might work on one partition of a large data set.  Hope this
makes sense to you.
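A minimal C++ sketch of this kind of partitioning, assuming the only constraint is the array_t limit that Darryl found in array_t::resize (at most 0x7FFFFFFF, i.e. 2^31 - 1, elements). `Range`, `partitionRows` and `kMaxArrayElements` are hypothetical names for illustration, not part of the FastQuery or FastBit API, and FastQuery's own partitioning may be organized differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a FastQuery/FastBit API): split the element
// positions [0, total) into contiguous half-open ranges [begin, end),
// each containing at most maxRows elements.
struct Range {
    std::uint64_t begin;
    std::uint64_t end;  // exclusive
};

std::vector<Range> partitionRows(std::uint64_t total, std::uint64_t maxRows) {
    assert(maxRows > 0 && "partition size must be positive");
    std::vector<Range> parts;
    for (std::uint64_t begin = 0; begin < total; begin += maxRows) {
        // The last range may be shorter than maxRows.
        const std::uint64_t end = (total - begin > maxRows) ? begin + maxRows : total;
        parts.push_back(Range{begin, end});
    }
    return parts;
}

// Largest element count accepted by ibis::array_t<T>::resize, which throws
// when n > 0x7FFFFFFFU.
constexpr std::uint64_t kMaxArrayElements = 0x7FFFFFFFULL;
```

For example, `partitionRows(5000000000ULL, kMaxArrayElements)` yields three ranges, [0, 2147483647), [2147483647, 4294967294) and [4294967294, 5000000000); each range could then be read and indexed on its own, for instance one per MPI process.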

John




On 7/29/14 11:01 AM, Darryl Reeves wrote:
> Hi John,
> 
> I found the issue in the source code for "buildIndex.cpp." It appears
> that the maxBytes argument is hard-coded in this program:
> 
>     ibis::gParameters().add("fileManager.maxBytes", "2GB");  // line 116, buildIndex.cpp
> 
> 
> I increased maxBytes to "100GB":
> 
>     ibis::gParameters().add("fileManager.maxBytes", "100GB");
> 
> 
> This change seems to have allowed the program to proceed without the
> error that I encountered previously. However, the amount of memory
> consumed by the program to build the index is very large (currently at
> 203 GB of RAM and growing).
> 
> The bigger issue is that the file I submitted previously is only a
> sample of the data I am attempting to index. The full dataset has
> values ranging from 0 to 2^32-1 (the maximum value of an unsigned
> 32-bit integer). When I try to index the full dataset, I encounter a
> new error:
> 
>     terminate called after throwing an instance of 'char const*'
> 
> 
> Using gdb, I tracked this exception to line 1469 of "array_t.cpp",
> which is part of the array_t::resize function:
> 
>     array_t.cpp:
> 
>     template<class T>
>     void ibis::array_t<T>::resize(size_t n) {
>         if (n > 0x7FFFFFFFU) {
>             throw "array_t must have less than 2^31 elements";
>         }
>         // ... remainder of resize() ...
>     }
> 
> 
> 
> There seems to be a requirement that an array_t object have fewer than
> 2^31 elements, and some of the data in my full dataset violates this
> restriction.
> 
> This leads me to two questions which will determine if I can use
> FastBit/FastQuery in my project:
> 
> 1) I'm concerned about the RAM used to build the index. I see mention
> in the source code of a memory map within the ibis::fileManager class.
> Can I force FastQuery's IndexBuilder class to use this memory map
> functionality to reduce the amount of RAM used when building an index?
> If so, how? Alternatively, is there some other way to reduce the amount
> of memory needed to build a large index?
> 
> 2) Is there a practical reason why array_t objects cannot contain more
> than 2^31 elements?
> 
> Thank you for your help.
> 
> Best,
> Darryl
> 
> 
> 
> 
> On Tue, Jul 29, 2014 at 12:15 PM, Darryl Reeves <[email protected]> wrote:
> 
>     Hi John,
> 
>     Thank you for the guidance. Unfortunately, the program does not seem
>     to recognize my value for maxBytes. The configuration file appears to
>     be read, but the reported maxBytes value is not the one I provided;
>     it still seems to be the default:
> 
>     darryl@hippocampus data $ ~/downloads/fastquery-0.8.2.8/examples/buildIndex -c ibis.rc -f sample.h5 -n k -v 2
> 
>     FastBit ibis1.3.8
>     Log started on Tue Jul 29 12:07:28 2014
>     resource::read -- parsing configuration file "ibis.rc"
>     /home/darryl/downloads/fastquery-0.8.2.8/examples/buildIndex data file "sample.h5"      with 1 variable name ...
>     fileManager initialization complete -- maxBytes=2147485648, maxOpenFiles=768
>     FastQuery constructor invoked with datafileName=sample.h5, fileFormat=0, readOnly=0
>     /home/darryl/downloads/fastquery-0.8.2.8/examples/buildIndex initiate the IndexBuilder object for file "sample.h5"
>     Warning -- HDF5::__getDatasetId(/sample/k.bitmapKeys): dataset does not exist
>     Warning -- HDF5::__getDatasetDimension(/sample/k.bitmapKeys): failed to open the dataset
>     Warning -- HDF5::getBitmapKeyLength(/sample/k): failed to get bitmap keys length
>     FQ_IndexUnbinned[/sample/k]::readOld: no existing indexes to read in file "sample.h5"
>     FQ_Variable::getValuesArray the nElements size is 1643204076
>     Warning -- fileManager::storage::ctor unable to find 6,572,816,304 bytes of space in memory
>     terminate called after throwing an instance of 'ibis::bad_alloc'
>       what():  storage::ctor(memory) failed
>     Aborted
> 
>     My configuration file is attached. Is there something that I am
>     doing incorrectly?
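The numbers in this log fit together. Assuming each value of `k` is a 4-byte unsigned integer (the thread says the data are unsigned integers; the 4-byte width is an assumption), reading all 1,643,204,076 elements into one buffer needs 1,643,204,076 × 4 = 6,572,816,304 bytes, exactly the amount in the warning and roughly three times the reported maxBytes=2147485648. A small C++ check of that arithmetic; `bufferBytes` and `exceedsCacheLimit` are hypothetical helpers, not FastBit functions.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one contiguous buffer of n elements, each elemBytes wide.
constexpr std::uint64_t bufferBytes(std::uint64_t n, std::uint64_t elemBytes) {
    return n * elemBytes;
}

// Hypothetical helper: true when such a buffer would not fit within maxBytes.
// Comparing n against maxBytes / elemBytes avoids overflowing n * elemBytes.
bool exceedsCacheLimit(std::uint64_t n, std::uint64_t elemBytes, std::uint64_t maxBytes) {
    assert(elemBytes > 0);
    return n > maxBytes / elemBytes;
}

// Values copied from the log above.
constexpr std::uint64_t kElements = 1643204076ULL;  // "the nElements size is 1643204076"
constexpr std::uint64_t kMaxBytes = 2147485648ULL;  // "maxBytes=2147485648"

// Assumption: each element of k is a 4-byte unsigned integer.
static_assert(bufferBytes(kElements, sizeof(std::uint32_t)) == 6572816304ULL,
              "matches the 6,572,816,304 bytes reported in the warning");
```

With a 100 GB limit (taken here as 100 × 2^30 bytes), the same buffer fits, which is consistent with the report that raising maxBytes to "100GB" in buildIndex.cpp let the program get past this error.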
> 
>     Thanks,
>     Darryl
> 
> 
>     On Tue, Jul 29, 2014 at 12:38 AM, K. John Wu <[email protected]> wrote:
> 
>         Hi, Darryl,
> 
>         Ahh, the usage note from that program happens to be missing the -c
>         option.  You should be able to use '-c ibis.rc'.  One way to know
>         whether you have set maxBytes to the value you want is to specify
>         '-v 2' on the same command line and then examine the output for
>         the line
> 
>         fileManager initialization complete -- maxBytes=dddd
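A sketch of automating that check, assuming the verbose output (from whichever stream the program writes it to) has been captured into a string or stream. `findMaxBytes` is a hypothetical helper that looks for the message quoted above and returns the number after `maxBytes=`; it is not part of FastBit or FastQuery.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: scan verbose (-v 2) output for the line
//   fileManager initialization complete -- maxBytes=dddd
// and return dddd, or std::nullopt if no such line is found.
std::optional<std::uint64_t> findMaxBytes(std::istream& log) {
    const std::string marker = "fileManager initialization complete";
    const std::string key = "maxBytes=";
    std::string line;
    while (std::getline(log, line)) {
        if (line.find(marker) == std::string::npos) continue;
        const std::size_t keyPos = line.find(key);
        if (keyPos == std::string::npos) continue;
        const std::size_t start = keyPos + key.size();
        std::size_t end = start;
        while (end < line.size() && line[end] >= '0' && line[end] <= '9') ++end;
        if (end == start) continue;  // no digits after "maxBytes="
        return std::stoull(line.substr(start, end - start));
    }
    return std::nullopt;
}

// Convenience overload for output already captured in a string.
std::optional<std::uint64_t> findMaxBytes(const std::string& text) {
    std::istringstream in(text);
    return findMaxBytes(in);
}

// Usage example with the line format shown above.
inline void exampleFindMaxBytes() {
    const auto v = findMaxBytes(std::string(
        "fileManager initialization complete -- maxBytes=2147485648, maxOpenFiles=768\n"));
    assert(v.has_value() && *v == 2147485648ULL);
}
```

Comparing the returned value with the size set in ibis.rc shows whether the configuration file was actually picked up.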
> 
>         Hope this helps.
> 
>         John
> 
> 
> 
>         On 7/28/14 8:53 PM, Darryl Reeves wrote:
>         > Hi John,
>         >
>         > I am using a program that is included with fastquery in the
>         > "examples" directory called "buildIndex." It doesn't appear to
>         > have a -c option, however:
>         >
>         > fastquery-0.8.2.8/examples/buildIndex -f data-file-name
>         >     [-i index-file-name] [-g log-file] [-n variable-name]
>         >     [-p variable-path] [-b '<binning nbins=1000 />' (default unbinned)]
>         >     [-r (force-rebuild-index)] [-v verboseness]
>         >     [-m fileFormat [HDF5(default), H5PART, NETCDF, PNETCDF]]
>         >     [-l mpi_subarray_size(default=100000)]
>         >         It builds index for a set of variables whose dataset location has
>         >         the prefix variable-path and postfix variable-name.
>         >
>         >         Use option "-i" to specify the output file for storing indexes.
>         >         Otherwise, the indexes are written back to data file "data-file-name".
>         >
>         >         Use option "-r" to enforce rebuild and replace the existing index.
>         >
>         >         Under parallel mode, use "-l" to set the subarray size for spitting dataset.
>         >
>         >         Use option "-b" to specify the binning option to build the index.
>         >         The available binning option is defined and provided by the FastBit.
>         >         More information can be found at
>         >         http://crd.lbl.gov/~kewu/fastbit/doc/indexSpec.html.
>         >         Binning option is suggested to be used with large dataset to reduce
>         >         the size of built index.
>         >         Precision option is suggested to be used when the query involves
>         >         floating point numbers.
>         >
>         >
>         > On Mon, Jul 28, 2014 at 11:48 PM, K. John Wu <[email protected]> wrote:
>         >
>         >> Hi, Darryl,
>         >>
>         >> Thanks for your patience.
>         >>
>         >> What program are you using?  One of our programs or your own
>         >> program?  With most of the example programs we provide, there is
>         >> an option -c, which is for you to specify the configuration file.
>         >>
>         >> John
>         >>
>         >>
>         >>
>         >>
>         >> On 7/28/14 3:50 PM, Darryl Reeves wrote:
>         >>> My apologies. I should have continued reading the documentation.
>         >>> I found a sample configuration file here:
>         >>>
>         >>> http://crd-legacy.lbl.gov/~kewu/fastbit/doc/dataLoading.html#samplerc
>         >>>
>         >>> Unfortunately, even when I set the value to 1000000Mb
>         >>> (fileManager.maxBytes=1000000Mb), I still get the error I
>         >>> originally reported.
>         >>>
>         >>>
>         >>> On Mon, Jul 28, 2014 at 1:29 PM, Darryl Reeves <[email protected]> wrote:
>         >>>
>         >>>> Hi John,
>         >>>>
>         >>>> After reading your response on a different thread ("Out of memory
>         >>>> without MMAP"), I'm thinking that increasing the fileManager cache
>         >>>> might help me with this problem as well. From the online
>         >>>> documentation, I was able to determine that I can set the cache
>         >>>> size in a file named "ibis.rc" that is located in the same
>         >>>> directory where I am running "buildIndex." What should the format
>         >>>> of the configuration parameter be?
>         >>>>
>         >>>> I tried:
>         >>>>
>         >>>> fileManager.maxBytes = 5GB
>         >>>>
>         >>>> But this doesn't seem to make a difference. Can you provide an
>         >>>> example configuration file?
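For reference, a hypothetical C++ sketch of how size strings like the ones tried in this thread ("5GB", "1000000Mb", "100GB") can be turned into byte counts. This is not FastBit's parser and makes no claim about how FastBit reads ibis.rc: it assumes binary (1024-based) multipliers, treats the unit letters case-insensitively (so "Mb" is read as megabytes, not megabits), and accepts an optional trailing "B". `parseByteSize` is an illustrative name only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Hypothetical illustration, NOT FastBit's parser: convert "5GB", "100GB",
// "1000000Mb", ... into bytes. Assumes 1024-based multipliers, unit letters
// K/M/G/T in any case with an optional trailing B; whitespace is ignored.
std::optional<std::uint64_t> parseByteSize(const std::string& text) {
    constexpr std::uint64_t kMax = std::numeric_limits<std::uint64_t>::max();
    std::size_t i = 0;
    while (i < text.size() && std::isspace(static_cast<unsigned char>(text[i]))) ++i;

    const std::size_t digitsStart = i;
    std::uint64_t value = 0;
    while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
        const std::uint64_t digit = static_cast<std::uint64_t>(text[i] - '0');
        if (value > (kMax - digit) / 10) return std::nullopt;  // would overflow
        value = value * 10 + digit;
        ++i;
    }
    if (i == digitsStart) return std::nullopt;  // no number at all

    std::string unit;  // upper-cased unit with whitespace removed
    for (; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (!std::isspace(c)) unit += static_cast<char>(std::toupper(c));
    }

    std::uint64_t mult = 0;
    if (unit.empty() || unit == "B") mult = 1;
    else if (unit == "K" || unit == "KB") mult = 1ULL << 10;
    else if (unit == "M" || unit == "MB") mult = 1ULL << 20;
    else if (unit == "G" || unit == "GB") mult = 1ULL << 30;
    else if (unit == "T" || unit == "TB") mult = 1ULL << 40;
    else return std::nullopt;  // unrecognized unit

    if (value > kMax / mult) return std::nullopt;  // product would overflow
    return value * mult;
}

// Usage examples with the values tried in this thread.
inline void exampleParseByteSize() {
    assert(parseByteSize("5GB") == (5ULL << 30));      // 5,368,709,120 bytes
    assert(parseByteSize("100GB") == (100ULL << 30));  // 107,374,182,400 bytes
    assert(parseByteSize("1000000Mb") == (1000000ULL << 20));
    assert(!parseByteSize("5 parsecs").has_value());
}
```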
>         >>>>
>         >>>> Thanks,
>         >>>> Darryl
>         >>>>
>         >>>>
>         >>>> On Fri, Jul 25, 2014 at 7:02 PM, Darryl Reeves <[email protected]> wrote:
>         >>>>
>         >>>>> Hi John,
>         >>>>>
>         >>>>> I just shared the file with you through my Google Drive account.
>         >>>>> Let me know if you have any problems accessing the file.
>         >>>>>
>         >>>>> The command that I have tried to run is:
>         >>>>>
>         >>>>> fastquery-0.8.2.8/examples/buildIndex -f sample.2.h5 -n k
>         >>>>>
>         >>>>> Thanks,
>         >>>>> Darryl
>         >>>>>
>         >>>>>
>         >>>>> On Thu, Jul 24, 2014 at 10:59 PM, John Wu <[email protected]> wrote:
>         >>>>>
>         >>>>>> Hi, Darryl,
>         >>>>>>
>         >>>>>> Are you able to share the test data with us?  It would be useful
>         >>>>>> for us to reproduce the problem.
>         >>>>>>
>         >>>>>> If yes, please provide instructions on reproducing the problem.
>         >>>>>>
>         >>>>>> Thanks.
>         >>>>>>
>         >>>>>> K Wu
>         >>>>>> On Jul 24, 2014 1:25 AM, "Darryl Reeves" <[email protected]> wrote:
>         >>>>>>
>         >>>>>>> Hello,
>         >>>>>>>
>         >>>>>>> I am attempting to index a file consisting of unsigned integer
>         >>>>>>> data using the example program "buildIndex" included with
>         >>>>>>> fastquery. This program works fine for the test data included
>         >>>>>>> with the source code.
>         >>>>>>>
>         >>>>>>> However, when I try to index my data, I receive the following
>         >>>>>>> memory allocation error:
>         >>>>>>>
>         >>>>>>> Warning -- fileManager::storage::ctor unable to find 6,572,816,304 bytes of space in memory
>         >>>>>>> terminate called after throwing an instance of 'ibis::bad_alloc'
>         >>>>>>>   what():  storage::ctor(memory) failed
>         >>>>>>>
>         >>>>>>> This error appears very early in the execution of the program.
>         >>>>>>> By monitoring its memory use, I can tell that only 58756 Kb have
>         >>>>>>> been allocated by the program. The server where this is running
>         >>>>>>> has no shortage of RAM. I am willing to share the dataset that I
>         >>>>>>> am using, but it is 3.2 Gb, so it cannot be sent as part of this
>         >>>>>>> message.
>         >>>>>>>
>         >>>>>>> Any help that you can provide to figure out this issue is
>         >>>>>>> appreciated.
>         >>>>>>>
>         >>>>>>> Thanks,
>         >>>>>>>
>         >>>>>>> --
>         >>>>>>> Darryl Reeves
>         >>>>>>> Ph.D. Candidate
>         >>>>>>> Mason Lab
>         >>>>>>> Weill Cornell Medical College of Cornell University
>         >>>>>>> Tri-Institutional Program in Computational Biology and Medicine
>         >>>>>>>
>         >>>>>>> _______________________________________________
>         >>>>>>> FastBit-users mailing list
>         >>>>>>> [email protected]
>         >>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>         >>>>>>>
>         >>>>>>>
> 
> 
> 
> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: GPGTools - https://gpgtools.org

iF4EAREKAAYFAlPqpbYACgkQ4I69U3+CTfyS/wD/asIHx7idPxicmCJkgRVOWliM
hQC56fBLclp5a772sIEA/R4FqnESthm1xTM0WCbVhMotPC+M5VTeJL0eBhmiwJCk
=KSmc
-----END PGP SIGNATURE-----