Thank you for the information, John.

Best,
Darryl


On Tue, Aug 12, 2014 at 7:39 PM, K. John Wu <[email protected]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> Hi, Darryl,
>
> FastQuery code works by breaking your data set (in the HDF5 sense of
> the word) into subsets that each have fewer than 2^31
> rows/elements.  This is a form of partitioning that is commonly
> used in most data processing software.  For example, each MPI process
> might work on a partition of a large data set.  Hope this makes sense
> to you.
>
> John
>
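To illustrate the partitioning described above, here is a minimal sketch. It is not FastQuery's actual code: the helper name partitionRanges and the (offset, count) representation are assumptions made for illustration. The 2^31 - 1 limit is taken from the resize() check quoted further down in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Largest element count accepted by the resize() check quoted in this thread.
constexpr std::uint64_t kMaxElements = 0x7FFFFFFFULL;  // 2^31 - 1

// Hypothetical helper (not part of FastQuery): split `total` elements into
// consecutive (offset, count) ranges of at most `maxChunk` elements each.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
partitionRanges(std::uint64_t total, std::uint64_t maxChunk = kMaxElements) {
    assert(maxChunk > 0);
    std::vector<std::pair<std::uint64_t, std::uint64_t>> ranges;
    for (std::uint64_t offset = 0; offset < total; offset += maxChunk) {
        // The last range may be shorter than maxChunk.
        ranges.emplace_back(offset, std::min(maxChunk, total - offset));
    }
    return ranges;
}
```

Each range could then be read and indexed on its own, for example one range per MPI process.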
>
>
>
> On 7/29/14 11:01 AM, Darryl Reeves wrote:
> > Hi John,
> >
> > I found the issue in the source code for "buildIndex.cpp." It appears
> > that the maxBytes argument is hard-coded in this program:
> >
> >     ibis::gParameters().add("fileManager.maxBytes", "2GB"); (line 116,
> >     buildIndex.cpp)
> >
> >
> > I increased maxBytes to "100GB":
> >
> >     ibis::gParameters().add("fileManager.maxBytes", "100GB");
> >
> >
> > This change seems to have allowed the program to proceed without the
> > error that I encountered previously. However, the amount of memory
> > consumed by the program to build the index is very large (currently at
> > 203 GB of RAM and growing).
> >
> > The bigger issue is that the file that I submitted previously is only
> > a sample of the data that I am attempting to index. The data that I
> > want to index has a value range from 0 to 2^32-1 (the max value of an
> > unsigned integer). When I try to index my data from the full dataset,
> > I am encountering a new error:
> >
> >     /terminate called after throwing an instance of 'char const*'/
> >
> >
> > Using gdb, I tracked this exception to a line (1469) in "array_t.cpp."
> > This line is part of the array_t::resize function:
> >
> >     *array_t.cpp*
> >
> >     template<class T>
> >     void ibis::array_t<T>::resize(size_t n) {
> >         if (n > 0x7FFFFFFFU) {
> >             throw "array_t must have less than 2^31 elements";
> >         }
> >
> >
> >
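A note on the 'char const*' in the terminate message: the resize() code above throws a string literal, so a handler has to catch const char* rather than std::exception. A minimal sketch, using hypothetical stand-in functions (checkElementCount and tryElementCount are illustrative names, not FastBit APIs):

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in that mirrors the size check quoted above.
void checkElementCount(std::size_t n) {
    if (n > 0x7FFFFFFFU) {
        throw "array_t must have less than 2^31 elements";
    }
}

// Returns the thrown message, or an empty string if the check passes.
std::string tryElementCount(std::size_t n) {
    try {
        checkElementCount(n);
    } catch (const char* msg) {  // a thrown string literal is caught as const char*
        return msg;
    }
    return "";
}
```

Without such a handler the exception propagates out of main and the runtime calls terminate, which is the behavior reported above.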
> > There seems to be a requirement that array_t objects must have less
> > than 2^31 elements. Some of the data in my full dataset violates this
> > restriction.
> >
> > This leads me to two questions which will determine if I can use
> > FastBit/FastQuery in my project:
> >
> > 1) I'm concerned about the use of RAM for building the index. I see
> > mention in the source code of using a memory map within the
> > ibis::fileManager class. Can I force FastQuery's IndexBuilder class to
> > use this memory map functionality to reduce the amount of RAM used
> > when building an index? If so, how can I do this? Alternatively, is
> > there some other method for reducing the amount of memory needed to
> > build a large index?
> >
> > 2) Is there a practical reason why array_t objects cannot contain more
> > than 2^31 elements?
> >
> > Thank you for your help.
> >
> > Best,
> > Darryl
> >
> >
> >
> >
> > On Tue, Jul 29, 2014 at 12:15 PM, Darryl Reeves <[email protected]> wrote:
> >
> >     Hi John,
> >
> >     Thank you for the guidance. Unfortunately, it seems as though the
> >     program is not recognizing my value for maxBytes. The
> >     configuration file appears to be read, but the maxBytes value
> >     reported is not the one that I provided (it still seems to be
> >     the default value):
> >
> >     darryl@hippocampus data $
> >     ~/downloads/fastquery-0.8.2.8/examples/buildIndex -c ibis.rc -f
> >     sample.h5 -n k -v 2
> >
> >     FastBit ibis1.3.8
> >     Log started on Tue Jul 29 12:07:28 2014
> >     resource::read -- parsing configuration file "ibis.rc"
> >     /home/darryl/downloads/fastquery-0.8.2.8/examples/buildIndex data
> >     file "sample.h5"      with 1 variable name ...
> >     fileManager initialization complete -- maxBytes=2147485648,
> >     maxOpenFiles=768
> >     FastQuery constructor invoked with datafileName=sample.h5,
> >     fileFormat=0, readOnly=0
> >     /home/darryl/downloads/fastquery-0.8.2.8/examples/buildIndex
> >     initiate the IndexBuilder object for file "sample.h5"
> >     Warning -- HDF5::__getDatasetId(/sample/k.bitmapKeys): dataset
> >     does not exist
> >     Warning -- HDF5::__getDatasetDimension(/sample/k.bitmapKeys):
> >     failed to open the dataset
> >     Warning -- HDF5::getBitmapKeyLength(/sample/k): failed to get
> >     bitmap keys length
> >     FQ_IndexUnbinned[/sample/k]::readOld: no existing indexes to read
> >     in file "sample.h5"
> >     FQ_Variable::getValuesArray the nElements size is 1643204076
> >     Warning -- fileManager::storage::ctor unable to find 6,572,816,304
> >     bytes of space in memory
> >     terminate called after throwing an instance of 'ibis::bad_alloc'
> >       what():  storage::ctor(memory) failed
> >     Aborted
> >
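The allocation request in the log above matches the element count times 4 bytes, assuming each value is stored as a 4-byte unsigned integer (an assumption; the log does not state the element size). A small sketch of that arithmetic, with hypothetical helper names that are not part of FastBit or FastQuery:

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed for nElements values of a given size.
constexpr std::uint64_t bytesNeeded(std::uint64_t nElements,
                                    std::uint64_t bytesPerElement) {
    return nElements * bytesPerElement;
}

// Hypothetical helper: does a request fit under a byte limit?
constexpr bool withinLimit(std::uint64_t requestBytes, std::uint64_t maxBytes) {
    return requestBytes <= maxBytes;
}

// Values taken from the log output quoted above.
constexpr std::uint64_t kElements = 1643204076ULL;  // nElements
constexpr std::uint64_t kMaxBytes = 2147485648ULL;  // reported maxBytes

// Assumes 4-byte elements (unsigned 32-bit integers).
static_assert(bytesNeeded(kElements, 4) == 6572816304ULL,
              "matches the 6,572,816,304 bytes in the warning");
static_assert(!withinLimit(bytesNeeded(kElements, 4), kMaxBytes),
              "the request exceeds the reported maxBytes");
```

Under that assumption the request is roughly three times the reported maxBytes, which would be consistent with the allocation failing even though the machine has plenty of RAM.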
> >     My configuration file is attached. Is there something that I am
> >     doing incorrectly?
> >
> >     Thanks,
> >     Darryl
> >
> >
> >     On Tue, Jul 29, 2014 at 12:38 AM, K. John Wu <[email protected]> wrote:
> >
> >         Hi, Darryl,
> >
> >         Ahh, the usage note from that program happens to be missing the
> >         -c option.  You should be able to use '-c ibis.rc'.  One way to know
> >         whether you have set maxBytes to the value you want is to
> >         specify '-v 2' on the same command line and then examine the
> >         output messages for the line with
> >
> >         fileManager initialization complete -- maxBytes=dddd
> >
> >         Hope this helps.
> >
> >         John
> >
> >
> >
> >         On 7/28/14 8:53 PM, Darryl Reeves wrote:
> >         > Hi John,
> >         >
> >         > I am using a program that is included with fastquery in the
> >         "examples"
> >         > directory called "buildIndex." It doesn't appear to have a
> >         -c option,
> >         > however:
> >         >
> >         >
> >         >
> >         > fastquery-0.8.2.8/examples/buildIndex -f data-file-name [-i
> >         > index-file-name] [-g log-file] [-n variable-name] [-p
> >         variable-path] [-b
> >         > '<binning nbins=1000 />' (default unbinned)] [-r
> >         (force-rebuild-index)] [-v
> >         > verboseness] [-m fileFormat [HDF5(default), H5PART, NETCDF,
> >         PNETCDF]] [-l
> >         > mpi_subarray_size(default=100000)]
> >         >         It builds index for a set of variables whose dataset
> >         location has
> >         > the prefix
> >         >         variable-path and postfix variable-name.
> >         >
> >         >         Use option "-i" to specify the output file for
> >         storing indexes.
> >         >         Otherwise, the indexes are written back to data file
> >         > "data-file-name".
> >         >
> >         >         Use option "-r" to enforce rebuild and replace the
> >         existing index.
> >         >
> >         >         Under parallel mode, use "-l" to set the subarray
> >         size for spitting
> >         > dataset.
> >         >
> >         >         Use option "-b" to specify the binning option to
> >         build the index.
> >         >         The available binning option is defined and provided
> >         by the FastBit.
> >         >         More information can be found at
> >         > http://crd.lbl.gov/~kewu/fastbit/doc/indexSpec.html.
> >         >         Binning option is suggested to be used with large
> >         dataset to reduce
> >         > the size of built index.
> >         >         Precision option is suggested to be used when the
> >         query involves
> >         > floating point numbers.
> >         >
> >         >
> >         > On Mon, Jul 28, 2014 at 11:48 PM, K. John Wu <[email protected]> wrote:
> >         >
> >         >> Hi, Darryl,
> >         >>
> >         >> Thanks for your patience.
> >         >>
> >         >> What program are you using?  One of our programs or your own
> >         >> program?  With most of the example programs we provide,
> >         there is an
> >         >> option -c, which is for you to specify the configuration file.
> >         >>
> >         >> John
> >         >>
> >         >>
> >         >>
> >         >>
> >         >> On 7/28/14 3:50 PM, Darryl Reeves wrote:
> >         >>> My apologies. I should have continued reading the
> >         documentation. I found
> >         >> a
> >         >>> sample configuration file here:
> >         >>>
> >         >>>
> >         >>> http://crd-legacy.lbl.gov/~kewu/fastbit/doc/dataLoading.html#samplerc
> >         >>>
> >         >>> Unfortunately, even when I set the value to 1000000Mb
> >         >>> (fileManager.maxBytes=1000000Mb), I still get the same
> >         error originally
> >         >>> reported.
> >         >>>
> >         >>>
> >         >>> On Mon, Jul 28, 2014 at 1:29 PM, Darryl Reeves <[email protected]> wrote:
> >         >>>
> >         >>>> Hi John,
> >         >>>>
> >         >>>> After reading your response on a different thread ("Out
> >         of memory
> >         >> without
> >         >>>> MMAP"), I'm thinking that increasing the fileManager
> >         cache might help me
> >         >>>> with this problem as well. From the online documentation,
> >         I was able to
> >         >>>> determine that I can set the cache size in a file named
> >         "ibis.rc" that
> >         >> is
> >         >>>> located in the same directory where I am running
> >         "buildIndex." What
> >         >> should
> >         >>>> the format of the configuration parameter be?
> >         >>>>
> >         >>>> I tried:
> >         >>>>
> >         >>>> fileManager.maxBytes = 5GB
> >         >>>>
> >         >>>> But this doesn't seem to make a difference. Can you
> >         provide an example
> >         >>>> configuration file?
> >         >>>>
> >         >>>> Thanks,
> >         >>>> Darryl
> >         >>>>
> >         >>>>
> >         >>>> On Fri, Jul 25, 2014 at 7:02 PM, Darryl Reeves <[email protected]> wrote:
> >         >>>>
> >         >>>>> Hi John,
> >         >>>>>
> >         >>>>> I just shared the file with you through my Google Drive
> >         account. Let me
> >         >>>>> know if you have any problems accessing the file.
> >         >>>>>
> >         >>>>> The command that I have tried to run is:
> >         >>>>>
> >         >>>>> fastquery-0.8.2.8/examples/buildIndex -f sample.2.h5 -n k
> >         >>>>>
> >         >>>>> Thanks,
> >         >>>>> Darryl
> >         >>>>>
> >         >>>>>
> >         >>>>> On Thu, Jul 24, 2014 at 10:59 PM, John Wu <[email protected]> wrote:
> >         >>>>>
> >         >>>>>> Hi, Darryl,
> >         >>>>>>
> >         >>>>>> Are you able to share the test data with us?  It would
> >         be useful for
> >         >> us
> >         >>>>>> to reproduce the problem.
> >         >>>>>>
> >         >>>>>> If yes, please provide instructions on reproducing the
> >         problem.
> >         >>>>>>
> >         >>>>>> Thanks.
> >         >>>>>>
> >         >>>>>> K Wu
> >         >>>>>> On Jul 24, 2014 1:25 AM, "Darryl Reeves" <[email protected]> wrote:
> >         >>>>>>
> >         >>>>>>> Hello,
> >         >>>>>>>
> >         >>>>>>> I am attempting to index a file consisting of unsigned
> >         integer data
> >         >>>>>>> using the example program "buildIndex" included with
> >         fastquery. This
> >         >>>>>>> program works fine for the test data included with the
> >         source code.
> >         >>>>>>>
> >         >>>>>>> However, when I try to index my data, I receive the
> >         following memory
> >         >>>>>>> allocation error:
> >         >>>>>>>
> >         >>>>>>> Warning -- fileManager::storage::ctor unable to find
> >         6,572,816,304
> >         >>>>>>> bytes of space in memory
> >         >>>>>>> terminate called after throwing an instance of
> >         'ibis::bad_alloc'
> >         >>>>>>>   what():  storage::ctor(memory) failed
> >         >>>>>>>
> >         >>>>>>> This error is revealed very early in the execution of
> >         the program. By
> >         >>>>>>> monitoring the memory being used, I can tell that only
> >         58756 Kb have
> >         >> been
> >         >>>>>>> allocated by the program. The server where this is
> >         running has no
> >         >> shortage
> >         >>>>>>> of RAM available. I am willing to share the dataset
> >         that I am using
> >         >> but it
> >         >>>>>>> is 3.2 Gb, so it cannot be transferred as part of this
> >         message.
> >         >>>>>>>
> >         >>>>>>> Any help that you can provide to figure out this
> >         issue is
> >         >>>>>>> appreciated.
> >         >>>>>>>
> >         >>>>>>> Thanks,
> >         >>>>>>>
> >         >>>>>>> --
> >         >>>>>>> Darryl Reeves
> >         >>>>>>> Ph.D. Candidate
> >         >>>>>>> Mason Lab
> >         >>>>>>> Weill Cornell Medical College of Cornell University
> >         >>>>>>> Tri-Institutional Program in Computational Biology and
> >         Medicine
> >         >>>>>>>
> >         >>>>>>> _______________________________________________
> >         >>>>>>> FastBit-users mailing list
> >         >>>>>>> [email protected]
> >         <mailto:[email protected]>
> >         >>>>>>>
> >         https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> >         >>>>>>>
> >         >>>>>>>
> >         >>>>>>
> >         >>>>>>
> >         >>>>>
> >         >>>>>
> >         >>>>>
> >         >>>>
> >         >>>>
> >         >>>>
> >         >>>>
> >         >>>
> >         >>>
> >         >>>
> >         >>>
> >         >>>
> >         >>>
> >         >>
> >         >
> >         >
> >         >
> >         >
> >         >
> >         >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
> Comment: GPGTools - https://gpgtools.org
>
> iF4EAREKAAYFAlPqpbYACgkQ4I69U3+CTfyS/wD/asIHx7idPxicmCJkgRVOWliM
> hQC56fBLclp5a772sIEA/R4FqnESthm1xTM0WCbVhMotPC+M5VTeJL0eBhmiwJCk
> =KSmc
> -----END PGP SIGNATURE-----
>



-- 
Darryl Reeves
Ph.D. Candidate
Mason Lab
Weill Cornell Medical College of Cornell University
Tri-Institutional Program in Computational Biology and Medicine
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
