-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Hi, Darryl,
FastQuery works by breaking your data set (in the HDF5 sense of
the word) into subsets of fewer than 2^31 rows/elements each. This
kind of partitioning is common in data processing software. For
example, each MPI process might work on one partition of a large
data set. I hope this makes sense.
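
As a rough illustration only (a sketch under my own assumptions, not the
actual FastQuery code; the names partitionRows and kMaxElements are made
up for this example), splitting a row range into pieces that each stay
under the limit could look like this:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Largest element count a single partition may hold. This mirrors the
// 0x7FFFFFFF check in the array_t::resize excerpt quoted in this thread.
constexpr std::uint64_t kMaxElements = 0x7FFFFFFFULL;

// Hypothetical helper, not part of the FastQuery or FastBit API.
// Splits [0, total) into consecutive half-open ranges [begin, end),
// each holding at most `limit` elements.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
partitionRows(std::uint64_t total, std::uint64_t limit = kMaxElements) {
    assert(limit > 0 && "limit must be positive");
    std::vector<std::pair<std::uint64_t, std::uint64_t>> parts;
    for (std::uint64_t begin = 0; begin < total; ) {
        // Take a full chunk, or whatever is left at the tail.
        const std::uint64_t remaining = total - begin;
        const std::uint64_t len = remaining < limit ? remaining : limit;
        parts.emplace_back(begin, begin + len);
        begin += len;
    }
    return parts;
}
```

With this sketch, a variable with 2^32 elements would come out as three
ranges, each small enough to pass the array_t size check, and each range
could then be handed to a different MPI process.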
John
On 7/29/14 11:01 AM, Darryl Reeves wrote:
> Hi John,
>
> I found the issue in the source code for "buildIndex.cpp." It appears
> that the maxBytes argument is hard-coded in this program:
>
> ibis::gParameters().add("fileManager.maxBytes", "2GB"); (line 116,
> buildIndex.cpp)
>
>
> I increased maxBytes to "100GB":
>
> ibis::gParameters().add("fileManager.maxBytes", "100GB");
>
>
> This change seems to have allowed the program to proceed without the
> error that I encountered previously. However, the amount of memory
> consumed by the program to build the index is very large (currently at
> 203 GB of RAM and growing).
>
> The bigger issue is that the file that I submitted previously is only
> a sample of the data that I am attempting to index. The data that I
> want to index has a value range from 0 to 2^32-1 (the max value of an
> unsigned integer). When I try to index my data from the full dataset,
> I am encountering a new error:
>
> terminate called after throwing an instance of 'char const*'
>
>
> Using gdb, I tracked this exception to a line (1469) in "array_t.cpp."
> This line is part of the array_t::resize function:
>
> *array_t.cpp*
>
> template<class T>
> void ibis::array_t<T>::resize(size_t n) {
>     if (n > 0x7FFFFFFFU) {
>         throw "array_t must have less than 2^31 elements";
>     }
>     // ... (remainder of the function omitted)
> }
>
>
>
> There seems to be a requirement that array_t objects must have less
> than 2^31 elements. Some of the data in my full dataset violates this
> restriction.
>
> This leads me to two questions which will determine if I can use
> FastBit/FastQuery in my project:
>
> 1) I'm concerned about the use of RAM for building the index. I see
> mention in the source code of using a memory map within the
> ibis::fileManager class. Can I force FastQuery's IndexBuilder class to
> use this memory map functionality to reduce the amount of RAM used
> when building an index? If so, how can I do this? Alternatively, is
> there some other method for reducing the amount of memory needed to
> build a large index?
>
> 2) Is there a practical reason why array_t objects cannot contain more
> than 2^31 elements?
>
> Thank you for your help.
>
> Best,
> Darryl
>
>
>
>
> On Tue, Jul 29, 2014 at 12:15 PM, Darryl Reeves <[email protected]> wrote:
>
> Hi John,
>
> Thank you for the guidance. Unfortunately, it seems as though the
> program is not recognizing my value for maxBytes. The
> configuration file appears to be read, but the maxBytes value
> is not the one that I provided (it still seems to be the
> default value):
>
> darryl@hippocampus data $
> ~/downloads/fastquery-0.8.2.8/examples/buildIndex -c ibis.rc -f
> sample.h5 -n k -v 2
>
> FastBit ibis1.3.8
> Log started on Tue Jul 29 12:07:28 2014
> resource::read -- parsing configuration file "ibis.rc"
> /home/darryl/downloads/fastquery-0.8.2.8/examples/buildIndex data
> file "sample.h5" with 1 variable name ...
> fileManager initialization complete -- maxBytes=2147485648, maxOpenFiles=768
> FastQuery constructor invoked with datafileName=sample.h5,
> fileFormat=0, readOnly=0
> /home/darryl/downloads/fastquery-0.8.2.8/examples/buildIndex
> initiate the IndexBuilder object for file "sample.h5"
> Warning -- HDF5::__getDatasetId(/sample/k.bitmapKeys): dataset
> does not exist
> Warning -- HDF5::__getDatasetDimension(/sample/k.bitmapKeys):
> failed to open the dataset
> Warning -- HDF5::getBitmapKeyLength(/sample/k): failed to get
> bitmap keys length
> FQ_IndexUnbinned[/sample/k]::readOld: no existing indexes to read
> in file "sample.h5"
> FQ_Variable::getValuesArray the nElements size is 1643204076
> Warning -- fileManager::storage::ctor unable to find 6,572,816,304
> bytes of space in memory
> terminate called after throwing an instance of 'ibis::bad_alloc'
> what(): storage::ctor(memory) failed
> Aborted
>
> My configuration file is attached. Is there something that I am
> doing incorrectly?
>
> Thanks,
> Darryl
>
>
> On Tue, Jul 29, 2014 at 12:38 AM, K. John Wu <[email protected]> wrote:
>
> Hi, Darryl,
>
> Ahh, the usage note from that program happens to be missing the -c
> option. You should be able to use '-c ibis.rc'. One way to check
> whether maxBytes is set to the value you want is to specify '-v 2'
> on the same command line and then look in the output for the line
> with
>
> fileManager initialization complete -- maxBytes=dddd
>
> Hope this helps.
>
> John
>
>
>
> On 7/28/14 8:53 PM, Darryl Reeves wrote:
> > Hi John,
> >
> > I am using a program that is included with fastquery in the "examples"
> > directory called "buildIndex." It doesn't appear to have a -c option,
> > however:
> >
> >
> >
> > fastquery-0.8.2.8/examples/buildIndex -f data-file-name [-i index-file-name]
> > [-g log-file] [-n variable-name] [-p variable-path]
> > [-b '<binning nbins=1000 />' (default unbinned)] [-r (force-rebuild-index)]
> > [-v verboseness] [-m fileFormat [HDF5(default), H5PART, NETCDF, PNETCDF]]
> > [-l mpi_subarray_size(default=100000)]
> > It builds index for a set of variables whose dataset location has the prefix
> > variable-path and postfix variable-name.
> >
> > Use option "-i" to specify the output file for storing indexes.
> > Otherwise, the indexes are written back to data file "data-file-name".
> >
> > Use option "-r" to enforce rebuild and replace the existing index.
> >
> > Under parallel mode, use "-l" to set the subarray size for spitting dataset.
> >
> > Use option "-b" to specify the binning option to build the index.
> > The available binning option is defined and provided by the FastBit.
> > More information can be found at
> > http://crd.lbl.gov/~kewu/fastbit/doc/indexSpec.html.
> > Binning option is suggested to be used with large dataset to reduce
> > the size of built index.
> > Precision option is suggested to be used when the query involves
> > floating point numbers.
> >
> >
> > On Mon, Jul 28, 2014 at 11:48 PM, K. John Wu <[email protected]> wrote:
> >
> >> Hi, Darryl,
> >>
> >> Thanks for your patience.
> >>
> >> What program are you using? One of our programs or your own
> >> program? Most of the example programs we provide have an option
> >> -c, which lets you specify the configuration file.
> >>
> >> John
> >>
> >>
> >>
> >>
> >> On 7/28/14 3:50 PM, Darryl Reeves wrote:
> >>> My apologies. I should have continued reading the documentation. I
> >>> found a sample configuration file here:
> >>>
> >>> http://crd-legacy.lbl.gov/~kewu/fastbit/doc/dataLoading.html#samplerc
> >>>
> >>> Unfortunately, even when I set the value to 1000000Mb
> >>> (fileManager.maxBytes=1000000Mb), I still get the same error
> >>> originally reported.
> >>>
> >>>
> >>> On Mon, Jul 28, 2014 at 1:29 PM, Darryl Reeves <[email protected]> wrote:
> >>>
> >>>> Hi John,
> >>>>
> >>>> After reading your response on a different thread ("Out of memory
> >>>> without MMAP"), I'm thinking that increasing the fileManager cache
> >>>> might help me with this problem as well. From the online
> >>>> documentation, I was able to determine that I can set the cache size
> >>>> in a file named "ibis.rc" that is located in the same directory where
> >>>> I am running "buildIndex." What should the format of the
> >>>> configuration parameter be?
> >>>>
> >>>> I tried:
> >>>>
> >>>> fileManager.maxBytes = 5GB
> >>>>
> >>>> But this doesn't seem to make a difference. Can you provide an
> >>>> example configuration file?
> >>>>
> >>>> Thanks,
> >>>> Darryl
> >>>>
> >>>>
> >>>> On Fri, Jul 25, 2014 at 7:02 PM, Darryl Reeves <[email protected]> wrote:
> >>>>
> >>>>> Hi John,
> >>>>>
> >>>>> I just shared the file with you through my Google Drive account.
> >>>>> Let me know if you have any problems accessing the file.
> >>>>>
> >>>>> The command that I have tried to run is:
> >>>>>
> >>>>> fastquery-0.8.2.8/examples/buildIndex -f sample.2.h5 -n k
> >>>>>
> >>>>> Thanks,
> >>>>> Darryl
> >>>>>
> >>>>>
> >>>>> On Thu, Jul 24, 2014 at 10:59 PM, John Wu <[email protected]> wrote:
> >>>>>
> >>>>>> Hi, Darryl,
> >>>>>>
> >>>>>> Are you able to share the test data with us? It would be useful
> >>>>>> for us to reproduce the problem.
> >>>>>>
> >>>>>> If yes, please provide instructions on reproducing the problem.
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> K Wu
> >>>>>> On Jul 24, 2014 1:25 AM, "Darryl Reeves" <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I am attempting to index a file consisting of unsigned integer
> >>>>>>> data using the example program "buildIndex" included with
> >>>>>>> fastquery. This program works fine for the test data included
> >>>>>>> with the source code.
> >>>>>>>
> >>>>>>> However, when I try to index my data, I receive the following
> >>>>>>> memory allocation error:
> >>>>>>>
> >>>>>>> Warning -- fileManager::storage::ctor unable to find 6,572,816,304
> >>>>>>> bytes of space in memory
> >>>>>>> terminate called after throwing an instance of 'ibis::bad_alloc'
> >>>>>>> what(): storage::ctor(memory) failed
> >>>>>>>
> >>>>>>> This error appears very early in the execution of the program. By
> >>>>>>> monitoring the memory being used, I can tell that only 58756 Kb
> >>>>>>> have been allocated by the program. The server where this is
> >>>>>>> running has no shortage of RAM available. I am willing to share
> >>>>>>> the dataset that I am using, but it is 3.2 Gb, so it cannot be
> >>>>>>> transferred as part of this message.
> >>>>>>>
> >>>>>>> Any help that you can provide to figure out this issue is
> >>>>>>> appreciated.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> --
> >>>>>>> Darryl Reeves
> >>>>>>> Ph.D. Candidate
> >>>>>>> Mason Lab
> >>>>>>> Weill Cornell Medical College of Cornell University
> >>>>>>> Tri-Institutional Program in Computational Biology and Medicine
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> FastBit-users mailing list
> >>>>>>> [email protected]
> >>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> >>>>>>>
> >>>>>>>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: GPGTools - https://gpgtools.org
iF4EAREKAAYFAlPqpbYACgkQ4I69U3+CTfyS/wD/asIHx7idPxicmCJkgRVOWliM
hQC56fBLclp5a772sIEA/R4FqnESthm1xTM0WCbVhMotPC+M5VTeJL0eBhmiwJCk
=KSmc
-----END PGP SIGNATURE-----