Hi, Andrew,

I believe we have gotten to the bottom of the problem that caused the crashes you experienced. A couple of formulas in adaptive2DBins were slightly off, which caused occasional out-of-bounds array accesses. Depending on the memory allocation scheme used by the run-time support system, such an access may cause a memory access violation or a segmentation fault. These formulas should be correct now. Please give the updated source code a try when you get the chance <https://codeforge.lbl.gov/snapshots.php?group_id=44>. Since the nightly snapshot is only built once a day (at 3AM), you will have to wait until tomorrow to get your hands on it. If you want it sooner, please let us know.
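
To illustrate the kind of error involved, here is a minimal sketch of an off-by-one bin formula and a guarded variant. It is a generic example only: naiveBin and safeBin are hypothetical helpers and are not the formulas used in adaptive2DBins.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helpers for illustration only; NOT the FastBit formulas.
// Both map a value v in [lo, hi] to one of nbins equal-width bins.

// Slightly off: when v == hi the result is nbins, one past the last valid
// index, so counts[naiveBin(...)] would write outside the array.
std::size_t naiveBin(double v, double lo, double hi, std::size_t nbins) {
    return static_cast<std::size_t>((v - lo) / (hi - lo) * nbins);
}

// Guarded: values at or beyond either edge land in the first or last bin.
// Assumes nbins >= 1.
std::size_t safeBin(double v, double lo, double hi, std::size_t nbins) {
    if (nbins <= 1 || !(hi > lo) || !(v > lo)) return 0;
    if (v >= hi) return nbins - 1;
    const std::size_t b =
        static_cast<std::size_t>((v - lo) / (hi - lo) * nbins);
    return std::min(b, nbins - 1);
}
```

Whether the out-of-range write crashes depends on what happens to sit just past the array, which is consistent with the crash appearing for some where-clauses and not for others.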

I am also attaching a slightly edited get2DDistribution.cpp. The most important change is a call to ibis::util::clean at the end to clean up the data partitions stored in tlist. Without this statement, these data partitions are freed after the FastBit file manager is freed, which causes the destructor of the file manager to complain about objects remaining in memory. It is a minor annoyance, but it was the main reason I said yesterday that I suspected some memory leaks. Valgrind has confirmed that this code now completes without leaving anything in memory.
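
The ordering issue can be sketched with a toy model. Manager, Partition and cleanAll below are made-up stand-ins written for this illustration; they are not FastBit classes, and cleanAll only mimics the role ibis::util::clean plays in the attached program.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Toy stand-in for a file manager that tracks how many objects are alive.
class Manager {
public:
    void acquire() { ++live_; }
    void release() { --live_; }
    std::size_t live() const { return live_; }
    ~Manager() {
        // Complain if anything registered with the manager was never freed.
        if (live_ > 0)
            std::cerr << "Warning -- " << live_
                      << " object(s) remain in memory\n";
    }
private:
    std::size_t live_ = 0;
};

// Toy stand-in for a data partition registered with the manager.
class Partition {
public:
    explicit Partition(Manager& m) : mgr_(m) { mgr_.acquire(); }
    ~Partition() { mgr_.release(); }
    Partition(const Partition&) = delete;
    Partition& operator=(const Partition&) = delete;
private:
    Manager& mgr_;
};

// Free every partition and empty the list before the manager goes away.
void cleanAll(std::vector<Partition*>& list) {
    for (Partition* p : list) delete p;
    list.clear();
}
```

If cleanAll runs before the Manager is destroyed, the live count is zero and the destructor stays quiet; if it is skipped, the destructor prints the warning.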

Again, thanks for the sample program and data. They were very helpful in tracking down the problems. Please feel free to contact us if you have any questions or suggestions.

John

PS: If you get the chance to try out the group-by function of the ibis::table class, please let us know whether it works correctly with categorical values.


On 7/30/2009 8:51 AM, Andrew Olson wrote:
Hi John,

I've attached my code and a sample data set. My makefile compiles as follows:

g++ -Wall -c -g get2DDistribution.cpp
g++ -Wall -g -lfastbit get2DDistribution.o -o get2DDist

I get the segmentation fault with these command lines (and others):
./get2DDist -d test -b 20 -c1 start -c2 end -w "seqid=1 and strand='+'"
./get2DDist -d test -b 20 -c1 start -c2 end -w "seqid=3 and strand='+'"

Oddly it works with these options for -w
"seqid=2 and strand='+'"
"seqid IN (1,3) and strand='+'"






On Jul 29, 2009, at 6:58 PM, K. John Wu wrote:

Hi, Andrew,

Thanks for reporting the problem.  We have a group of users making
heavy use of the histogramming functions, so we are very keen on fixing
this problem.  To do this, we would need some additional information
from you.

There are four different versions of get2DDistribution in the ibis::part
class.  Can you tell us which version you are using?  It would also be
helpful to have a log file showing where the problem might be.
If you have a small test example with some sample data that you can
share with us, it would make the debugging process a lot easier.

John

PS: To make FastBit code print more information about its operations,
you can call ibis::init with an integer of moderate size, e.g.,

ibis::init(5);

which sets the verboseness level to 5.  Normally the verboseness level
is set to 0, which only prints errors and some warnings.


On 7/29/2009 1:13 PM, Andrew Olson wrote:
Hi John,
I am using ibis::part::get2DDistribution() to calculate conditional 2D
histograms with adaptive bins.
My program crashes whenever the constraints argument includes more
than one type of column.  (e.g., UINT and CATEGORY)

Andrew

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users


// $Id$
/// @author Andrew Olson <olson at cshl.edu>
/// @file
/// A simple test program for adaptive 2D histogram function.
/// 
#if defined(_WIN32) && defined(_MSC_VER)
#pragma warning(disable:4786)   // some identifier longer than 256 characters
#endif
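#include <ibis.h>
#include <cctype>       // isdigit
#include <climits>      // INT_MAX
#include <cstdlib>      // atoi, exit
#include <cstring>      // strchr
#include <iostream>     // std::cout
#include <vector>       // std::vector
#include <set>          // std::set
#include <iomanip>      // std::setprecision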

// print the usage string
static void usage(const char* name) {
    std::cout << "usage:\n" << name
              << " [-d directory_containing_a_dataset] "
              << "[-b number of bins] "
              << "[-c1 column 1] "
              << "[-c2 column 2] "
              << "[-v verboseness] "
              << "[-w where-clause]\n"
              << std::endl;
} // usage

// function to parse the command line arguments
static void parse_args(int argc, char** argv, ibis::partList& tlist,
                       int* nbins, const char*& col1, const char*& col2,
                       const char*& qcnd) {
#if defined(DEBUG) || defined(_DEBUG)
#if DEBUG + 0 > 10 || _DEBUG + 0 > 10
    ibis::gVerbose = INT_MAX;
#elif DEBUG + 0 > 0
    ibis::gVerbose += 7 * DEBUG;
#elif _DEBUG + 0 > 0
    ibis::gVerbose += 5 * _DEBUG;
#else
    ibis::gVerbose += 3;
#endif
#endif
    std::vector<const char*> dirs;

    for (int i=1; i<argc; ++i) {
        if (*argv[i] == '-') { // normal arguments starting with -
            switch (argv[i][1]) {
            default:
            case 'h':
            case 'H':
                usage(*argv);
                exit(0);
            case 'd':
            case 'D':
                if (i+1 < argc) {
                    ++ i;
                    dirs.push_back(argv[i]);
                }
                break;
            case 'b':
                if (i+1 < argc) {
                    ++ i;
                    *nbins = atoi(argv[i]);
                }
                break;
            case 'c':
                if (i+1 < argc) {
                    if(argv[i][2] == '1') {
                        ++ i;
                        col1 = argv[i];
                    } else {
                        ++ i;
                        col2 = argv[i];
                    }
                }
                break;
            case 'q':
            case 'Q':
            case 'w':
            case 'W':
                if (i+1 < argc) {
                    ++ i;
                    qcnd = argv[i];
                }
                break;
            case 'v':
            case 'V': { // -v=d or -v d
                char *ptr = strchr(argv[i], '=');
                if (ptr == 0) {
                    if (i+1 < argc) {
                        if (isdigit(*argv[i+1])) {
                            ibis::gVerbose += atoi(argv[i+1]);
                            i = i + 1;
                        }
                        else {
                            ++ ibis::gVerbose;
                        }
                    }
                    else {
                        ++ ibis::gVerbose;
                    }
                }
                else {
                    ibis::gVerbose += atoi(++ptr);
                }
                break;}
            } // switch (argv[i][1])
        } // normal arguments
    } // for (int i=1; ...)

    // add data partitions from explicitly specified directories
    for (std::vector<const char*>::const_iterator it = dirs.begin();
         it != dirs.end(); ++ it) {
        ibis::util::tablesFromDir(tlist, *it);
    }
} // parse_args

static void print2DDist(const ibis::part& tbl,
                        const char *col1, const char *col2,
                        const uint32_t nbins, const char *cond) {
    std::vector<double> bds1, bds2;
    std::vector<uint32_t> cnts;
    long ierr;
    if (cond == 0 || *cond == 0)
        ierr = tbl.get2DDistribution(col1, col2, nbins, nbins,
                                     bds1, bds2, cnts);
    else
        ierr = tbl.get2DDistribution(cond, col1, col2, nbins, nbins,
                                     bds1, bds2, cnts);

    if (ierr > 0 && static_cast<uint32_t>(ierr) == cnts.size()) {
        // success
        ibis::util::logger lg;
        const uint32_t nbin2 = bds2.size() - 1;
        lg.buffer() << "\n2D-Joint distribution of " << col1 << " and " << col2
                    << " from table " << tbl.name();
        if (cond && *cond)
            lg.buffer() << " subject to the condition " << cond;
        lg.buffer() << " with " << cnts.size() << " bin"
                    << (cnts.size() > 1 ? "s" : "") << " on " << bds1.size()-1
                    << " x " << bds2.size()-1 << " cells\n";
    
        uint32_t cnt = 0, tot=0;
        for (uint32_t i = 0; i < cnts.size(); ++ i) {
            if (cnts[i] > 0) {
                uint32_t i1 = i / nbin2;
                uint32_t i2 = i % nbin2;
                // area of the cell is the product of its two side lengths
                double area = (bds1[i1+1]-bds1[i1]) * (bds2[i2+1]-bds2[i2]);
                double weight = cnts[i]/area;
                lg.buffer() << i << "\t[" << bds1[i1] << ", " << bds1[i1+1]
                            << ") [" << bds2[i2] << ", " << bds2[i2+1]
                            << ")\t" << cnts[i] << ", " << weight << "\n";
                tot += cnts[i];
                ++ cnt;
            }
        }
        lg.buffer() << "  Number of occupied cells = " << cnt
                    << ", total count = " << tot << ", number of rows in "
                    << tbl.name() << " = " << tbl.nRows() << "\n";
    }
    else {
        // error
        ibis::util::logger lg;
        lg.buffer() << "Warning -- part[" << tbl.name()
                    << "].get2DDistribution returned with ierr = " << ierr
                    << ", bds1.size() = " << bds1.size() << ", bds2.size() = "
                    << bds2.size() << ", cnts.size() = " << cnts.size();
    }
}

int main(int argc, char** argv) {
    ibis::partList tlist;

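    const char* qcnd = 0;   // optional where-clause; stays 0 if -w is absent
    const char* col1 = 0;
    const char* col2 = 0;
    int nbins=25;

    parse_args(argc, argv, tlist, &nbins, col1, col2, qcnd);
    if (col1 == 0 || col2 == 0 || nbins <= 0) { // both columns are required
        usage(*argv);
        ibis::util::clean(tlist);
        return -1;
    }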

    for (ibis::partList::const_iterator tit = tlist.begin();
         tit != tlist.end(); ++ tit) {
        print2DDist(**tit, col1, col2, nbins, qcnd);
    }
    ibis::util::clean(tlist); // clean up the data partitions

    return 0;
} // main