Hi John, list,

We have encountered a new bug in libfastbit that initially sounded
similar to [1], but turns out to behave slightly differently when it
comes to consistency. In a nutshell, the problem can be described as
follows:

- If a column is missing in the *first* analyzed partition, the query
stops right away. No results are returned.
- If a column is missing in *any but the first* analyzed partition, the
partition with the missing column is just ignored. Results are even
returned for partitions that are analyzed *after* the partition with the
missing column.

The problem can be easily demonstrated by means of the following example.

Files:

---------- 1a.txt ---------
1 101
2 102
3 103
---------- 1b.txt ---------
104
105
106
---------- 1c.txt ---------
7 107
8 108
9 109
---------- 2a.txt ---------
101
102
103
---------- 2b.txt ---------
4 104
5 105
6 106
---------- 2c.txt ---------
7 107
8 108
9 109
---------- bug.sh ---------
# Create database 1. It has all columns in both the first and last
# dataset, but misses a column in the second.
ardea -d db1/a -m 'c1:s,c2:s' -t 1a.txt >/dev/null 2>&1
ardea -d db1/b -m 'c2:s' -t 1b.txt >/dev/null 2>&1
ardea -d db1/c -m 'c1:s,c2:s' -t 1c.txt >/dev/null 2>&1

# Create database 2. It misses a column in the first dataset, but has
# all columns in both the second and third dataset.
ardea -d db2/a -m 'c2:s' -t 2a.txt >/dev/null 2>&1
ardea -d db2/b -m 'c1:s,c2:s' -t 2b.txt >/dev/null 2>&1
ardea -d db2/c -m 'c1:s,c2:s' -t 2c.txt >/dev/null 2>&1

thula -d db1 -s "c1, c2" -w "1=1"
thula -d db2 -s "c1, c2" -w "1=1"
---------------------------

Output of first call (thula -d db1 -s "c1, c2" -w "1=1"):

doQuery(1=1) evaluated on T-a produced 6 hits out of 9 records
-- begin printing the result table --
Table (in memory) UVlcU (filter::sift2) consists of 2 columns and 6 rows
c1    SHORT
c2    SHORT
1, 101
2, 102
3, 103
7, 107
8, 108
9, 109
-- end printing --

Output of second call (thula -d db2 -s "c1, c2" -w "1=1"):

Error -- bord::ctor failed to locate column c1 in 3 data partitions
doQuery(1=1) failed to produce any result

Please note that libfastbit does not abort the first query, while it
does abort the second. The error message of the second call also hints
at the problem: it reports "failed to locate column c1 in 3 data
partitions", although c1 is missing from just 1 partition.

We suspect that the problem is caused by line 450 of bord.cpp:

const ibis::column* refcol = 0;
for (unsigned i = 0; refcol == 0 && i < ref.size(); ++ i) {
    refcol = ref[0]->getColumn(var.variableName()); // 450
    if (refcol == 0) {
        size_t nch = std::strlen(ref[i]->name());
        if (0 == strnicmp(ref[i]->name(), vname, nch) &&
            vname[nch] == '_') {
            refcol = ref[i]->getColumn(vname+nch+1);
        }
    }
}

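There, 'ref[0]' should probably be 'ref[i]'. As written, every iteration
looks the column up in the first partition only, so a column that is
missing from ref[0] is not found in any later partition either (unless
the name-prefix fallback happens to match). That would explain both the
"3 data partitions" message and the different behavior of db1 and db2.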
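
To illustrate the suspicion outside of FastBit, here is a minimal,
self-contained sketch. Column, Part, findBuggy and findFixed are
hypothetical stand-ins that we made up for this mail; they are not the
real ibis classes, and the name-prefix fallback from the original loop
is left out for brevity. The only thing the sketch is meant to show is
the effect of indexing with 0 instead of i.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for ibis::column (assumption, not FastBit code).
struct Column {
    std::string name;
};

// Hypothetical stand-in for a data partition (assumption, not FastBit code).
struct Part {
    std::string name;
    std::map<std::string, Column> columns;

    // Returns the column with the given name, or nullptr if it is absent.
    const Column* getColumn(const std::string& colName) const {
        auto it = columns.find(colName);
        return it == columns.end() ? nullptr : &it->second;
    }
};

// Mirrors the quoted loop: every iteration queries ref[0].
const Column* findBuggy(const std::vector<const Part*>& ref,
                        const std::string& var) {
    const Column* refcol = nullptr;
    for (std::size_t i = 0; refcol == nullptr && i < ref.size(); ++i) {
        refcol = ref[0]->getColumn(var);  // always the first partition
    }
    return refcol;
}

// Proposed change: query the partition of the current iteration.
const Column* findFixed(const std::vector<const Part*>& ref,
                        const std::string& var) {
    const Column* refcol = nullptr;
    for (std::size_t i = 0; refcol == nullptr && i < ref.size(); ++i) {
        refcol = ref[i]->getColumn(var);  // stops at the first partition that has it
    }
    return refcol;
}
```

With partitions laid out like db2 (c1 missing only in the first one),
findBuggy returns a null pointer for "c1" while findFixed finds the
column in the second partition. With a layout like db1, both variants
find c1 in the first partition, which matches what we observe.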

Does that make sense to you?

Best regards,
Petr
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users