Hi John,

Did you by any chance have time to look at the issue? We depend quite heavily on the correct behavior of this code.
Thanks,
Petr

On Tue, Aug 25, 2015 at 12:08 PM, Petr Velan <[email protected]> wrote:
> Sorry, I forgot the reference in the first paragraph:
>
> [1] https://hpcrdm.lbl.gov/pipermail/fastbit-users/2015-May/002068.html
>
> Petr
>
> On Tue, Aug 25, 2015 at 12:06 PM, Petr Velan <[email protected]> wrote:
>
>> Hi John, list,
>>
>> We have encountered a new bug in libfastbit that initially sounded
>> similar to [1], but turns out to be slightly different when it comes
>> to consistency. In a nutshell, the problem can be described as
>> follows:
>>
>> - If a column is missing in the *first* analyzed partition, the query
>>   stops right away. No results are returned.
>> - If a column is missing in *any but the first* analyzed partition,
>>   the partition with the missing column is simply ignored. Results are
>>   still returned for partitions that are analyzed *after* the
>>   partition with the missing column.
>>
>> The problem can easily be demonstrated with the following example.
>>
>> Files:
>>
>> ---------- 1a.txt ---------
>> 1 101
>> 2 102
>> 3 103
>> ---------- 1b.txt ---------
>> 104
>> 105
>> 106
>> ---------- 1c.txt ---------
>> 7 107
>> 8 108
>> 9 109
>> ---------- 2a.txt ---------
>> 101
>> 102
>> 103
>> ---------- 2b.txt ---------
>> 4 104
>> 5 105
>> 6 106
>> ---------- 2c.txt ---------
>> 7 107
>> 8 108
>> 9 109
>> ---------- bug.sh ---------
>> # Create database 1. It has all columns in both the first and last
>> # datasets, but is missing a column in the second.
>> ardea -d db1/a -m 'c1:s,c2:s' -t 1a.txt >/dev/null 2>&1
>> ardea -d db1/b -m 'c2:s' -t 1b.txt >/dev/null 2>&1
>> ardea -d db1/c -m 'c1:s,c2:s' -t 1c.txt >/dev/null 2>&1
>>
>> # Create database 2. It is missing a column in the first dataset, but
>> # has all columns in both the second and third datasets.
>> ardea -d db2/a -m 'c2:s' -t 2a.txt >/dev/null 2>&1
>> ardea -d db2/b -m 'c1:s,c2:s' -t 2b.txt >/dev/null 2>&1
>> ardea -d db2/c -m 'c1:s,c2:s' -t 2c.txt >/dev/null 2>&1
>>
>> thula -d db1 -s "c1, c2" -w "1=1"
>> thula -d db2 -s "c1, c2" -w "1=1"
>> ---------------------------
>>
>> Output of the first call (thula -d db1 -s "c1, c2" -w "1=1"):
>>
>> doQuery(1=1) evaluated on T-a produced 6 hits out of 9 records
>> -- begin printing the result table --
>> Table (in memory) UVlcU (filter::sift2) consists of 2 columns and 6 rows
>> c1 SHORT
>> c2 SHORT
>> 1, 101
>> 2, 102
>> 3, 103
>> 7, 107
>> 8, 108
>> 9, 109
>> -- end printing --
>>
>> Output of the second call (thula -d db2 -s "c1, c2" -w "1=1"):
>>
>> Error -- bord::ctor failed to locate column c1 in 3 data partitions
>> doQuery(1=1) failed to produce any result
>>
>> Note that libfastbit does not abort the first query, while it does
>> abort the second. The error message of the second call also points
>> at the problem: "failed to locate column c1 in 3 data partitions",
>> although c1 is missing from only 1 of them.
>>
>> We suspect that the problem may be introduced at bord.cpp:450:
>>
>>     const ibis::column* refcol = 0;
>>     for (unsigned i = 0; refcol == 0 && i < ref.size(); ++ i) {
>>         refcol = ref[0]->getColumn(var.variableName()); // 450
>>         if (refcol == 0) {
>>             size_t nch = std::strlen(ref[i]->name());
>>             if (0 == strnicmp(ref[i]->name(), vname, nch) &&
>>                 vname[nch] == '_') {
>>                 refcol = ref[i]->getColumn(vname+nch+1);
>>             }
>>         }
>>     }
>>
>> There, 'ref[0]' should probably be 'ref[i]': as written, every
>> iteration looks up the plain column name in the first partition only.
>>
>> Does that make sense to you?
>>
>> Best regards,
>> Petr
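For illustration, here is a minimal C++ sketch that replays the lookup loop quoted above against the db2 layout from bug.sh. Every name in it (Column, Partition, getColumn, makePartition, hasPrefixNoCase, findRefColumn) is a hypothetical stand-in, not FastBit's actual classes or API, and hasPrefixNoCase replaces the non-standard strnicmp. With useFirstOnly set to true it mimics the suspected ref[0] behavior; with false it applies the proposed ref[i] fix.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for ibis::column and ibis::part; these are NOT
// the FastBit classes, just enough structure to replay the lookup loop.
struct Column {
    std::string name;
    std::string partition;  // which partition this column belongs to
};

struct Partition {
    std::string name;
    std::map<std::string, Column> columns;

    // Like the quoted getColumn() calls: null when the column is absent.
    const Column* getColumn(const std::string& cname) const {
        auto it = columns.find(cname);
        return it == columns.end() ? nullptr : &it->second;
    }
};

// Hypothetical helper: build a partition holding the given column names.
Partition makePartition(const std::string& pname,
                        const std::vector<std::string>& cols) {
    Partition p{pname, {}};
    for (const auto& c : cols) p.columns[c] = Column{c, pname};
    return p;
}

// Case-insensitive prefix test, standing in for strnicmp(a, b, n) == 0.
static bool hasPrefixNoCase(const std::string& s, const std::string& prefix) {
    if (s.size() < prefix.size()) return false;
    for (std::size_t k = 0; k < prefix.size(); ++k) {
        if (std::tolower(static_cast<unsigned char>(s[k])) !=
            std::tolower(static_cast<unsigned char>(prefix[k])))
            return false;
    }
    return true;
}

// Searches the partitions in order for the reference column.
// useFirstOnly == true reproduces the suspected bug (ref[0] in the
// plain-name lookup); false uses ref[i] as proposed in the thread.
const Column* findRefColumn(const std::vector<Partition>& ref,
                            const std::string& vname, bool useFirstOnly) {
    assert(!vname.empty());  // precondition: a column name is required
    const Column* refcol = nullptr;
    for (std::size_t i = 0; refcol == nullptr && i < ref.size(); ++i) {
        const Partition& p = useFirstOnly ? ref[0] : ref[i];
        refcol = p.getColumn(vname);
        if (refcol == nullptr) {
            // Fallback: accept "<partition>_<column>" qualified names.
            const std::string& pname = ref[i].name;
            if (vname.size() > pname.size() &&
                hasPrefixNoCase(vname, pname) &&
                vname[pname.size()] == '_') {
                refcol = ref[i].getColumn(vname.substr(pname.size() + 1));
            }
        }
    }
    return refcol;
}
```

With the db2 layout (partition a holds only c2), the ref[0] variant returns a null pointer, which is consistent with the "failed to locate column c1" error, while the ref[i] variant finds c1 in partition b. The sketch models only the reference-column lookup, not how the rest of the query code treats a partition that lacks the column.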
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
