Hi John,

Did you by any chance have time to look at this issue? We depend quite
heavily on the correct behavior of this code.

Thanks,
Petr

On Tue, Aug 25, 2015 at 12:08 PM, Petr Velan <[email protected]> wrote:

> Sorry, I forgot the reference in the first paragraph
>
> [1] https://hpcrdm.lbl.gov/pipermail/fastbit-users/2015-May/002068.html
>
> Petr
>
> On Tue, Aug 25, 2015 at 12:06 PM, Petr Velan <[email protected]> wrote:
>
>> Hi John, list,
>>
>> We have encountered a new bug in libfastbit that sounded
>> similar to [1] initially, but turns out to be slightly different when it
>> comes to consistency. In a nutshell, the problem can be described as
>> follows:
>>
>> - If a column is missing in the *first* analyzed partition, the query
>> stops right away. No results are returned.
>> - If a column is missing in *any but the first* analyzed partition, the
>> partition with the missing column is just ignored. Results are even
>> returned for partitions that are analyzed *after* the partition with the
>> missing column.
>>
>> The problem can be easily demonstrated by means of the following example.
>>
>> Files:
>>
>> ---------- 1a.txt ---------
>> 1 101
>> 2 102
>> 3 103
>> ---------- 1b.txt ---------
>> 104
>> 105
>> 106
>> ---------- 1c.txt ---------
>> 7 107
>> 8 108
>> 9 109
>> ---------- 2a.txt ---------
>> 101
>> 102
>> 103
>> ---------- 2b.txt ---------
>> 4 104
>> 5 105
>> 6 106
>> ---------- 2c.txt ---------
>> 7 107
>> 8 108
>> 9 109
>> ---------- bug.sh ---------
>> # Create database 1. It has all columns in both the first and last
>> # dataset, but misses a column in the second.
>> ardea -d db1/a -m 'c1:s,c2:s' -t 1a.txt >/dev/null 2>&1
>> ardea -d db1/b -m 'c2:s' -t 1b.txt >/dev/null 2>&1
>> ardea -d db1/c -m 'c1:s,c2:s' -t 1c.txt >/dev/null 2>&1
>>
>> # Create database 2. It misses a column in the first dataset, but has
>> # all columns in both the second and third dataset.
>> ardea -d db2/a -m 'c2:s' -t 2a.txt >/dev/null 2>&1
>> ardea -d db2/b -m 'c1:s,c2:s' -t 2b.txt >/dev/null 2>&1
>> ardea -d db2/c -m 'c1:s,c2:s' -t 2c.txt >/dev/null 2>&1
>>
>> thula -d db1 -s "c1, c2" -w "1=1"
>> thula -d db2 -s "c1, c2" -w "1=1"
>> ---------------------------
>>
>> Output of first call (thula -d db1 -s "c1, c2" -w "1=1"):
>>
>> doQuery(1=1) evaluated on T-a produced 6 hits out of 9 records
>> -- begin printing the result table --
>> Table (in memory) UVlcU (filter::sift2) consists of 2 columns and 6 rows
>> c1    SHORT
>> c2    SHORT
>> 1, 101
>> 2, 102
>> 3, 103
>> 7, 107
>> 8, 108
>> 9, 109
>> -- end printing --
>>
>> Output of second call (thula -d db2 -s "c1, c2" -w "1=1"):
>>
>> Error -- bord::ctor failed to locate column c1 in 3 data partitions
>> doQuery(1=1) failed to produce any result
>>
>> Please notice that libfastbit does not stop the first call/query, while
>> it does so in the second. The error message of the second call also
>> indicates the problem: "failed to locate column c1 in 3 data partitions"
>> (instead of just 1 partition).
>>
>> We suspect that the problem may be introduced by bord.cpp:450:
>>
>> const ibis::column* refcol = 0;
>> for (unsigned i = 0; refcol == 0 && i < ref.size(); ++ i) {
>>     refcol = ref[0]->getColumn(var.variableName()); // 450
>>     if (refcol == 0) {
>>         size_t nch = std::strlen(ref[i]->name());
>>         if (0 == strnicmp(ref[i]->name(), vname, nch) &&
>>             vname[nch] == '_') {
>>             refcol = ref[i]->getColumn(vname+nch+1);
>>         }
>>     }
>> }
>>
>> There, 'ref[0]' should probably be 'ref[i]'.
>>
>> Does that make sense to you?
>>
>> Best regards,
>> Petr
>>
>>
>>
>>
>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
