Hi, Greg,

Potentially, you are hitting a bug in ibis::fuzz. If you have
specifically instructed column my_primary_key to use something like the
following (in the file -part.txt),

    index=<binning none/><encoding interval-equality/>

you might want to change it to something like

    index=<binning none/><encoding equality/>

The above instruction will force it to use a simpler index (named
ibis::relic), which should have less trouble. If using the simpler
index removes the problem, then we can be pretty sure the problem is
with ibis::fuzz.

If the problem is with ibis::fuzz, I would need a copy of your file
my_primary_key and the query expression that causes the error to study
the problem further.

Thanks.

John

PS: To change the index specification, you can either directly edit the
file -part.txt or use the following command line (all on one line)

    .../ibis -d data-directory -b "my_primary_key:<binning none/><encoding equality/>" -z

On 8/16/12 1:57 AM, Greg Barker wrote:
> Hi John,
>
> Thank you for the additional information. I still haven't been able to
> figure out why sometimes my part.deactivate() call does not return the
> number of rows that match the expression I pass in, but after adding
> a part.computeMinMax() call before calling deactivate, it seemed to
> change the outcome, now I hit a seg fault calling deactivate (using
> rev 538):
>
> #0 0xb70cb663 in ibis::bitvector::size (this=0xbfd51c34, rhs=...) at bitvector.h:628
> #1 ibis::bitvector::operator-= (this=0xbfd51c34, rhs=...) at bitvector.cpp:1528
> #2 0xb737ed1e in ibis::fuzz::coarseEvaluate (this=0xb9fcd28, lo=109, hi=110, res=...) at ixfuzz.cpp:744
> #3 0xb737fae0 in ibis::fuzz::evaluate (this=0xb9fcd28, expr=..., lower=...) at ixfuzz.cpp:950
> #4 0xb72b66bb in ibis::relic::estimate (this=0xb9fcd28, expr=..., lower=..., upper=...) at irelic.h:50
> #5 0xb6a28663 in ibis::column::evaluateRange (this=0x9f0b5b0, cmp=..., mask=..., low=...) at column.cpp:5480
> #6 0xb6a24e56 in ibis::column::evaluateRange (this=0x9f0b5b0, cmp=..., mask=..., low=...) at column.cpp:5701
> #7 0xb611f6fb in ibis::part::evaluateRange (this=0xbfd51fc4, cmp=..., mask=..., hits=...)
>     at part.cpp:3581
> #8 0xb7066bd8 in ibis::query::doEvaluate (this=0xbfd50260, term=0xaf38e18, ht=...) at query.cpp:3683
> #9 0xb70672fc in ibis::query::getExpandedHits (this=0xbfd50260, res=...) at query.cpp:2966
> #10 0xb6112fbf in ibis::part::stringToBitvector (this=0xbfd51fc4,
>     conds=0xb88112c "my_primary_key in (199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"...,
>     msk=...) at part.cpp:4101
> #11 0xb73b63d2 in ibis::part::deactivate (this=0xbfd51fc4,
>     conds=0xb88112c "my_primary_key in (199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"...)
>     at parti.cpp:1269
>
> Thanks,
> Greg
>
> On Wed, Aug 15, 2012 at 4:04 PM, K. John Wu <[email protected]> wrote:
>
>     Hi, Greg,
>
>     The mystery might be related to the lazy updating of the min/max
>     values. Even when the min and max values are wrong in the metadata
>     file, FastBit should be able to answer the queries correctly. Our
>     first application wanted us to use the min and max as nominal lower
>     and upper bounds; the actual min and max could vary significantly
>     from the nominal bounds. To enforce the computation of the min and
>     max values, please call ibis::part::computeMinMax.
>
>     When you initialize an ibis::part with a single string argument, it is
>     assumed to be a directory name if it contains directory separators or
>     it names an existing directory. If the string does not contain a '/'
>     or does not name an existing directory, then it is necessary to have
>     the second string argument (which could be nil) to tell FastBit to use
>     the first argument as the directory name.
>
>     Let me know if you have any additional questions.
>
>     John
>
>
>     On 8/14/12 12:58 PM, Greg Barker wrote:
>     > Hi John,
>     >
>     > I've been running into a scenario where I'm not able to deactivate
>     > rows that exist in the data file. I noticed when it gets into this
>     > state, the min & max for my_primary_key in -part.txt seems to be
>     > incorrect. I'm having trouble coming up with a small program that can
>     > reproduce the issue, but this seems to get pretty close. Before I ran
>     > it, the three directories it uses existed and were empty.
>     >
>     > $ cat 7rows.csv
>     > 1,93.19,AAA
>     > 2,49.14,BBB
>     > 3,49.19,DDD
>     > 4,59.10,EEE
>     > 5,34.48,FFF
>     > 6,91.49,AAA
>     > 7,19.50,BBB
>     >
>     > $ cat 5rows.csv
>     > 1,93.19,AAA
>     > 2,49.14,BBB
>     > 3,50.41,CCC
>     > 4,58.59,AAA
>     > 5,19.53,CCC
>     >
>     > $ cat loading_error.cc
>     > #include <memory>
>     >
>     > #include <ibis.h>
>     >
>     > int main(int argc, char **argv)
>     > {
>     >     ibis::gVerbose = 1;
>     >
>     >     char existing_dir[] = "existing_dir";
>     >     char first_incoming_dir[] = "first_incoming_dir";
>     >     char second_incoming_dir[] = "second_incoming_dir";
>     >
>     >     std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
>     >     firstTable->addColumn("my_primary_key", ibis::LONG);
>     >     firstTable->addColumn("my_double_value", ibis::DOUBLE);
>     >     firstTable->addColumn("my_category_value", ibis::CATEGORY);
>     >     firstTable->readCSV("7rows.csv", 0, first_incoming_dir, ",");
>     >     firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
>     >     firstTable->clearData();
>     >
>     >     ibis::part existing_part(existing_dir, static_cast<const char*>(0));
>     >     existing_part.append(first_incoming_dir);
>     >     existing_part.commit(first_incoming_dir);
>     >     existing_part.purgeIndexFiles();
>     >     existing_part.buildIndexes();
>     >     existing_part.emptyCache();
>     >
>     >     std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
>     >     secondTable->addColumn("my_primary_key", ibis::LONG);
>     >     secondTable->addColumn("my_double_value", ibis::DOUBLE);
>     >     secondTable->addColumn("my_category_value", ibis::CATEGORY);
>     >     secondTable->readCSV("5rows.csv", 0, second_incoming_dir, ",");
>     >     secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
>     >     secondTable->clearData();
>     >
>     >     ibis::part second_part(second_incoming_dir);
>     >
>     >     int deactivatedCount = 0;
>     >     deactivatedCount = existing_part.deactivate("my_primary_key in (1, 2, 3, 4, 5)");
>     >     std::cout << "deactivatedCount = " << deactivatedCount << std::endl;
>     >     existing_part.purgeInactive();
>     >
>     >     existing_part.append(second_incoming_dir);
>     >     existing_part.commit(second_incoming_dir);
>     >     existing_part.purgeIndexFiles();
>     >     existing_part.buildIndexes();
>     >     existing_part.emptyCache();
>     > }
>     >
>     > I end up with this in the -part.txt in existing_dir:
>     >
>     > Begin Column
>     > name = "my_primary_key"
>     > data_type = "LONG"
>     > minimum = 6
>     > maximum = 7
>     > End Column
>     >
>     > I was thinking it should have min = 1 & max = 7.
>     >
>     > Thank you,
>     > Greg
>     >
>     > On Mon, Aug 13, 2012 at 9:13 PM, Greg Barker <[email protected]> wrote:
>     >
>     >     Whoops my mistake, deactivate() returns the number of inactive
>     >     rows, just like it says in the doc :)
>     >
>     >     Greg
>     >
>     >
>     >     On Mon, Aug 13, 2012 at 6:11 PM, Greg Barker <[email protected]> wrote:
>     >
>     >         Hello John,
>     >
>     >         Thank you for the updated code, it appears to be working quite
>     >         well now for that case. I really appreciate it.
>     >
>     >         Another thing I noticed while I was testing is that if you
>     >         call deactivate() multiple times before purgeInactive(), the
>     >         return value was not what I expected. Do I need to call
>     >         purgeInactive() after each deactivate()?
>     >
>     >         For example:
>     >
>     >         int deactivatedCount = 0;
>     >         deactivatedCount += existing_part.deactivate("my_primary_key in (1, 2)");
>     >         deactivatedCount += existing_part.deactivate("my_primary_key in (3, 4)");
>     >         existing_part.purgeInactive();
>     >         std::cout << "deactivatedCount = " << deactivatedCount << "\n";
>     >
>     >         Which yields:
>     >
>     >         part[existing_dir]::deactivate marked 2 rows as inactive, leaving 3 active rows out of 5
>     >         part[existing_dir]::deactivate marked 2 rows as inactive, leaving 1 active row out of 5
>     >         part[existing_dir]::purgeInactive to remove 4 out of 5 rows
>     >         deactivatedCount = 6
>     >
>     >         Thanks again for your work,
>     >
>     >         Greg
>     >
>     >
>     >         On Mon, Aug 13, 2012 at 4:10 PM, K. John Wu <[email protected]> wrote:
>     >
>     >             Hi, Greg,
>     >
>     >             Thanks for the test case and test code. The problem
>     >             should be fixed with SVN Revision 538. Please give it a
>     >             try when you get the chance.
>     >
>     >             There is one minor change needed in your test program
>     >             for it to do what you want. The following line,
>     >
>     >             ibis::part existing_part(existing_dir);
>     >
>     >             needs to be changed to
>     >
>     >             ibis::part existing_part(existing_dir, static_cast<const char*>(0));
>     >
>     >             The version you used will create two directories hidden
>     >             in .ibis, which are probably not what you want.
>     >
>     >             John
>     >
>     >
>     >             On 8/13/12 1:57 AM, Greg Barker wrote:
>     >             > Hello,
>     >             >
>     >             > The type of my_primary_key is a long. I was able to
>     >             > reproduce the error without the join, I also noticed
>     >             > that it does not hit the seg fault if the category
>     >             > column is omitted. The following program will hit the
>     >             > error.
>     >             >
>     >             > $ cat first_data_file.csv
>     >             > 1,93.19,AAA
>     >             > 2,49.14,BBB
>     >             > 3,50.41,CCC
>     >             > 4,58.59,AAA
>     >             > 5,19.53,CCC
>     >             >
>     >             > $ cat second_data_file.csv
>     >             > 3,49.19,DDD
>     >             > 4,59.10,EEE
>     >             > 5,34.48,FFF
>     >             > 6,91.49,AAA
>     >             > 7,19.50,BBB
>     >             >
>     >             > $ cat loading_error.cc
>     >             > #include <memory>
>     >             >
>     >             > #include <ibis.h>
>     >             >
>     >             > int main(int argc, char **argv)
>     >             > {
>     >             >     char existing_dir[] = "existing_dir";
>     >             >     char first_incoming_dir[] = "first_incoming_dir";
>     >             >     char second_incoming_dir[] = "second_incoming_dir";
>     >             >
>     >             >     std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
>     >             >     firstTable->addColumn("my_primary_key", ibis::LONG);
>     >             >     firstTable->addColumn("my_double_value", ibis::DOUBLE);
>     >             >     firstTable->addColumn("my_category_value", ibis::CATEGORY);
>     >             >     firstTable->readCSV("first_data_file.csv", 0, first_incoming_dir, ",");
>     >             >     firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
>     >             >     firstTable->clearData();
>     >             >
>     >             >     ibis::part existing_part(existing_dir);
>     >             >     existing_part.append(first_incoming_dir);
>     >             >     existing_part.commit(first_incoming_dir);
>     >             >     existing_part.purgeIndexFiles();
>     >             >     existing_part.buildIndexes();
>     >             >     existing_part.emptyCache();
>     >             >
>     >             >     std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
>     >             >     secondTable->addColumn("my_primary_key", ibis::LONG);
>     >             >     secondTable->addColumn("my_double_value", ibis::DOUBLE);
>     >             >     secondTable->addColumn("my_category_value", ibis::CATEGORY);
>     >             >     secondTable->readCSV("second_data_file.csv", 0, second_incoming_dir, ",");
>     >             >     secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
>     >             >     secondTable->clearData();
>     >             >
>     >             >     ibis::part second_part(second_incoming_dir);
>     >             >
>     >             >     existing_part.deactivate("my_primary_key = 1");
>     >             >     existing_part.purgeInactive();
>     >             >
>     >             >     existing_part.append(second_incoming_dir);
>     >             > }
>     >             >
>     >             > Thank you John,
>     >             >
>     >             > Greg
>     >             >
>     >             > On Sun, Aug 12, 2012 at 3:27 PM, K. John Wu <[email protected]> wrote:
>     >             >
>     >             >     Hi, Greg,
>     >             >
>     >             >     Thanks for the information. Looks like we might have
>     >             >     neglected to close some index files or somehow
>     >             >     mishandled some index files. There is only one easy
>     >             >     thing for us to check, which is related to the
>     >             >     handling of categorical values (the columns of type
>     >             >     ibis::CATEGORY). Would you mind telling us if
>     >             >     my_primary_key is an integer column or a CATEGORY
>     >             >     column?
>     >             >
>     >             >     If it is not a CATEGORY, then we might have something
>     >             >     a little bit more complex. We would appreciate a
>     >             >     small test case to replicate the problem.
>     >             >
>     >             >     John
>     >             >
>     >             >
>     >             >     On 8/10/12 5:32 PM, Greg Barker wrote:
>     >             >     > Hello -
>     >             >     >
>     >             >     > I am attempting to append some new data to some
>     >             >     > existing data, and ran into some trouble. When
>     >             >     > loading, I join the new data to the existing data
>     >             >     > on a particular column, and then deactivate &
>     >             >     > purgeInactive on the matching records. Then when I
>     >             >     > try to append the new data to the existing data, I
>     >             >     > hit a seg fault using rev 536. If I call
>     >             >     > purgeIndexFiles before the append, it seems to
>     >             >     > avoid the crash, but I wasn't sure if that was
>     >             >     > recommended?
>     >             >     >
>     >             >     > My code is essentially:
>     >             >     >
>     >             >     > ibis::part existing_part("my_data");
>     >             >     > ibis::part incoming_part("new_data");
>     >             >     > std::auto_ptr<ibis::quaere>
>     >             >     >     join(ibis::quaere::create(&existing_part, &incoming_part,
>     >             >     >                               "my_primary_key"));
>     >             >     > std::auto_ptr<ibis::table> rs(join->select("my_primary_key"));
>     >             >     > //then build the where clause
>     >             >     > working_part.deactivate("my_primary_key in (3, 4, 5)");
>     >             >     > working_part.purgeInactive();
>     >             >     > working_part.append(incoming_data);
>     >             >     >
>     >             >     > Which yields the following:
>     >             >     >
>     >             >     > part[my_data]::deactivate marked 9 rows as inactive, leaving 10 active rows out of 19
>     >             >     > part[my_data]::purgeInactive to remove 9 out of 19 rows
>     >             >     > Warning -- fileManager::flushDir can not remove in-memory file (my_data/my_primary_key.idx). It is in use
>     >             >     > Warning -- fileManager::flushDir(my_data) finished with 1 file still in memory
>     >             >     > Constructed a part named my_data
>     >             >     > filter::sift1S -- processing data partition my_data
>     >             >     > Segmentation fault (core dumped)
>     >             >     >
>     >             >     > Many Thanks,
>     >             >     > Greg

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
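
A note on the deactivate() return value discussed earlier in the thread: the log output quoted there (2 rows marked, then 2 more, yet a summed count of 6) is what one would expect if each call reports the running total of inactive rows rather than the rows newly marked. The sketch below is a toy model of that counting behavior only. It does not use FastBit; ToyPart, inactiveCount and newlyDeactivated are hypothetical names invented for illustration, and the real ibis::part may differ in details.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for a data partition. This is NOT ibis::part; it only
// models the counting behavior described in the thread.
class ToyPart {
public:
    explicit ToyPart(const std::vector<long>& keys)
        : keys_(keys), active_(keys.size(), true) {}

    // Mark every row whose key is in `targets` as inactive and return the
    // TOTAL number of inactive rows so far (assumed semantics, taken from
    // Greg's reading of the documentation), not the rows newly marked.
    long deactivate(const std::set<long>& targets) {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            if (targets.count(keys_[i]) > 0) active_[i] = false;
        }
        return inactiveCount();
    }

    // Count the rows currently marked inactive.
    long inactiveCount() const {
        long n = 0;
        for (std::size_t i = 0; i < active_.size(); ++i) {
            if (!active_[i]) ++n;
        }
        return n;
    }

private:
    std::vector<long> keys_;
    std::vector<bool> active_;
};

// Rows newly marked by one call: the difference of two running totals.
long newlyDeactivated(ToyPart& part, const std::set<long>& targets) {
    const long before = part.inactiveCount();
    const long after = part.deactivate(targets);
    assert(after >= before);  // the running total never decreases
    return after - before;
}
```

With keys 1 through 5, deactivating {1, 2} and then {3, 4} returns 2 and then 4 in this model, so adding the two return values gives 6, while taking the last return value (or summing the differences) gives 4. Under the same assumption, there would be no need to call purgeInactive() after each deactivate() just to get a correct count.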
