John, I changed the index for my_primary_key to <binning none/><encoding equality/> per your suggestion and I am no longer hitting that seg fault. That change, in combination with calling ibis::part::computeMinMax before I call ibis::part::deactivate, seems to have resolved all my issues. I will try to get a reduced test case to send to you.
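For anyone following along, here is a sketch of how the column entry in -part.txt might look after the change. The index line is the one John quotes in his message below. Its placement inside the Begin Column / End Column block is an assumption on my part, and the minimum/maximum lines are left out.

```
Begin Column
name = "my_primary_key"
data_type = "LONG"
index=<binning none/><encoding equality/>
End Column
```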
Thank you for all the help you have provided, I really appreciate it.

Thanks again,
Greg

On Thu, Aug 16, 2012 at 7:20 AM, K. John Wu <[email protected]> wrote:
> Hi, Greg,
>
> Potentially, you are hitting a bug in ibis::fuzz. If you have
> specifically instructed column my_primary_key to use something like
> the following (in the file -part.txt),
>
> index=<binning none/><encoding interval-equality/>
>
> you might want to change it to something like
>
> index=<binning none/><encoding equality/>
>
> The above instruction will force it to use a simpler index (named
> ibis::relic), which should have less trouble.
>
> If using the simpler index removes the problem, then we can be pretty
> sure the problem is with ibis::fuzz. If the problem is with
> ibis::fuzz, I would need a copy of your file my_primary_key and the
> query expression that causes the error to study the problem further.
>
> Thanks.
>
> John
>
> PS: To change the index specification, you can either directly edit
> the file -part.txt or use the following command line (all on one line)
>
> .../ibis -d data-directory -b "my_primary_key:<binning none/><encoding equality/>" -z
>
> On 8/16/12 1:57 AM, Greg Barker wrote:
> > Hi John,
> >
> > Thank you for the additional information. I still haven't been able to
> > figure out why sometimes my part.deactivate() call does not return the
> > number of rows that match the expression I pass in, but after adding a
> > part.computeMinMax() call before calling deactivate, it seemed to
> > change the outcome. Now I hit a seg fault calling deactivate (using
> > rev 538):
> >
> > #0 0xb70cb663 in ibis::bitvector::size (this=0xbfd51c34, rhs=...) at bitvector.h:628
> > #1 ibis::bitvector::operator-= (this=0xbfd51c34, rhs=...) at bitvector.cpp:1528
> > #2 0xb737ed1e in ibis::fuzz::coarseEvaluate (this=0xb9fcd28, lo=109, hi=110, res=...) at ixfuzz.cpp:744
> > #3 0xb737fae0 in ibis::fuzz::evaluate (this=0xb9fcd28, expr=..., lower=...) at ixfuzz.cpp:950
> > #4 0xb72b66bb in ibis::relic::estimate (this=0xb9fcd28, expr=..., lower=..., upper=...) at irelic.h:50
> > #5 0xb6a28663 in ibis::column::evaluateRange (this=0x9f0b5b0, cmp=..., mask=..., low=...) at column.cpp:5480
> > #6 0xb6a24e56 in ibis::column::evaluateRange (this=0x9f0b5b0, cmp=..., mask=..., low=...) at column.cpp:5701
> > #7 0xb611f6fb in ibis::part::evaluateRange (this=0xbfd51fc4, cmp=..., mask=..., hits=...) at part.cpp:3581
> > #8 0xb7066bd8 in ibis::query::doEvaluate (this=0xbfd50260, term=0xaf38e18, ht=...) at query.cpp:3683
> > #9 0xb70672fc in ibis::query::getExpandedHits (this=0xbfd50260, res=...) at query.cpp:2966
> > #10 0xb6112fbf in ibis::part::stringToBitvector (this=0xbfd51fc4, conds=0xb88112c "my_primary_key in (199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"..., msk=...) at part.cpp:4101
> > #11 0xb73b63d2 in ibis::part::deactivate (this=0xbfd51fc4, conds=0xb88112c "my_primary_key in (199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"...) at parti.cpp:1269
> >
> > Thanks,
> > Greg
> >
> > On Wed, Aug 15, 2012 at 4:04 PM, K. John Wu <[email protected]> wrote:
> >
> >     Hi, Greg,
> >
> >     The mystery might be related to the lazy updating of the min/max
> >     values. Even when the min and max values are wrong in the metadata
> >     file, FastBit should be able to answer the queries correctly. Our
> >     first application wanted us to use the min and max as nominal lower
> >     and upper bounds; the actual min and max could vary significantly
> >     from the nominal bounds. To force the computation of the min and
> >     max values, please call ibis::part::computeMinMax.
> >
> >     When you initialize an ibis::part with a single string argument, it is
> >     assumed to be a directory name if it contains directory separators or
> >     it names an existing directory. If the string does not contain a '/'
> >     or does not name an existing directory, then it is necessary to have
> >     the second string argument (which could be nil) to tell FastBit to use
> >     the first argument as the directory name.
> >
> >     Let me know if you have any additional questions.
> >
> >     John
> >
> >     On 8/14/12 12:58 PM, Greg Barker wrote:
> > > Hi John,
> > >
> > > I've been running into a scenario where I'm not able to deactivate
> > > rows that exist in the data file. I noticed when it gets into this
> > > state, the min & max for my_primary_key in -part.txt seems to be
> > > incorrect. I'm having trouble coming up with a small program that can
> > > reproduce the issue, but this seems to get pretty close. Before I ran
> > > it, the three directories it uses existed and were empty.
> > >
> > > $ cat 7rows.csv
> > > 1,93.19,AAA
> > > 2,49.14,BBB
> > > 3,49.19,DDD
> > > 4,59.10,EEE
> > > 5,34.48,FFF
> > > 6,91.49,AAA
> > > 7,19.50,BBB
> > >
> > > $ cat 5rows.csv
> > > 1,93.19,AAA
> > > 2,49.14,BBB
> > > 3,50.41,CCC
> > > 4,58.59,AAA
> > > 5,19.53,CCC
> > >
> > > $ cat loading_error.cc
> > > #include <iostream>
> > > #include <memory>
> > >
> > > #include <ibis.h>
> > >
> > > int main(int argc, char **argv)
> > > {
> > >     ibis::gVerbose = 1;
> > >
> > >     char existing_dir[] = "existing_dir";
> > >     char first_incoming_dir[] = "first_incoming_dir";
> > >     char second_incoming_dir[] = "second_incoming_dir";
> > >
> > >     std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
> > >     firstTable->addColumn("my_primary_key", ibis::LONG);
> > >     firstTable->addColumn("my_double_value", ibis::DOUBLE);
> > >     firstTable->addColumn("my_category_value", ibis::CATEGORY);
> > >     firstTable->readCSV("7rows.csv", 0, first_incoming_dir, ",");
> > >     firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
> > >     firstTable->clearData();
> > >
> > >     ibis::part existing_part(existing_dir, static_cast<const char*>(0));
> > >     existing_part.append(first_incoming_dir);
> > >     existing_part.commit(first_incoming_dir);
> > >     existing_part.purgeIndexFiles();
> > >     existing_part.buildIndexes();
> > >     existing_part.emptyCache();
> > >
> > >     std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
> > >     secondTable->addColumn("my_primary_key", ibis::LONG);
> > >     secondTable->addColumn("my_double_value", ibis::DOUBLE);
> > >     secondTable->addColumn("my_category_value", ibis::CATEGORY);
> > >     secondTable->readCSV("5rows.csv", 0, second_incoming_dir, ",");
> > >     secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
> > >     secondTable->clearData();
> > >
> > >     ibis::part second_part(second_incoming_dir);
> > >
> > >     int deactivatedCount = 0;
> > >     deactivatedCount = existing_part.deactivate("my_primary_key in (1, 2, 3, 4, 5)");
> > >     std::cout << "deactivatedCount = " << deactivatedCount << std::endl;
> > >     existing_part.purgeInactive();
> > >
> > >     existing_part.append(second_incoming_dir);
> > >     existing_part.commit(second_incoming_dir);
> > >     existing_part.purgeIndexFiles();
> > >     existing_part.buildIndexes();
> > >     existing_part.emptyCache();
> > > }
> > >
> > > I end up with this in the -part.txt in existing_dir:
> > >
> > > Begin Column
> > > name = "my_primary_key"
> > > data_type = "LONG"
> > > minimum = 6
> > > maximum = 7
> > > End Column
> > >
> > > I was thinking it should have min = 1 & max = 7.
> > >
> > > Thank you,
> > > Greg
> > >
> > > On Mon, Aug 13, 2012 at 9:13 PM, Greg Barker <[email protected]> wrote:
> > >
> > >     Whoops my mistake, deactivate() returns the number of inactive
> > >     rows, just like it says in the doc :)
> > >
> > >     Greg
> > >
> > >     On Mon, Aug 13, 2012 at 6:11 PM, Greg Barker <[email protected]> wrote:
> > >
> > >         Hello John,
> > >
> > >         Thank you for the updated code, it appears to be working quite
> > >         well now for that case. I really appreciate it.
> > >
> > >         Another thing I noticed while I was testing is that if you
> > >         call deactivate() multiple times before purgeInactive(), the
> > >         return value was not what I expected. Do I need to call
> > >         purgeInactive() after each deactivate()?
> > >
> > >         For example:
> > >
> > >         int deactivatedCount = 0;
> > >         deactivatedCount += existing_part.deactivate("my_primary_key in (1, 2)");
> > >         deactivatedCount += existing_part.deactivate("my_primary_key in (3, 4)");
> > >         existing_part.purgeInactive();
> > >         std::cout << "deactivatedCount = " << deactivatedCount << "\n";
> > >
> > >         Which yields:
> > >
> > >         part[existing_dir]::deactivate marked 2 rows as inactive, leaving 3 active rows out of 5
> > >         part[existing_dir]::deactivate marked 2 rows as inactive, leaving 1 active row out of 5
> > >         part[existing_dir]::purgeInactive to remove 4 out of 5 rows
> > >         deactivatedCount = 6
> > >
> > >         Thanks again for your work,
> > >
> > >         Greg
> > >
> > >         On Mon, Aug 13, 2012 at 4:10 PM, K. John Wu <[email protected]> wrote:
> > >
> > >             Hi, Greg,
> > >
> > >             Thanks for the test case and test code. The problem
> > >             should be fixed with SVN Revision 538. Please give it a
> > >             try when you get the chance.
> > >
> > >             There is one minor change needed to your test program in
> > >             order for it to do what you want. The following line,
> > >
> > >             ibis::part existing_part(existing_dir);
> > >
> > >             needs to be changed to
> > >
> > >             ibis::part existing_part(existing_dir, static_cast<const char*>(0));
> > >
> > >             The version you used will create two directories hidden
> > >             in .ibis, which are probably not what you want.
> > >
> > >             John
> > >
> > >             On 8/13/12 1:57 AM, Greg Barker wrote:
> > > > Hello,
> > > >
> > > > The type of my_primary_key is a long. I was able to reproduce the
> > > > error without the join. I also noticed that it does not hit the seg
> > > > fault if the category column is omitted. The following program will
> > > > hit the error.
> > > >
> > > > $ cat first_data_file.csv
> > > > 1,93.19,AAA
> > > > 2,49.14,BBB
> > > > 3,50.41,CCC
> > > > 4,58.59,AAA
> > > > 5,19.53,CCC
> > > >
> > > > $ cat second_data_file.csv
> > > > 3,49.19,DDD
> > > > 4,59.10,EEE
> > > > 5,34.48,FFF
> > > > 6,91.49,AAA
> > > > 7,19.50,BBB
> > > >
> > > > $ cat loading_error.cc
> > > > #include <memory>
> > > >
> > > > #include <ibis.h>
> > > >
> > > > int main(int argc, char **argv)
> > > > {
> > > >     char existing_dir[] = "existing_dir";
> > > >     char first_incoming_dir[] = "first_incoming_dir";
> > > >     char second_incoming_dir[] = "second_incoming_dir";
> > > >
> > > >     std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
> > > >     firstTable->addColumn("my_primary_key", ibis::LONG);
> > > >     firstTable->addColumn("my_double_value", ibis::DOUBLE);
> > > >     firstTable->addColumn("my_category_value", ibis::CATEGORY);
> > > >     firstTable->readCSV("first_data_file.csv", 0, first_incoming_dir, ",");
> > > >     firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
> > > >     firstTable->clearData();
> > > >
> > > >     ibis::part existing_part(existing_dir);
> > > >     existing_part.append(first_incoming_dir);
> > > >     existing_part.commit(first_incoming_dir);
> > > >     existing_part.purgeIndexFiles();
> > > >     existing_part.buildIndexes();
> > > >     existing_part.emptyCache();
> > > >
> > > >     std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
> > > >     secondTable->addColumn("my_primary_key", ibis::LONG);
> > > >     secondTable->addColumn("my_double_value", ibis::DOUBLE);
> > > >     secondTable->addColumn("my_category_value", ibis::CATEGORY);
> > > >     secondTable->readCSV("second_data_file.csv", 0, second_incoming_dir, ",");
> > > >     secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
> > > >     secondTable->clearData();
> > > >
> > > >     ibis::part second_part(second_incoming_dir);
> > > >
> > > >     existing_part.deactivate("my_primary_key = 1");
> > > >     existing_part.purgeInactive();
> > > >
> > > >     existing_part.append(second_incoming_dir);
> > > > }
> > > >
> > > > Thank you John,
> > > >
> > > > Greg
> > > >
> > > > On Sun, Aug 12, 2012 at 3:27 PM, K. John Wu <[email protected]> wrote:
> > > >
> > > >     Hi, Greg,
> > > >
> > > >     Thanks for the information. Looks like we might have neglected to
> > > >     close some index files or somehow mishandled some index files.
> > > >     There is only one easy thing for us to check; it is related to
> > > >     the handling of categorical values (the columns of type
> > > >     ibis::CATEGORY). Would you mind telling us if my_primary_key is
> > > >     an integer column or a CATEGORY column?
> > > >
> > > >     If it is not a CATEGORY, then we might have something a little
> > > >     bit more complex. We would appreciate a small test case to
> > > >     replicate the problem.
> > > >
> > > >     John
> > > >
> > > >     On 8/10/12 5:32 PM, Greg Barker wrote:
> > > > > Hello -
> > > > >
> > > > > I am attempting to append some new data to some existing data, and
> > > > > ran into some trouble. When loading, I join the new data to the
> > > > > existing data on a particular column, and then deactivate &
> > > > > purgeInactive on the matching records. Then when I try to append
> > > > > the new data to the existing data, I hit a seg fault using rev 536.
> > > > > If I call purgeIndexFiles before the append, it seems to avoid the
> > > > > crash, but I wasn't sure if that was recommended?
> > > > >
> > > > > My code is essentially:
> > > > >
> > > > > ibis::part existing_part("my_data");
> > > > > ibis::part incoming_part("new_data");
> > > > > std::auto_ptr<ibis::quaere> join(ibis::quaere::create(&existing_part, &incoming_part, "my_primary_key"));
> > > > > std::auto_ptr<ibis::table> rs(join->select("my_primary_key"));
> > > > > //then build the where clause
> > > > > working_part.deactivate("my_primary_key in (3, 4, 5)");
> > > > > working_part.purgeInactive();
> > > > > working_part.append(incoming_data);
> > > > >
> > > > > Which yields the following:
> > > > >
> > > > > part[my_data]::deactivate marked 9 rows as inactive, leaving 10 active rows out of 19
> > > > > part[my_data]::purgeInactive to remove 9 out of 19 rows
> > > > > Warning -- fileManager::flushDir can not remove in-memory file (my_data/my_primary_key.idx). It is in use
> > > > > Warning -- fileManager::flushDir(my_data) finished with 1 file still in memory
> > > > > Constructed a part named my_data
> > > > > filter::sift1S -- processing data partition my_data
> > > > > Segmentation fault (core dumped)
> > > > >
> > > > > Many Thanks,
> > > > > Greg
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
