Hi John,
Thank you for the additional information. I still haven't been able to
figure out why my part.deactivate() call sometimes does not return the
number of rows that match the expression I pass in. Adding a
part.computeMinMax() call before deactivate did change the outcome,
though: I now hit a seg fault when calling deactivate (using rev 538):
#0 0xb70cb663 in ibis::bitvector::size (this=0xbfd51c34, rhs=...) at
bitvector.h:628
#1 ibis::bitvector::operator-= (this=0xbfd51c34, rhs=...) at
bitvector.cpp:1528
#2 0xb737ed1e in ibis::fuzz::coarseEvaluate (this=0xb9fcd28, lo=109,
hi=110, res=...) at ixfuzz.cpp:744
#3 0xb737fae0 in ibis::fuzz::evaluate (this=0xb9fcd28, expr=...,
lower=...) at ixfuzz.cpp:950
#4 0xb72b66bb in ibis::relic::estimate (this=0xb9fcd28, expr=...,
lower=..., upper=...) at irelic.h:50
#5 0xb6a28663 in ibis::column::evaluateRange (this=0x9f0b5b0, cmp=...,
mask=..., low=...) at column.cpp:5480
#6 0xb6a24e56 in ibis::column::evaluateRange (this=0x9f0b5b0, cmp=...,
mask=..., low=...) at column.cpp:5701
#7 0xb611f6fb in ibis::part::evaluateRange (this=0xbfd51fc4, cmp=...,
mask=..., hits=...) at part.cpp:3581
#8 0xb7066bd8 in ibis::query::doEvaluate (this=0xbfd50260, term=0xaf38e18,
ht=...) at query.cpp:3683
#9 0xb70672fc in ibis::query::getExpandedHits (this=0xbfd50260, res=...)
at query.cpp:2966
#10 0xb6112fbf in ibis::part::stringToBitvector (this=0xbfd51fc4,
conds=0xb88112c "my_primary_key in
(199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"...,
msk=...) at part.cpp:4101
#11 0xb73b63d2 in ibis::part::deactivate (this=0xbfd51fc4,
conds=0xb88112c "my_primary_key in
(199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"...)
at parti.cpp:1269
Thanks,
Greg
On Wed, Aug 15, 2012 at 4:04 PM, K. John Wu <[email protected]> wrote:
> Hi, Greg,
>
> The mystery might be related to the lazy updating of the min/max
> values. Even when the min and max values are wrong in the metadata
> file, FastBit should be able to answer the queries correctly. Our
> first application wanted us to use the min and max as nominal lower
> and upper bounds; the actual min and max could vary significantly
> from the nominal bounds. To enforce the computation of the min and
> max values, please call ibis::part::computeMinMax.
>
> When you initialize an ibis::part with a single string argument, it is
> assumed to be a directory name if it contains directory separators or
> it names an existing directory. If the string neither contains a '/'
> nor names an existing directory, then it is necessary to have
> the second string argument (which could be nil) to tell FastBit to use
> the first argument as the directory name.
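
A minimal sketch of the directory-name rule described above, for illustration only. The helper treatedAsDirectory is hypothetical and is not part of FastBit; it only mirrors the stated rule (contains a '/' or names an existing directory) using C++17 std::filesystem, and it says nothing about how ibis::part is actually implemented.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper, NOT a FastBit function: returns true when a lone
// string argument would be taken as a directory name under the rule quoted
// above, i.e. it contains a directory separator or names an existing
// directory. Only '/' is checked, matching the wording of the message.
bool treatedAsDirectory(const std::string& arg) {
    assert(!arg.empty());  // an empty name is outside the scope of this sketch
    const bool hasSeparator = arg.find('/') != std::string::npos;
    std::error_code ec;  // non-throwing overload: any error reads as "not a directory"
    const bool existsAsDir = std::filesystem::is_directory(arg, ec);
    return hasSeparator || existsAsDir;
}
```

When this returns false for a name, the two-argument form from earlier in the thread, ibis::part existing_part(existing_dir, static_cast<const char*>(0)), is the one that tells FastBit to use the first argument as the directory name.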
>
> Let me know if you have any additional questions.
>
> John
>
>
>
>
> On 8/14/12 12:58 PM, Greg Barker wrote:
> > Hi John,
> >
> > I've been running into a scenario where I'm not able to deactivate
> > rows that exist in the data file. I noticed when it gets into this
> > state, the min & max for my_primary_key in -part.txt seems to be
> > incorrect. I'm having trouble coming up with a small program that can
> > reproduce the issue, but this seems to get pretty close. Before I ran
> > it, the three directories it uses existed and were empty.
> >
> > $ cat 7rows.csv
> > 1,93.19,AAA
> > 2,49.14,BBB
> > 3,49.19,DDD
> > 4,59.10,EEE
> > 5,34.48,FFF
> > 6,91.49,AAA
> > 7,19.50,BBB
> >
> > $ cat 5rows.csv
> > 1,93.19,AAA
> > 2,49.14,BBB
> > 3,50.41,CCC
> > 4,58.59,AAA
> > 5,19.53,CCC
> >
> > $ cat loading_error.cc
> > #include <iostream>
> > #include <memory>
> >
> > #include <ibis.h>
> >
> > int main(int argc, char **argv)
> > {
> > ibis::gVerbose = 1;
> >
> > char existing_dir[] = "existing_dir";
> > char first_incoming_dir[] = "first_incoming_dir";
> > char second_incoming_dir[] = "second_incoming_dir";
> >
> > std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
> > firstTable->addColumn("my_primary_key", ibis::LONG);
> > firstTable->addColumn("my_double_value", ibis::DOUBLE);
> > firstTable->addColumn("my_category_value", ibis::CATEGORY);
> > firstTable->readCSV("7rows.csv", 0, first_incoming_dir, ",");
> > firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
> > firstTable->clearData();
> >
> > ibis::part existing_part(existing_dir, static_cast<const char*>(0));
> > existing_part.append(first_incoming_dir);
> > existing_part.commit(first_incoming_dir);
> > existing_part.purgeIndexFiles();
> > existing_part.buildIndexes();
> > existing_part.emptyCache();
> >
> > std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
> > secondTable->addColumn("my_primary_key", ibis::LONG);
> > secondTable->addColumn("my_double_value", ibis::DOUBLE);
> > secondTable->addColumn("my_category_value", ibis::CATEGORY);
> > secondTable->readCSV("5rows.csv", 0, second_incoming_dir, ",");
> > secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
> > secondTable->clearData();
> >
> > ibis::part second_part(second_incoming_dir);
> >
> > int deactivatedCount = 0;
> > deactivatedCount = existing_part.deactivate("my_primary_key in (1, 2, 3, 4, 5)");
> > std::cout << "deactivatedCount = " << deactivatedCount << std::endl;
> > existing_part.purgeInactive();
> >
> > existing_part.append(second_incoming_dir);
> > existing_part.commit(second_incoming_dir);
> > existing_part.purgeIndexFiles();
> > existing_part.buildIndexes();
> > existing_part.emptyCache();
> > }
> >
> > I end up with this in the -part.txt in existing_dir:
> >
> > Begin Column
> > name = "my_primary_key"
> > data_type = "LONG"
> > minimum = 6
> > maximum = 7
> > End Column
> >
> > I was thinking it should have min = 1 & max = 7.
> >
> > Thank you,
> > Greg
> >
> > On Mon, Aug 13, 2012 at 9:13 PM, Greg Barker <[email protected]> wrote:
> >
> > Whoops my mistake, deactivate() returns the number of inactive
> > rows, just like it says in the doc :)
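
A small sketch of the arithmetic behind this, assuming (as noted above) that each deactivate() call returns the running total of inactive rows rather than the number newly marked. The helper newlyDeactivated is hypothetical and does not call FastBit; it only turns a sequence of such return values into per-call counts.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, NOT part of FastBit. Input: the values returned by
// successive deactivate() calls, each assumed to be the TOTAL number of
// inactive rows so far. Output: how many rows each call newly deactivated.
std::vector<int> newlyDeactivated(const std::vector<int>& cumulative) {
    std::vector<int> deltas;
    int previous = 0;
    for (int total : cumulative) {
        // Assumes no purgeInactive() ran between calls, so totals never shrink.
        assert(total >= previous);
        deltas.push_back(total - previous);
        previous = total;
    }
    return deltas;
}
```

With the two calls from the earlier message, the return values would be 2 and then 4. Summing them gives the 6 that was printed, while the per-call differences are 2 and 2, for 4 rows in total.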
> >
> > Greg
> >
> >
> > On Mon, Aug 13, 2012 at 6:11 PM, Greg Barker <[email protected]> wrote:
> >
> > Hello John,
> >
> > Thank you for the updated code, it appears to be working quite
> > well now for that case. I really appreciate it.
> >
> > Another thing I noticed while I was testing is that if you
> > call deactivate() multiple times before purgeInactive(), the
> > return value was not what I expected. Do I need to call
> > purgeInactive() after each deactivate()?
> >
> > For example:
> >
> > int deactivatedCount = 0;
> > deactivatedCount += existing_part.deactivate("my_primary_key in (1, 2)");
> > deactivatedCount += existing_part.deactivate("my_primary_key in (3, 4)");
> > existing_part.purgeInactive();
> > std::cout << "deactivatedCount = " << deactivatedCount << "\n";
> >
> > Which yields:
> >
> > part[existing_dir]::deactivate marked 2 rows as inactive,
> > leaving 3 active rows out of 5
> > part[existing_dir]::deactivate marked 2 rows as inactive,
> > leaving 1 active row out of 5
> > part[existing_dir]::purgeInactive to remove 4 out of 5 rows
> > deactivatedCount = 6
> >
> > Thanks again for your work,
> >
> > Greg
> >
> >
> > On Mon, Aug 13, 2012 at 4:10 PM, K. John Wu <[email protected]> wrote:
> >
> > Hi, Greg,
> >
> > Thanks for the test case and test code. The problem should be fixed
> > with SVN Revision 538. Please give it a try when you get the chance.
> >
> > There is one minor change needed in your test program for it to do
> > what you want. The following line,
> >
> > ibis::part existing_part(existing_dir);
> >
> > needs to be changed to
> >
> > ibis::part existing_part(existing_dir,
> > static_cast<const char*>(0));
> >
> > The version you used will create two directories hidden in .ibis,
> > which are probably not what you want.
> >
> > John
> >
> >
> >
> > On 8/13/12 1:57 AM, Greg Barker wrote:
> > > Hello,
> > >
> > > The type of my_primary_key is a long. I was able to reproduce the
> > > error without the join. I also noticed that it does not hit the seg
> > > fault if the category column is omitted. The following program will
> > > hit the error.
> > >
> > > $ cat first_data_file.csv
> > > 1,93.19,AAA
> > > 2,49.14,BBB
> > > 3,50.41,CCC
> > > 4,58.59,AAA
> > > 5,19.53,CCC
> > >
> > > $ cat second_data_file.csv
> > > 3,49.19,DDD
> > > 4,59.10,EEE
> > > 5,34.48,FFF
> > > 6,91.49,AAA
> > > 7,19.50,BBB
> > >
> > > $ cat loading_error.cc
> > > #include <memory>
> > >
> > > #include <ibis.h>
> > >
> > > int main(int argc, char **argv)
> > > {
> > > char existing_dir[] = "existing_dir";
> > > char first_incoming_dir[] = "first_incoming_dir";
> > > char second_incoming_dir[] = "second_incoming_dir";
> > >
> > > std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
> > > firstTable->addColumn("my_primary_key", ibis::LONG);
> > > firstTable->addColumn("my_double_value", ibis::DOUBLE);
> > > firstTable->addColumn("my_category_value", ibis::CATEGORY);
> > > firstTable->readCSV("first_data_file.csv", 0, first_incoming_dir, ",");
> > > firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
> > > firstTable->clearData();
> > >
> > > ibis::part existing_part(existing_dir);
> > > existing_part.append(first_incoming_dir);
> > > existing_part.commit(first_incoming_dir);
> > > existing_part.purgeIndexFiles();
> > > existing_part.buildIndexes();
> > > existing_part.emptyCache();
> > >
> > > std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
> > > secondTable->addColumn("my_primary_key", ibis::LONG);
> > > secondTable->addColumn("my_double_value", ibis::DOUBLE);
> > > secondTable->addColumn("my_category_value", ibis::CATEGORY);
> > > secondTable->readCSV("second_data_file.csv", 0, second_incoming_dir, ",");
> > > secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
> > > secondTable->clearData();
> > >
> > > ibis::part second_part(second_incoming_dir);
> > >
> > > existing_part.deactivate("my_primary_key = 1");
> > > existing_part.purgeInactive();
> > >
> > > existing_part.append(second_incoming_dir);
> > > }
> > >
> > > Thank you John,
> > >
> > > Greg
> > >
> > > On Sun, Aug 12, 2012 at 3:27 PM, K. John Wu <[email protected]> wrote:
> > >
> > > Hi, Greg,
> > >
> > > Thanks for the information. Looks like we might have neglected to
> > > close some index files or somehow mishandled some index files. There
> > > is only one easy thing for us to check: whether this is related to
> > > the handling of categorical values (the columns of type
> > > ibis::CATEGORY). Would you mind telling us if my_primary_key is an
> > > integer column or a CATEGORY column?
> > >
> > > If it is not a CATEGORY, then we might have something a little bit
> > > more complex. We would appreciate a small test case to replicate the
> > > problem.
> > >
> > > John
> > >
> > >
> > > On 8/10/12 5:32 PM, Greg Barker wrote:
> > > > Hello -
> > > >
> > > > I am attempting to append some new data to some existing data, and
> > > > ran into some trouble. When loading, I join the new data to the
> > > > existing data on a particular column, and then deactivate &
> > > > purgeInactive on the matching records. Then when I try to append
> > > > the new data to the existing data, I hit a seg fault using rev 536.
> > > > If I call purgeIndexFiles before the append, it seems to avoid the
> > > > crash, but I wasn't sure if that was recommended?
> > > >
> > > > My code is essentially:
> > > >
> > > > ibis::part existing_part("my_data");
> > > > ibis::part incoming_part("new_data");
> > > > std::auto_ptr<ibis::quaere>
> > > > join(ibis::quaere::create(&existing_part, &incoming_part,
> > > > "my_primary_key"));
> > > > std::auto_ptr<ibis::table> rs(join->select("my_primary_key"));
> > > > //then build the where clause
> > > > working_part.deactivate("my_primary_key in (3, 4, 5)");
> > > > working_part.purgeInactive();
> > > > working_part.append(incoming_data);
> > > >
> > > >
> > > > Which yields the following:
> > > >
> > > > part[my_data]::deactivate marked 9 rows as inactive, leaving 10
> > > > active rows out of 19
> > > > part[my_data]::purgeInactive to remove 9 out of 19 rows
> > > > Warning -- fileManager::flushDir can not remove in-memory file
> > > > (my_data/my_primary_key.idx). It is in use
> > > > Warning -- fileManager::flushDir(my_data) finished with 1 file
> > > > still in memory
> > > > Constructed a part named my_data
> > > > filter::sift1S -- processing data partition my_data
> > > > Segmentation fault (core dumped)
> > > >
> > > > Many Thanks,
> > > > Greg
> > >
> > >
> >
> >
> >
> >
>