Hi John,

Thank you for the additional information. I still haven't been able to
figure out why my part.deactivate() call sometimes does not return the
number of rows that match the expression I pass in. Adding a
part.computeMinMax() call before deactivate changed the outcome: now I
hit a seg fault in deactivate (using rev 538):

#0  0xb70cb663 in ibis::bitvector::size (this=0xbfd51c34, rhs=...) at
bitvector.h:628
#1  ibis::bitvector::operator-= (this=0xbfd51c34, rhs=...) at
bitvector.cpp:1528
#2  0xb737ed1e in ibis::fuzz::coarseEvaluate (this=0xb9fcd28, lo=109,
hi=110, res=...) at ixfuzz.cpp:744
#3  0xb737fae0 in ibis::fuzz::evaluate (this=0xb9fcd28, expr=...,
lower=...) at ixfuzz.cpp:950
#4  0xb72b66bb in ibis::relic::estimate (this=0xb9fcd28, expr=...,
lower=..., upper=...) at irelic.h:50
#5  0xb6a28663 in ibis::column::evaluateRange (this=0x9f0b5b0, cmp=...,
mask=..., low=...) at column.cpp:5480
#6  0xb6a24e56 in ibis::column::evaluateRange (this=0x9f0b5b0, cmp=...,
mask=..., low=...) at column.cpp:5701
#7  0xb611f6fb in ibis::part::evaluateRange (this=0xbfd51fc4, cmp=...,
mask=..., hits=...) at part.cpp:3581
#8  0xb7066bd8 in ibis::query::doEvaluate (this=0xbfd50260, term=0xaf38e18,
ht=...) at query.cpp:3683
#9  0xb70672fc in ibis::query::getExpandedHits (this=0xbfd50260, res=...)
at query.cpp:2966
#10 0xb6112fbf in ibis::part::stringToBitvector (this=0xbfd51fc4,
    conds=0xb88112c "my_primary_key in
(199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"...,
msk=...) at part.cpp:4101
#11 0xb73b63d2 in ibis::part::deactivate (this=0xbfd51fc4,
    conds=0xb88112c "my_primary_key in
(199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"...)
at parti.cpp:1269

Thanks,
Greg

On Wed, Aug 15, 2012 at 4:04 PM, K. John Wu <[email protected]> wrote:

> Hi, Greg,
>
> The mystery might be related to the lazy updating of the min/max
> values.  Even when the min and max values are wrong in the metadata
> file, FastBit should be able to answer the queries correctly.  Our
> first application wanted us to use the min and max as nominal lower
> and upper bounds; the actual min and max could vary significantly
> from the nominal bounds.  To force the computation of the min and
> max values, please call ibis::part::computeMinMax.
>
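To make the "nominal bounds" idea concrete, here is a minimal toy
model. It is not FastBit code: ToyColumn, purgeRange, countEqual and
replayThread are hypothetical names, and the assumption that bounds
are refreshed at purge time but not at append time is only one way a
stale 6..7 range could arise. The keys are the ones from the
7rows.csv and 5rows.csv example quoted further down.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for one column of a data partition. Not FastBit code.
struct ToyColumn {
    std::vector<int64_t> values;  // rows currently present
    int64_t nominalMin = 0;       // bounds as recorded in the metadata
    int64_t nominalMax = 0;

    // Append rows WITHOUT touching the recorded bounds (the lazy part).
    void append(const std::vector<int64_t>& rows) {
        values.insert(values.end(), rows.begin(), rows.end());
    }

    // Drop every row whose value lies in [lo, hi].
    void purgeRange(int64_t lo, int64_t hi) {
        values.erase(
            std::remove_if(values.begin(), values.end(),
                           [lo, hi](int64_t v) { return v >= lo && v <= hi; }),
            values.end());
    }

    // Recompute the recorded bounds from the rows actually present.
    void computeMinMax() {
        if (values.empty()) return;
        auto mm = std::minmax_element(values.begin(), values.end());
        nominalMin = *mm.first;
        nominalMax = *mm.second;
    }

    // Exact answer obtained by scanning the data, so it does not
    // depend on the recorded bounds being right.
    std::size_t countEqual(int64_t key) const {
        return static_cast<std::size_t>(
            std::count(values.begin(), values.end(), key));
    }
};

// Replays the sequence from the thread under the stated assumption.
ToyColumn replayThread() {
    ToyColumn col;
    col.append({1, 2, 3, 4, 5, 6, 7});  // keys from 7rows.csv
    col.purgeRange(1, 5);               // deactivate + purgeInactive
    col.computeMinMax();                // assumed refresh: bounds 6..7
    col.append({1, 2, 3, 4, 5});        // keys from 5rows.csv, bounds untouched
    return col;
}
```

In this model the recorded bounds stay at 6..7 after the second
append while a scan still finds key 3, and an explicit computeMinMax()
brings the bounds back to 1..7.
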
> When you initialize an ibis::part with a single string argument, it is
> assumed to be a directory name if it contains directory separators or
> it names an existing directory.  If the string does not contain a '/'
> or does not name an existing directory, then it is necessary to have
> the second string argument (which could be nil) to tell FastBit to use
> the first argument as the directory name.
>
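The single-argument rule described above can be sketched as a small
predicate. This only models the wording of the paragraph and is not
taken from the FastBit sources: treatedAsDirectoryName is a
hypothetical helper, and the sketch assumes a POSIX system where '/'
is the only directory separator.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>

// Hypothetical helper, not a FastBit function. Returns true when a
// lone string argument would be taken as a directory name under the
// rule quoted above: it contains a separator, or it names an
// existing directory.
bool treatedAsDirectoryName(const std::string& arg) {
    if (arg.find('/') != std::string::npos) {
        return true;  // contains a directory separator
    }
    struct stat sb;
    // stat() returns 0 on success; S_ISDIR checks the file type.
    return stat(arg.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode);
}
```

A name for which this predicate is false is the case where the
explicit second argument is needed.
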
> Let me know if you have any additional questions.
>
> John
>
>
>
>
> On 8/14/12 12:58 PM, Greg Barker wrote:
> > Hi John,
> >
> > I've been running into a scenario where I'm not able to deactivate
> > rows that exist in the data file. I noticed when it gets into this
> > state, the min & max for my_primary_key in -part.txt seems to be
> > incorrect. I'm having trouble coming up with a small program that can
> > reproduce the issue, but this seems to get pretty close. Before I ran
> > it, the three directories it uses existed and were empty.
> >
> > $ cat 7rows.csv
> > 1,93.19,AAA
> > 2,49.14,BBB
> > 3,49.19,DDD
> > 4,59.10,EEE
> > 5,34.48,FFF
> > 6,91.49,AAA
> > 7,19.50,BBB
> >
> > $ cat 5rows.csv
> > 1,93.19,AAA
> > 2,49.14,BBB
> > 3,50.41,CCC
> > 4,58.59,AAA
> > 5,19.53,CCC
> >
> > $ cat loading_error.cc
> > #include <iostream>
> > #include <memory>
> >
> > #include <ibis.h>
> >
> > int main(int argc, char **argv)
> > {
> >     ibis::gVerbose = 1;
> >
> >     char existing_dir[] = "existing_dir";
> >     char first_incoming_dir[] = "first_incoming_dir";
> >     char second_incoming_dir[] = "second_incoming_dir";
> >
> >     std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
> >     firstTable->addColumn("my_primary_key", ibis::LONG);
> >     firstTable->addColumn("my_double_value", ibis::DOUBLE);
> >     firstTable->addColumn("my_category_value", ibis::CATEGORY);
> >     firstTable->readCSV("7rows.csv", 0, first_incoming_dir, ",");
> >     firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
> >     firstTable->clearData();
> >
> >     ibis::part existing_part(existing_dir, static_cast<const char*>(0));
> >     existing_part.append(first_incoming_dir);
> >     existing_part.commit(first_incoming_dir);
> >     existing_part.purgeIndexFiles();
> >     existing_part.buildIndexes();
> >     existing_part.emptyCache();
> >
> >     std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
> >     secondTable->addColumn("my_primary_key", ibis::LONG);
> >     secondTable->addColumn("my_double_value", ibis::DOUBLE);
> >     secondTable->addColumn("my_category_value", ibis::CATEGORY);
> >     secondTable->readCSV("5rows.csv", 0, second_incoming_dir, ",");
> >     secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
> >     secondTable->clearData();
> >
> >     ibis::part second_part(second_incoming_dir);
> >
> >     int deactivatedCount = 0;
> >     deactivatedCount = existing_part.deactivate("my_primary_key in (1, 2, 3, 4, 5)");
> >     std::cout << "deactivatedCount = " << deactivatedCount << std::endl;
> >     existing_part.purgeInactive();
> >
> >     existing_part.append(second_incoming_dir);
> >     existing_part.commit(second_incoming_dir);
> >     existing_part.purgeIndexFiles();
> >     existing_part.buildIndexes();
> >     existing_part.emptyCache();
> > }
> >
> > I end up with this in the -part.txt in existing_dir:
> >
> > Begin Column
> > name = "my_primary_key"
> > data_type = "LONG"
> > minimum = 6
> > maximum = 7
> > End Column
> >
> > I was thinking it should have min = 1 & max = 7.
> >
> > Thank you,
> > Greg
> >
> > On Mon, Aug 13, 2012 at 9:13 PM, Greg Barker <[email protected]> wrote:
> >
> >     Whoops my mistake, deactivate() returns the number of inactive
> >     rows, just like it says in the doc :)
> >
> >     Greg
> >
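The total of 6 in the example quoted just below is consistent with
deactivate() returning the running number of inactive rows rather
than the number newly marked. A toy model of that behavior follows.
It is not FastBit code: ToyPart, summedReturns and newlyDeactivated
are hypothetical names, and the return-value semantics are assumed
from the message above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy partition that tracks which rows are inactive. Not FastBit code.
class ToyPart {
public:
    explicit ToyPart(const std::vector<long>& keys)
        : keys_(keys), inactive_(keys.size(), false) {}

    // Marks rows whose key is in `wanted` as inactive and returns the
    // TOTAL number of inactive rows so far (assumed semantics).
    std::size_t deactivate(const std::set<long>& wanted) {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            if (wanted.count(keys_[i]) != 0) inactive_[i] = true;
        }
        return inactiveCount();
    }

    std::size_t inactiveCount() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < inactive_.size(); ++i) {
            if (inactive_[i]) ++n;
        }
        return n;
    }

private:
    std::vector<long> keys_;
    std::vector<bool> inactive_;
};

// Summing the return values counts the first batch twice: 2 + 4.
std::size_t summedReturns() {
    std::vector<long> keys = {1, 2, 3, 4, 5};
    ToyPart part(keys);
    std::size_t total = 0;
    total += part.deactivate({1, 2});
    total += part.deactivate({3, 4});
    return total;
}

// Taking the difference of running totals gives the rows newly marked.
std::size_t newlyDeactivated() {
    std::vector<long> keys = {1, 2, 3, 4, 5};
    ToyPart part(keys);
    std::size_t before = part.inactiveCount();
    part.deactivate({1, 2});
    std::size_t after = part.deactivate({3, 4});
    return after - before;
}
```

Under these assumptions there is no need to call purgeInactive()
between the two deactivate() calls; using the last return value, or a
difference of return values, gives the expected count of 4.
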
> >
> >     On Mon, Aug 13, 2012 at 6:11 PM, Greg Barker <[email protected]> wrote:
> >
> >         Hello John,
> >
> >         Thank you for the updated code, it appears to be working quite
> >         well now for that case. I really appreciate it.
> >
> >         Another thing I noticed while I was testing is that if you
> >         call deactivate() multiple times before purgeInactive(), the
> >         return value was not what I expected. Do I need to call
> >         purgeInactive() after each deactivate()?
> >
> >         For example:
> >
> >         int deactivatedCount = 0;
> >         deactivatedCount += existing_part.deactivate("my_primary_key in (1, 2)");
> >         deactivatedCount += existing_part.deactivate("my_primary_key in (3, 4)");
> >         existing_part.purgeInactive();
> >         std::cout << "deactivatedCount = " << deactivatedCount << "\n";
> >
> >         Which yields:
> >
> >         part[existing_dir]::deactivate marked 2 rows as inactive, leaving 3 active rows out of 5
> >         part[existing_dir]::deactivate marked 2 rows as inactive, leaving 1 active row out of 5
> >         part[existing_dir]::purgeInactive to remove 4 out of 5 rows
> >         deactivatedCount = 6
> >
> >         Thanks again for your work,
> >
> >         Greg
> >
> >
> >         On Mon, Aug 13, 2012 at 4:10 PM, K. John Wu <[email protected]> wrote:
> >
> >             Hi, Greg,
> >
> >             Thanks for the test case and test code.  The problem
> >             should be fixed in SVN Revision 538.  Please give it a
> >             try when you get the chance.
> >
> >             One minor change to your test program is needed for it
> >             to do what you want.  The following line,
> >
> >                  ibis::part existing_part(existing_dir);
> >
> >             needs to be changed to
> >
> >                  ibis::part existing_part(existing_dir, static_cast<const char*>(0));
> >
> >             The version you used will create two directories hidden
> >             in .ibis, which are probably not what you want.
> >
> >             John
> >
> >
> >
> >             On 8/13/12 1:57 AM, Greg Barker wrote:
> >             > Hello,
> >             >
> >             > The type of my_primary_key is a long. I was able to
> >             > reproduce the error without the join. I also noticed that
> >             > it does not hit the seg fault if the category column is
> >             > omitted. The following program will hit the error.
> >             >
> >             > $ cat first_data_file.csv
> >             > 1,93.19,AAA
> >             > 2,49.14,BBB
> >             > 3,50.41,CCC
> >             > 4,58.59,AAA
> >             > 5,19.53,CCC
> >             >
> >             > $ cat second_data_file.csv
> >             > 3,49.19,DDD
> >             > 4,59.10,EEE
> >             > 5,34.48,FFF
> >             > 6,91.49,AAA
> >             > 7,19.50,BBB
> >             >
> >             > $ cat loading_error.cc
> >             > #include <memory>
> >             >
> >             > #include <ibis.h>
> >             >
> >             > int main(int argc, char **argv)
> >             > {
> >             >     char existing_dir[] = "existing_dir";
> >             >     char first_incoming_dir[] = "first_incoming_dir";
> >             >     char second_incoming_dir[] = "second_incoming_dir";
> >             >
> >             >     std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
> >             >     firstTable->addColumn("my_primary_key", ibis::LONG);
> >             >     firstTable->addColumn("my_double_value", ibis::DOUBLE);
> >             >     firstTable->addColumn("my_category_value", ibis::CATEGORY);
> >             >     firstTable->readCSV("first_data_file.csv", 0, first_incoming_dir, ",");
> >             >     firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
> >             >     firstTable->clearData();
> >             >
> >             >     ibis::part existing_part(existing_dir);
> >             >     existing_part.append(first_incoming_dir);
> >             >     existing_part.commit(first_incoming_dir);
> >             >     existing_part.purgeIndexFiles();
> >             >     existing_part.buildIndexes();
> >             >     existing_part.emptyCache();
> >             >
> >             >     std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
> >             >     secondTable->addColumn("my_primary_key", ibis::LONG);
> >             >     secondTable->addColumn("my_double_value", ibis::DOUBLE);
> >             >     secondTable->addColumn("my_category_value", ibis::CATEGORY);
> >             >     secondTable->readCSV("second_data_file.csv", 0, second_incoming_dir, ",");
> >             >     secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
> >             >     secondTable->clearData();
> >             >
> >             >     ibis::part second_part(second_incoming_dir);
> >             >
> >             >     existing_part.deactivate("my_primary_key = 1");
> >             >     existing_part.purgeInactive();
> >             >
> >             >     existing_part.append(second_incoming_dir);
> >             > }
> >             >
> >             > Thank you John,
> >             >
> >             > Greg
> >             >
> >             > On Sun, Aug 12, 2012 at 3:27 PM, K. John Wu <[email protected]> wrote:
> >             >
> >             >     Hi, Greg,
> >             >
> >             >     Thanks for the information.  Looks like we might have
> >             >     neglected to close some index files or somehow
> >             >     mishandled some index files.  There is only one easy
> >             >     thing for us to check, which is related to the handling
> >             >     of categorical values (the columns of type
> >             >     ibis::CATEGORY).  Would you mind telling us if
> >             >     my_primary_key is an integer column or a CATEGORY
> >             >     column?
> >             >
> >             >     If it is not a CATEGORY, then we might have something a
> >             >     little bit more complex.  We would appreciate a small
> >             >     test case to replicate the problem.
> >             >
> >             >     John
> >             >
> >             >
> >             >     On 8/10/12 5:32 PM, Greg Barker wrote:
> >             >     > Hello -
> >             >     >
> >             >     > I am attempting to append some new data to some existing
> >             >     > data, and ran into some trouble. When loading, I join the
> >             >     > new data to the existing data on a particular column, and
> >             >     > then deactivate & purgeInactive on the matching records.
> >             >     > Then when I try to append the new data to the existing
> >             >     > data, I hit a seg fault using rev 536. If I call
> >             >     > purgeIndexFiles before the append, it seems to avoid the
> >             >     > crash, but I wasn't sure if that was recommended?
> >             >     >
> >             >     > My code is essentially:
> >             >     >
> >             >     >     ibis::part existing_part("my_data");
> >             >     >     ibis::part incoming_part("new_data");
> >             >     >     std::auto_ptr<ibis::quaere> join(ibis::quaere::create(&existing_part, &incoming_part, "my_primary_key"));
> >             >     >     std::auto_ptr<ibis::table> rs(join->select("my_primary_key"));
> >             >     >     //then build the where clause
> >             >     >     working_part.deactivate("my_primary_key in (3, 4, 5)");
> >             >     >     working_part.purgeInactive();
> >             >     >     working_part.append(incoming_data);
> >             >     >
> >             >     >
> >             >     > Which yields the following:
> >             >     >
> >             >     >     part[my_data]::deactivate marked 9 rows as inactive, leaving 10 active rows out of 19
> >             >     >     part[my_data]::purgeInactive to remove 9 out of 19 rows
> >             >     >     Warning -- fileManager::flushDir can not remove in-memory file (my_data/my_primary_key.idx).  It is in use
> >             >     >     Warning -- fileManager::flushDir(my_data) finished with 1 file still in memory
> >             >     >     Constructed a part named my_data
> >             >     >     filter::sift1S -- processing data partition my_data
> >             >     >     Segmentation fault (core dumped)
> >             >     >
> >             >     > Many Thanks,
> >             >     > Greg
> >             >
> >             >
> >
> >
> >
> >
>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
