Hi, Greg,

You may be hitting a bug in ibis::fuzz.  If you have specifically
instructed column my_primary_key to use something like the following
(in the file -part.txt),

index=<binning none/><encoding interval-equality/>

you might want to change it to something like

index=<binning none/><encoding equality/>

The above instruction will force it to use a simpler index (named
ibis::relic), which should have less trouble.
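
To illustrate why the equality encoding is the simpler of the two, here
is a toy sketch (not FastBit code; ToyEqualityIndex and countHits are
made-up names) of an equality-encoded index.  It keeps one bitmap per
distinct value and answers an IN condition by OR-ing the bitmaps of the
listed values.  The real ibis::relic works on compressed
ibis::bitvector objects and is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Toy equality-encoded index: one uncompressed bitmap per distinct value.
// Illustration only; this is NOT how ibis::relic is implemented.
struct ToyEqualityIndex {
    std::size_t nrows;
    std::map<std::int64_t, std::vector<bool> > bitmaps;

    explicit ToyEqualityIndex(const std::vector<std::int64_t>& column)
        : nrows(column.size()) {
        for (std::size_t row = 0; row < column.size(); ++row) {
            std::vector<bool>& bits = bitmaps[column[row]];
            if (bits.empty()) bits.assign(nrows, false);  // first time we see this value
            bits[row] = true;
        }
    }

    // Answer "column in (v1, v2, ...)" by OR-ing the bitmaps of the listed values.
    std::vector<bool> in(const std::vector<std::int64_t>& values) const {
        std::vector<bool> hits(nrows, false);
        for (std::size_t i = 0; i < values.size(); ++i) {
            std::map<std::int64_t, std::vector<bool> >::const_iterator it =
                bitmaps.find(values[i]);
            if (it == bitmaps.end()) continue;  // value never appears: no hits
            assert(it->second.size() == nrows);
            for (std::size_t row = 0; row < nrows; ++row)
                if (it->second[row]) hits[row] = true;
        }
        return hits;
    }
};

// Count the set bits, i.e. the number of rows that match.
std::size_t countHits(const std::vector<bool>& hits) {
    std::size_t n = 0;
    for (std::size_t row = 0; row < hits.size(); ++row)
        if (hits[row]) ++n;
    return n;
}
```

For a column holding the keys 1 through 7, countHits(idx.in({1, 2, 3, 4,
5})) gives 5 in this toy model.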

If using the simpler index removes the problem, then we can be pretty
sure the problem is with ibis::fuzz.  If the problem is with
ibis::fuzz, I would need a copy of your file my_primary_key and the
query expression that causes the error to study the problem further.

Thanks.

John

PS: To change the index specification, you can either directly edit
the file -part.txt or use the following command line (all on one line)

.../ibis -d data-directory -b "my_primary_key:<binning none/><encoding equality/>" -z
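
If you script the index change, a small helper along these lines could
build the column:spec argument for -b and reject a specification with a
missing '<' or '/>' before it reaches ibis.  This is a hypothetical
sketch, not part of FastBit; tagsLookBalanced and buildIndexArg are
made-up names.  It only checks that the angle brackets pair up, not that
the options are ones FastBit understands.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if every '<' opens a tag that is closed by "/>" before the
// next '<', and no '>' appears outside a tag.  (Hypothetical helper.)
bool tagsLookBalanced(const std::string& spec) {
    bool inTag = false;
    for (std::size_t i = 0; i < spec.size(); ++i) {
        const char c = spec[i];
        if (c == '<') {
            if (inTag) return false;  // nested '<' inside an open tag
            inTag = true;
        } else if (c == '>') {
            // A '>' must close an open tag and be preceded by '/'.
            if (!inTag || i == 0 || spec[i - 1] != '/') return false;
            inTag = false;
        }
    }
    return !inTag;  // no tag left open at the end
}

// Build the "column:spec" string that follows -b on the command line.
// (Hypothetical helper; the caller is expected to pass a balanced spec.)
std::string buildIndexArg(const std::string& column, const std::string& spec) {
    assert(tagsLookBalanced(spec));
    return column + ":" + spec;
}
```

A specification such as "<binning none/>encoding equality/>", where the
second '<' is missing, fails the check, while
"<binning none/><encoding equality/>" passes.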



On 8/16/12 1:57 AM, Greg Barker wrote:
> Hi John,
> 
> Thank you for the additional information. I still haven't been able to
> figure out why sometimes my part.deactivate() call does not return the
> number of rows that match the expression I pass in. Adding a
> part.computeMinMax() call before calling deactivate seemed to change
> the outcome: now I hit a seg fault calling deactivate (using
> rev 538):
> 
> #0  0xb70cb663 in ibis::bitvector::size (this=0xbfd51c34, rhs=...) at bitvector.h:628
> #1  ibis::bitvector::operator-= (this=0xbfd51c34, rhs=...) at bitvector.cpp:1528
> #2  0xb737ed1e in ibis::fuzz::coarseEvaluate (this=0xb9fcd28, lo=109, hi=110, res=...) at ixfuzz.cpp:744
> #3  0xb737fae0 in ibis::fuzz::evaluate (this=0xb9fcd28, expr=..., lower=...) at ixfuzz.cpp:950
> #4  0xb72b66bb in ibis::relic::estimate (this=0xb9fcd28, expr=..., lower=..., upper=...) at irelic.h:50
> #5  0xb6a28663 in ibis::column::evaluateRange (this=0x9f0b5b0, cmp=..., mask=..., low=...) at column.cpp:5480
> #6  0xb6a24e56 in ibis::column::evaluateRange (this=0x9f0b5b0, cmp=..., mask=..., low=...) at column.cpp:5701
> #7  0xb611f6fb in ibis::part::evaluateRange (this=0xbfd51fc4, cmp=..., mask=..., hits=...) at part.cpp:3581
> #8  0xb7066bd8 in ibis::query::doEvaluate (this=0xbfd50260, term=0xaf38e18, ht=...) at query.cpp:3683
> #9  0xb70672fc in ibis::query::getExpandedHits (this=0xbfd50260, res=...) at query.cpp:2966
> #10 0xb6112fbf in ibis::part::stringToBitvector (this=0xbfd51fc4, conds=0xb88112c "my_primary_key in (199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"..., msk=...) at part.cpp:4101
> #11 0xb73b63d2 in ibis::part::deactivate (this=0xbfd51fc4, conds=0xb88112c "my_primary_key in (199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"...) at parti.cpp:1269
> 
> Thanks,
> Greg
> 
> On Wed, Aug 15, 2012 at 4:04 PM, K. John Wu <[email protected]> wrote:
> 
>     Hi, Greg,
> 
>     The mystery might be related to the lazy updating of the min/max
>     values.  Even when the min and max values are wrong in the metadata
>     file, FastBit should be able to answer the queries correctly.  Our
>     first application wanted us to use the min and max as nominal lower
>     and upper bounds; the actual min and max could vary significantly
>     from the nominal bounds.  To enforce the computation of the min and
>     max values, please call ibis::part::computeMinMax.
> 
>     When you initialize an ibis::part with a single string argument, it is
>     assumed to be a directory name if it contains directory separators or
>     it names an existing directory.  If the string does not contain a '/'
>     and does not name an existing directory, then it is necessary to have
>     the second string argument (which could be nil) to tell FastBit to use
>     the first argument as the directory name.
> 
>     Let me know if you have any additional questions.
> 
>     John
> 
> 
> 
> 
>     On 8/14/12 12:58 PM, Greg Barker wrote:
>     > Hi John,
>     >
>     > I've been running into a scenario where I'm not able to deactivate
>     > rows that exist in the data file. I noticed when it gets into this
>     > state, the min & max for my_primary_key in -part.txt seems to be
>     > incorrect. I'm having trouble coming up with a small program that
>     > can reproduce the issue, but this seems to get pretty close. Before
>     > I ran it, the three directories it uses existed and were empty.
>     >
>     > $ cat 7rows.csv
>     > 1,93.19,AAA
>     > 2,49.14,BBB
>     > 3,49.19,DDD
>     > 4,59.10,EEE
>     > 5,34.48,FFF
>     > 6,91.49,AAA
>     > 7,19.50,BBB
>     >
>     > $ cat 5rows.csv
>     > 1,93.19,AAA
>     > 2,49.14,BBB
>     > 3,50.41,CCC
>     > 4,58.59,AAA
>     > 5,19.53,CCC
>     >
>     > $ cat loading_error.cc
>     > #include <memory>
>     >
>     > #include <ibis.h>
>     >
>     > int main(int argc, char **argv)
>     > {
>     >     ibis::gVerbose = 1;
>     >
>     >     char existing_dir[] = "existing_dir";
>     >     char first_incoming_dir[] = "first_incoming_dir";
>     >     char second_incoming_dir[] = "second_incoming_dir";
>     >
>     >     std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
>     >     firstTable->addColumn("my_primary_key", ibis::LONG);
>     >     firstTable->addColumn("my_double_value", ibis::DOUBLE);
>     >     firstTable->addColumn("my_category_value", ibis::CATEGORY);
>     >     firstTable->readCSV("7rows.csv", 0, first_incoming_dir, ",");
>     >     firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
>     >     firstTable->clearData();
>     >
>     >     ibis::part existing_part(existing_dir, static_cast<const char*>(0));
>     >     existing_part.append(first_incoming_dir);
>     >     existing_part.commit(first_incoming_dir);
>     >     existing_part.purgeIndexFiles();
>     >     existing_part.buildIndexes();
>     >     existing_part.emptyCache();
>     >
>     >     std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
>     >     secondTable->addColumn("my_primary_key", ibis::LONG);
>     >     secondTable->addColumn("my_double_value", ibis::DOUBLE);
>     >     secondTable->addColumn("my_category_value", ibis::CATEGORY);
>     >     secondTable->readCSV("5rows.csv", 0, second_incoming_dir, ",");
>     >     secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
>     >     secondTable->clearData();
>     >
>     >     ibis::part second_part(second_incoming_dir);
>     >
>     >     int deactivatedCount = 0;
>     >     deactivatedCount = existing_part.deactivate("my_primary_key in (1, 2, 3, 4, 5)");
>     >     std::cout << "deactivatedCount = " << deactivatedCount << std::endl;
>     >     existing_part.purgeInactive();
>     >
>     >     existing_part.append(second_incoming_dir);
>     >     existing_part.commit(second_incoming_dir);
>     >     existing_part.purgeIndexFiles();
>     >     existing_part.buildIndexes();
>     >     existing_part.emptyCache();
>     > }
>     >
>     > I end up with this in the -part.txt in existing_dir:
>     >
>     > Begin Column
>     > name = "my_primary_key"
>     > data_type = "LONG"
>     > minimum = 6
>     > maximum = 7
>     > End Column
>     >
>     > I was thinking it should have min = 1 & max = 7.
>     >
>     > Thank you,
>     > Greg
>     >
>     > On Mon, Aug 13, 2012 at 9:13 PM, Greg Barker <[email protected]> wrote:
>     >
>     >     Whoops my mistake, deactivate() returns the number of inactive
>     >     rows, just like it says in the doc :)
>     >
>     >     Greg
>     >
>     >
>     >     On Mon, Aug 13, 2012 at 6:11 PM, Greg Barker <[email protected]> wrote:
>     >
>     >         Hello John,
>     >
>     >         Thank you for the updated code, it appears to be working
>     >         quite well now for that case. I really appreciate it.
>     >
>     >         Another thing I noticed while I was testing is that if you
>     >         call deactivate() multiple times before purgeInactive(), the
>     >         return value was not what I expected. Do I need to call
>     >         purgeInactive() after each deactivate()?
>     >
>     >         For example:
>     >
>     >         int deactivatedCount = 0;
>     >         deactivatedCount += existing_part.deactivate("my_primary_key in (1, 2)");
>     >         deactivatedCount += existing_part.deactivate("my_primary_key in (3, 4)");
>     >         existing_part.purgeInactive();
>     >         std::cout << "deactivatedCount = " << deactivatedCount << "\n";
>     >
>     >         Which yields:
>     >
>     >         part[existing_dir]::deactivate marked 2 rows as inactive, leaving 3 active rows out of 5
>     >         part[existing_dir]::deactivate marked 2 rows as inactive, leaving 1 active row out of 5
>     >         part[existing_dir]::purgeInactive to remove 4 out of 5 rows
>     >         deactivatedCount = 6
>     >
>     >         Thanks again for your work,
>     >
>     >         Greg
>     >
>     >
>     >         On Mon, Aug 13, 2012 at 4:10 PM, K. John Wu <[email protected]> wrote:
>     >
>     >             Hi, Greg,
>     >
>     >             Thanks for the test case and test code.  The problem
>     >             should be fixed with SVN Revision 538.  Please give it
>     >             a try when you get the chance.
>     >
>     >             Your test program needs one minor change in order to
>     >             do what you want.  The following line,
>     >
>     >                  ibis::part existing_part(existing_dir);
>     >
>     >             needs to be changed to
>     >
>     >                  ibis::part existing_part(existing_dir, static_cast<const char*>(0));
>     >
>     >             The version you used will create two directories hidden
>     >             in .ibis, which are probably not what you want.
>     >
>     >             John
>     >
>     >
>     >
>     >             On 8/13/12 1:57 AM, Greg Barker wrote:
>     >             > Hello,
>     >             >
>     >             > The type of my_primary_key is a long. I was able to
>     >             > reproduce the error without the join. I also noticed
>     >             > that it does not hit the seg fault if the category
>     >             > column is omitted. The following program will hit the
>     >             > error.
>     >             >
>     >             > $ cat first_data_file.csv
>     >             > 1,93.19,AAA
>     >             > 2,49.14,BBB
>     >             > 3,50.41,CCC
>     >             > 4,58.59,AAA
>     >             > 5,19.53,CCC
>     >             >
>     >             > $ cat second_data_file.csv
>     >             > 3,49.19,DDD
>     >             > 4,59.10,EEE
>     >             > 5,34.48,FFF
>     >             > 6,91.49,AAA
>     >             > 7,19.50,BBB
>     >             >
>     >             > $ cat loading_error.cc
>     >             > #include <memory>
>     >             >
>     >             > #include <ibis.h>
>     >             >
>     >             > int main(int argc, char **argv)
>     >             > {
>     >             >     char existing_dir[] = "existing_dir";
>     >             >     char first_incoming_dir[] = "first_incoming_dir";
>     >             >     char second_incoming_dir[] = "second_incoming_dir";
>     >             >
>     >             >     std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
>     >             >     firstTable->addColumn("my_primary_key", ibis::LONG);
>     >             >     firstTable->addColumn("my_double_value", ibis::DOUBLE);
>     >             >     firstTable->addColumn("my_category_value", ibis::CATEGORY);
>     >             >     firstTable->readCSV("first_data_file.csv", 0, first_incoming_dir, ",");
>     >             >     firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
>     >             >     firstTable->clearData();
>     >             >
>     >             >     ibis::part existing_part(existing_dir);
>     >             >     existing_part.append(first_incoming_dir);
>     >             >     existing_part.commit(first_incoming_dir);
>     >             >     existing_part.purgeIndexFiles();
>     >             >     existing_part.buildIndexes();
>     >             >     existing_part.emptyCache();
>     >             >
>     >             >     std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
>     >             >     secondTable->addColumn("my_primary_key", ibis::LONG);
>     >             >     secondTable->addColumn("my_double_value", ibis::DOUBLE);
>     >             >     secondTable->addColumn("my_category_value", ibis::CATEGORY);
>     >             >     secondTable->readCSV("second_data_file.csv", 0, second_incoming_dir, ",");
>     >             >     secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
>     >             >     secondTable->clearData();
>     >             >
>     >             >     ibis::part second_part(second_incoming_dir);
>     >             >
>     >             >     existing_part.deactivate("my_primary_key = 1");
>     >             >     existing_part.purgeInactive();
>     >             >
>     >             >     existing_part.append(second_incoming_dir);
>     >             > }
>     >             >
>     >             > Thank you John,
>     >             >
>     >             > Greg
>     >             >
>     >             > On Sun, Aug 12, 2012 at 3:27 PM, K. John Wu <[email protected]> wrote:
>     >             >
>     >             >     Hi, Greg,
>     >             >
>     >             >     Thanks for the information.  Looks like we might
>     >             >     have neglected to close some index files or somehow
>     >             >     mishandled some index files.  There is only one easy
>     >             >     thing for us to check: whether this is related to the
>     >             >     handling of categorical values (the columns of type
>     >             >     ibis::CATEGORY).  Would you mind telling us if
>     >             >     my_primary_key is an integer column or a CATEGORY
>     >             >     column?
>     >             >
>     >             >     If it is not a CATEGORY, then we might have
>     >             >     something a little bit more complex.  We would
>     >             >     appreciate a small test case to replicate the
>     >             >     problem.
>     >             >
>     >             >     John
>     >             >
>     >             >
>     >             >     On 8/10/12 5:32 PM, Greg Barker wrote:
>     >             >     > Hello -
>     >             >     >
>     >             >     > I am attempting to append some new data to some
>     >             >     > existing data, and ran into some trouble. When
>     >             >     > loading, I join the new data to the existing data on
>     >             >     > a particular column, and then deactivate &
>     >             >     > purgeInactive on the matching records. Then when I
>     >             >     > try to append the new data to the existing data, I
>     >             >     > hit a seg fault using rev 536. If I call
>     >             >     > purgeIndexFiles before the append, it seems to avoid
>     >             >     > the crash, but I wasn't sure if that was recommended?
>     >             >     >
>     >             >     > My code is essentially:
>     >             >     >
>     >             >     >     ibis::part existing_part("my_data");
>     >             >     >     ibis::part incoming_part("new_data");
>     >             >     >     std::auto_ptr<ibis::quaere> join(ibis::quaere::create(&existing_part, &incoming_part, "my_primary_key"));
>     >             >     >     std::auto_ptr<ibis::table> rs(join->select("my_primary_key"));
>     >             >     >     //then build the where clause
>     >             >     >     working_part.deactivate("my_primary_key in (3, 4, 5)");
>     >             >     >     working_part.purgeInactive();
>     >             >     >     working_part.append(incoming_data);
>     >             >     >
>     >             >     >
>     >             >     > Which yields the following:
>     >             >     >
>     >             >     >     part[my_data]::deactivate marked 9 rows as inactive, leaving 10 active rows out of 19
>     >             >     >     part[my_data]::purgeInactive to remove 9 out of 19 rows
>     >             >     >     Warning -- fileManager::flushDir can not remove in-memory file (my_data/my_primary_key.idx).  It is in use
>     >             >     >     Warning -- fileManager::flushDir(my_data) finished with 1 file still in memory
>     >             >     >     Constructed a part named my_data
>     >             >     >     filter::sift1S -- processing data partition my_data
>     >             >     >     Segmentation fault (core dumped)
>     >             >     >
>     >             >     > Many Thanks,
>     >             >     > Greg
>     >             >
>     >             >
>     >
>     >
>     >
>     >
> 
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
