John,

I changed the index for my_primary_key to <binning none/><encoding
equality/> per your suggestion and I am no longer hitting that seg fault.
That change, in combination with calling ibis::part::computeMinMax before I
call ibis::part::deactivate, seems to have resolved all my issues. I will
try to get a reduced test case to send to you.
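
In case it helps, the call sequence I am using now looks roughly like
this (a minimal sketch; the directory name and the where clause are just
placeholders from my earlier test programs, and error checking is
omitted):

    // open the partition with the two-argument constructor, as you suggested
    ibis::part existing_part("existing_dir", static_cast<const char*>(0));

    // force the actual min/max values to be recomputed before deactivating
    existing_part.computeMinMax();

    int deactivatedCount = existing_part.deactivate("my_primary_key in (1, 2, 3, 4, 5)");
    std::cout << "deactivatedCount = " << deactivatedCount << std::endl;
    existing_part.purgeInactive();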

Thank you for all the help you have provided; I really appreciate it.

Thanks again,
Greg

On Thu, Aug 16, 2012 at 7:20 AM, K. John Wu <[email protected]> wrote:

> Hi, Greg,
>
> Potentially, you are hitting a bug in ibis::fuzz.  If you have
> specifically instructed the column my_primary_key to use something like
> the following (in the file -part.txt),
>
> index=<binning none/><encoding interval-equality/>
>
> you might want to change it to something like
>
> index=<binning none/><encoding equality/>
>
> The above instruction will force it to use a simpler index (named
> ibis::relic), which should give you less trouble.
>
> If using the simpler index removes the problem, then we can be pretty
> sure the problem is with ibis::fuzz.  If the problem is with
> ibis::fuzz, I would need a copy of your file my_primary_key and the
> query expression that causes the error to study the problem further.
>
> Thanks.
>
> John
>
> PS: To change the index specification, you can either directly edit
> the file -part.txt or use the following command line (all on one line)
>
> .../ibis -d data-directory -b "my_primary_key:<binning none/><encoding
> equality/>" -z
>
>
>
> On 8/16/12 1:57 AM, Greg Barker wrote:
> > Hi John,
> >
> > Thank you for the additional information. I still haven't been able to
> > figure out why my part.deactivate() call sometimes does not return the
> > number of rows that match the expression I pass in. Adding a
> > part.computeMinMax() call before calling deactivate seemed to change
> > the outcome; now I hit a seg fault calling deactivate (using rev 538):
> >
> > #0  0xb70cb663 in ibis::bitvector::size (this=0xbfd51c34, rhs=...) at
> > bitvector.h:628
> > #1  ibis::bitvector::operator-= (this=0xbfd51c34, rhs=...) at
> > bitvector.cpp:1528
> > #2  0xb737ed1e in ibis::fuzz::coarseEvaluate (this=0xb9fcd28, lo=109,
> > hi=110, res=...) at ixfuzz.cpp:744
> > #3  0xb737fae0 in ibis::fuzz::evaluate (this=0xb9fcd28, expr=...,
> > lower=...) at ixfuzz.cpp:950
> > #4  0xb72b66bb in ibis::relic::estimate (this=0xb9fcd28, expr=...,
> > lower=..., upper=...) at irelic.h:50
> > #5  0xb6a28663 in ibis::column::evaluateRange (this=0x9f0b5b0,
> > cmp=..., mask=..., low=...) at column.cpp:5480
> > #6  0xb6a24e56 in ibis::column::evaluateRange (this=0x9f0b5b0,
> > cmp=..., mask=..., low=...) at column.cpp:5701
> > #7  0xb611f6fb in ibis::part::evaluateRange (this=0xbfd51fc4, cmp=...,
> > mask=..., hits=...) at part.cpp:3581
> > #8  0xb7066bd8 in ibis::query::doEvaluate (this=0xbfd50260,
> > term=0xaf38e18, ht=...) at query.cpp:3683
> > #9  0xb70672fc in ibis::query::getExpandedHits (this=0xbfd50260,
> > res=...) at query.cpp:2966
> > #10 0xb6112fbf in ibis::part::stringToBitvector (this=0xbfd51fc4,
> >     conds=0xb88112c "my_primary_key in
> >
> > (199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"...,
> > msk=...) at part.cpp:4101
> > #11 0xb73b63d2 in ibis::part::deactivate (this=0xbfd51fc4,
> >     conds=0xb88112c "my_primary_key in
> >
> > (199200,199201,199202,199203,199204,199205,199206,199207,199208,199209,199210,199211,199212,199213,199214,199215,199216,199217,199218,199219,199220,199221,199222,199223,199224,19"...)
> > at parti.cpp:1269
> >
> > Thanks,
> > Greg
> >
> > On Wed, Aug 15, 2012 at 4:04 PM, K. John Wu <[email protected]> wrote:
> >
> >     Hi, Greg,
> >
> >     The mystery might be related to the lazy updating of the min/max
> >     values.  Even when the min and max values are wrong in the metadata
> >     file, FastBit should be able to answer the queries correctly.  Our
> >     first application wanted us to use the min and max as nominal lower
> >     and upper bounds; the actual min and max could vary significantly
> >     from the nominal bounds.  To force the computation of the min and
> >     max values, please call ibis::part::computeMinMax.
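> >
> >     For example (a minimal sketch; the directory name below is only a
> >     placeholder), the forced recomputation could look like this:
> >
> >         ibis::part mypart("data-directory", static_cast<const char*>(0));
> >         mypart.computeMinMax();  // recompute the actual min and max values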
> >
> >     When you initialize an ibis::part with a single string argument, it
> >     is assumed to be a directory name if it contains directory separators
> >     or it names an existing directory.  If the string does not contain a
> >     '/' and does not name an existing directory, then it is necessary to
> >     pass the second string argument (which could be nil) to tell FastBit
> >     to use the first argument as the directory name.
> >
> >     Let me know if you have any additional questions.
> >
> >     John
> >
> >
> >
> >
> >     On 8/14/12 12:58 PM, Greg Barker wrote:
> >     > Hi John,
> >     >
> >     > I've been running into a scenario where I'm not able to deactivate
> >     > rows that exist in the data file. I noticed that when it gets into
> >     > this state, the min & max for my_primary_key in -part.txt seem to be
> >     > incorrect. I'm having trouble coming up with a small program that can
> >     > reproduce the issue, but this seems to get pretty close. Before I ran
> >     > it, the three directories it uses existed and were empty.
> >     >
> >     > $ cat 7rows.csv
> >     > 1,93.19,AAA
> >     > 2,49.14,BBB
> >     > 3,49.19,DDD
> >     > 4,59.10,EEE
> >     > 5,34.48,FFF
> >     > 6,91.49,AAA
> >     > 7,19.50,BBB
> >     >
> >     > $ cat 5rows.csv
> >     > 1,93.19,AAA
> >     > 2,49.14,BBB
> >     > 3,50.41,CCC
> >     > 4,58.59,AAA
> >     > 5,19.53,CCC
> >     >
> >     > $ cat loading_error.cc
> >     > #include <memory>
> >     >
> >     > #include <ibis.h>
> >     >
> >     > int main(int argc, char **argv)
> >     > {
> >     >     ibis::gVerbose = 1;
> >     >
> >     >     char existing_dir[] = "existing_dir";
> >     >     char first_incoming_dir[] = "first_incoming_dir";
> >     >     char second_incoming_dir[] = "second_incoming_dir";
> >     >
> >     >     std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
> >     >     firstTable->addColumn("my_primary_key", ibis::LONG);
> >     >     firstTable->addColumn("my_double_value", ibis::DOUBLE);
> >     >     firstTable->addColumn("my_category_value", ibis::CATEGORY);
> >     >     firstTable->readCSV("7rows.csv", 0, first_incoming_dir, ",");
> >     >     firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
> >     >     firstTable->clearData();
> >     >
> >     >     ibis::part existing_part(existing_dir, static_cast<const char*>(0));
> >     >     existing_part.append(first_incoming_dir);
> >     >     existing_part.commit(first_incoming_dir);
> >     >     existing_part.purgeIndexFiles();
> >     >     existing_part.buildIndexes();
> >     >     existing_part.emptyCache();
> >     >
> >     >     std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
> >     >     secondTable->addColumn("my_primary_key", ibis::LONG);
> >     >     secondTable->addColumn("my_double_value", ibis::DOUBLE);
> >     >     secondTable->addColumn("my_category_value", ibis::CATEGORY);
> >     >     secondTable->readCSV("5rows.csv", 0, second_incoming_dir, ",");
> >     >     secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
> >     >     secondTable->clearData();
> >     >
> >     >     ibis::part second_part(second_incoming_dir);
> >     >
> >     >     int deactivatedCount = 0;
> >     >     deactivatedCount = existing_part.deactivate("my_primary_key in (1, 2, 3, 4, 5)");
> >     >     std::cout << "deactivatedCount = " << deactivatedCount << std::endl;
> >     >     existing_part.purgeInactive();
> >     >
> >     >     existing_part.append(second_incoming_dir);
> >     >     existing_part.commit(second_incoming_dir);
> >     >     existing_part.purgeIndexFiles();
> >     >     existing_part.buildIndexes();
> >     >     existing_part.emptyCache();
> >     > }
> >     >
> >     > I end up with this in the -part.txt in existing_dir:
> >     >
> >     > Begin Column
> >     > name = "my_primary_key"
> >     > data_type = "LONG"
> >     > minimum = 6
> >     > maximum = 7
> >     > End Column
> >     >
> >     > I was thinking it should have min = 1 & max = 7.
> >     >
> >     > Thank you,
> >     > Greg
> >     >
> >     > On Mon, Aug 13, 2012 at 9:13 PM, Greg Barker <[email protected]> wrote:
> >     >
> >     >     Whoops, my mistake: deactivate() returns the number of inactive
> >     >     rows, just like it says in the doc :)
> >     >
> >     >     Greg
> >     >
> >     >
> >     >     On Mon, Aug 13, 2012 at 6:11 PM, Greg Barker <[email protected]> wrote:
> >     >
> >     >         Hello John,
> >     >
> >     >         Thank you for the updated code; it appears to be working
> >     >         quite well now for that case. I really appreciate it.
> >     >
> >     >         Another thing I noticed while I was testing is that if you
> >     >         call deactivate() multiple times before purgeInactive(), the
> >     >         return value was not what I expected. Do I need to call
> >     >         purgeInactive() after each deactivate()?
> >     >
> >     >         For example:
> >     >
> >     >         int deactivatedCount = 0;
> >     >         deactivatedCount += existing_part.deactivate("my_primary_key in (1, 2)");
> >     >         deactivatedCount += existing_part.deactivate("my_primary_key in (3, 4)");
> >     >         existing_part.purgeInactive();
> >     >         std::cout << "deactivatedCount = " << deactivatedCount << "\n";
> >     >
> >     >         Which yields:
> >     >
> >     >         part[existing_dir]::deactivate marked 2 rows as inactive,
> >     >         leaving 3 active rows out of 5
> >     >         part[existing_dir]::deactivate marked 2 rows as inactive,
> >     >         leaving 1 active row out of 5
> >     >         part[existing_dir]::purgeInactive to remove 4 out of 5 rows
> >     >         deactivatedCount = 6
> >     >
> >     >         Thanks again for your work,
> >     >
> >     >         Greg
> >     >
> >     >
> >     >         On Mon, Aug 13, 2012 at 4:10 PM, K. John Wu <[email protected]> wrote:
> >     >
> >     >             Hi, Greg,
> >     >
> >     >             Thanks for the test case and test code.  The problem
> >     >             should be fixed with SVN Revision 538.  Please give it a
> >     >             try when you get the chance.
> >     >
> >     >             There is one minor change needed to your test program in
> >     >             order for it to do what you want.  The following line,
> >     >
> >     >                  ibis::part existing_part(existing_dir);
> >     >
> >     >             needs to be changed to
> >     >
> >     >                  ibis::part existing_part(existing_dir, static_cast<const char*>(0));
> >     >
> >     >             The version you used will create two directories hidden
> >     >             in .ibis, which are probably not what you want.
> >     >
> >     >             John
> >     >
> >     >
> >     >
> >     >             On 8/13/12 1:57 AM, Greg Barker wrote:
> >     >             > Hello,
> >     >             >
> >     >             > The type of my_primary_key is a long. I was able to
> >     >             > reproduce the error without the join. I also noticed
> >     >             > that it does not hit the seg fault if the category
> >     >             > column is omitted. The following program will hit the
> >     >             > error.
> >     >             >
> >     >             > $ cat first_data_file.csv
> >     >             > 1,93.19,AAA
> >     >             > 2,49.14,BBB
> >     >             > 3,50.41,CCC
> >     >             > 4,58.59,AAA
> >     >             > 5,19.53,CCC
> >     >             >
> >     >             > $ cat second_data_file.csv
> >     >             > 3,49.19,DDD
> >     >             > 4,59.10,EEE
> >     >             > 5,34.48,FFF
> >     >             > 6,91.49,AAA
> >     >             > 7,19.50,BBB
> >     >             >
> >     >             > $ cat loading_error.cc
> >     >             > #include <memory>
> >     >             >
> >     >             > #include <ibis.h>
> >     >             >
> >     >             > int main(int argc, char **argv)
> >     >             > {
> >     >             >     char existing_dir[] = "existing_dir";
> >     >             >     char first_incoming_dir[] = "first_incoming_dir";
> >     >             >     char second_incoming_dir[] = "second_incoming_dir";
> >     >             >
> >     >             >     std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
> >     >             >     firstTable->addColumn("my_primary_key", ibis::LONG);
> >     >             >     firstTable->addColumn("my_double_value", ibis::DOUBLE);
> >     >             >     firstTable->addColumn("my_category_value", ibis::CATEGORY);
> >     >             >     firstTable->readCSV("first_data_file.csv", 0, first_incoming_dir, ",");
> >     >             >     firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
> >     >             >     firstTable->clearData();
> >     >             >
> >     >             >     ibis::part existing_part(existing_dir);
> >     >             >     existing_part.append(first_incoming_dir);
> >     >             >     existing_part.commit(first_incoming_dir);
> >     >             >     existing_part.purgeIndexFiles();
> >     >             >     existing_part.buildIndexes();
> >     >             >     existing_part.emptyCache();
> >     >             >
> >     >             >     std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
> >     >             >     secondTable->addColumn("my_primary_key", ibis::LONG);
> >     >             >     secondTable->addColumn("my_double_value", ibis::DOUBLE);
> >     >             >     secondTable->addColumn("my_category_value", ibis::CATEGORY);
> >     >             >     secondTable->readCSV("second_data_file.csv", 0, second_incoming_dir, ",");
> >     >             >     secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
> >     >             >     secondTable->clearData();
> >     >             >
> >     >             >     ibis::part second_part(second_incoming_dir);
> >     >             >
> >     >             >     existing_part.deactivate("my_primary_key = 1");
> >     >             >     existing_part.purgeInactive();
> >     >             >
> >     >             >     existing_part.append(second_incoming_dir);
> >     >             > }
> >     >             >
> >     >             > Thank you John,
> >     >             >
> >     >             > Greg
> >     >             >
> >     >             > On Sun, Aug 12, 2012 at 3:27 PM, K. John Wu <[email protected]> wrote:
> >     >             >
> >     >             >     Hi, Greg,
> >     >             >
> >     >             >     Thanks for the information.  It looks like we might
> >     >             >     have neglected to close some index files or somehow
> >     >             >     mishandled some index files.  There is one easy thing
> >     >             >     for us to check; it is related to the handling of
> >     >             >     categorical values (the columns of type ibis::CATEGORY).
> >     >             >     Would you mind telling us if my_primary_key is an
> >     >             >     integer column or a CATEGORY column?
> >     >             >
> >     >             >     If it is not a CATEGORY, then we might have something
> >     >             >     a little bit more complex.  We would appreciate a small
> >     >             >     test case to replicate the problem.
> >     >             >
> >     >             >     John
> >     >             >
> >     >             >
> >     >             >     On 8/10/12 5:32 PM, Greg Barker wrote:
> >     >             >     > Hello -
> >     >             >     >
> >     >             >     > I am attempting to append some new data to some
> >     >             >     > existing data, and ran into some trouble. When loading,
> >     >             >     > I join the new data to the existing data on a particular
> >     >             >     > column, and then deactivate & purgeInactive on the
> >     >             >     > matching records. Then when I try to append the new
> >     >             >     > data to the existing data, I hit a seg fault using rev
> >     >             >     > 536. If I call purgeIndexFiles before the append, it
> >     >             >     > seems to avoid the crash, but I wasn't sure if that was
> >     >             >     > recommended?
> >     >             >     >
> >     >             >     > My code is essentially:
> >     >             >     >
> >     >             >     >     ibis::part existing_part("my_data");
> >     >             >     >     ibis::part incoming_part("new_data");
> >     >             >     >     std::auto_ptr<ibis::quaere>
> >     >             >     >         join(ibis::quaere::create(&existing_part,
> >     >             >     >                                   &incoming_part,
> >     >             >     >                                   "my_primary_key"));
> >     >             >     >     std::auto_ptr<ibis::table> rs(join->select("my_primary_key"));
> >     >             >     >     // then build the where clause
> >     >             >     >     existing_part.deactivate("my_primary_key in (3, 4, 5)");
> >     >             >     >     existing_part.purgeInactive();
> >     >             >     >     existing_part.append("new_data");
> >     >             >     >
> >     >             >     >
> >     >             >     > Which yields the following:
> >     >             >     >
> >     >             >     >     part[my_data]::deactivate marked 9 rows as inactive,
> >     >             >     >     leaving 10 active rows out of 19
> >     >             >     >     part[my_data]::purgeInactive to remove 9 out of 19 rows
> >     >             >     >     Warning -- fileManager::flushDir can not remove
> >     >             >     >     in-memory file (my_data/my_primary_key.idx).  It is in use
> >     >             >     >     Warning -- fileManager::flushDir(my_data) finished
> >     >             >     >     with 1 file still in memory
> >     >             >     >     Constructed a part named my_data
> >     >             >     >     filter::sift1S -- processing data partition my_data
> >     >             >     >     Segmentation fault (core dumped)
> >     >             >     >
> >     >             >     > Many Thanks,
> >     >             >     > Greg
> >     >             >
> >     >             >
> >     >
> >     >
> >     >
> >     >
> >
> >
>
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
