Hi John,
I've run into a scenario where I'm unable to deactivate rows that
exist in the data file. I noticed that when it gets into this state, the
min & max for my_primary_key in -part.txt seem to be incorrect. I'm having
trouble coming up with a small program that reproduces the issue, but
the one below gets pretty close. Before I ran it, the three directories it
uses existed and were empty.
$ cat 7rows.csv
1,93.19,AAA
2,49.14,BBB
3,49.19,DDD
4,59.10,EEE
5,34.48,FFF
6,91.49,AAA
7,19.50,BBB
$ cat 5rows.csv
1,93.19,AAA
2,49.14,BBB
3,50.41,CCC
4,58.59,AAA
5,19.53,CCC
$ cat loading_error.cc
#include <iostream>
#include <memory>

#include <ibis.h>

int main(int argc, char **argv)
{
    ibis::gVerbose = 1;

    char existing_dir[] = "existing_dir";
    char first_incoming_dir[] = "first_incoming_dir";
    char second_incoming_dir[] = "second_incoming_dir";

    // Load the first batch of rows and append them to existing_dir.
    std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
    firstTable->addColumn("my_primary_key", ibis::LONG);
    firstTable->addColumn("my_double_value", ibis::DOUBLE);
    firstTable->addColumn("my_category_value", ibis::CATEGORY);
    firstTable->readCSV("7rows.csv", 0, first_incoming_dir, ",");
    firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
    firstTable->clearData();

    ibis::part existing_part(existing_dir, static_cast<const char*>(0));
    existing_part.append(first_incoming_dir);
    existing_part.commit(first_incoming_dir);
    existing_part.purgeIndexFiles();
    existing_part.buildIndexes();
    existing_part.emptyCache();

    // Prepare the second batch.
    std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
    secondTable->addColumn("my_primary_key", ibis::LONG);
    secondTable->addColumn("my_double_value", ibis::DOUBLE);
    secondTable->addColumn("my_category_value", ibis::CATEGORY);
    secondTable->readCSV("5rows.csv", 0, second_incoming_dir, ",");
    secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
    secondTable->clearData();

    ibis::part second_part(second_incoming_dir);

    // Remove rows 1-5 from the existing data, then append the new rows.
    long deactivatedCount =
        existing_part.deactivate("my_primary_key in (1, 2, 3, 4, 5)");
    std::cout << "deactivatedCount = " << deactivatedCount << std::endl;
    existing_part.purgeInactive();

    existing_part.append(second_incoming_dir);
    existing_part.commit(second_incoming_dir);
    existing_part.purgeIndexFiles();
    existing_part.buildIndexes();
    existing_part.emptyCache();
}
I end up with this in the -part.txt file in existing_dir:
Begin Column
name = "my_primary_key"
data_type = "LONG"
minimum = 6
maximum = 7
End Column
Since the append puts keys 1 through 5 back alongside the surviving keys 6
and 7, I was expecting min = 1 & max = 7.
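
For reference, here is a quick way to probe it -- just a sketch; it assumes
existing_dir is in the state left behind by the program above, and whether
it finds the five re-appended keys presumably depends on whether the bad
minimum of 6 is used to prune the range condition:

    ibis::part p("existing_dir", static_cast<const char*>(0));
    ibis::query q(ibis::util::userName(), &p);
    q.setWhereClause("my_primary_key < 6");
    if (q.evaluate() >= 0) // evaluate() is negative on error
        std::cout << q.getNumHits() << " hits for my_primary_key < 6\n";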
Thank you,
Greg
On Mon, Aug 13, 2012 at 9:13 PM, Greg Barker <[email protected]> wrote:
> Whoops, my mistake: deactivate() returns the number of inactive rows, just
> like it says in the doc :)
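>
> In case it helps anyone else: a sketch of how I derive per-call counts
> now, assuming deactivate() keeps returning the running total of inactive
> rows (and a negative value on error):
>
>     long totalInactive = 0, newlyDeactivated = 0;
>     long r = existing_part.deactivate("my_primary_key in (1, 2)");
>     if (r >= 0) { newlyDeactivated += r - totalInactive; totalInactive = r; }
>     r = existing_part.deactivate("my_primary_key in (3, 4)");
>     if (r >= 0) { newlyDeactivated += r - totalInactive; totalInactive = r; }
>     existing_part.purgeInactive();
>     // With the run quoted below, this prints 4 rather than 6.
>     std::cout << "newly deactivated = " << newlyDeactivated << "\n";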
>
> Greg
>
>
> On Mon, Aug 13, 2012 at 6:11 PM, Greg Barker <[email protected]> wrote:
>
>> Hello John,
>>
>> Thank you for the updated code; it appears to be working quite well now
>> for that case. I really appreciate it.
>>
>> Another thing I noticed while testing is that if you call
>> deactivate() multiple times before purgeInactive(), the return value is
>> not what I expected. Do I need to call purgeInactive() after each
>> deactivate()?
>>
>> For example:
>>
>> int deactivatedCount = 0;
>> deactivatedCount += existing_part.deactivate("my_primary_key in (1, 2)");
>> deactivatedCount += existing_part.deactivate("my_primary_key in (3, 4)");
>> existing_part.purgeInactive();
>> std::cout << "deactivatedCount = " << deactivatedCount << "\n";
>>
>> Which yields:
>>
>> part[existing_dir]::deactivate marked 2 rows as inactive, leaving 3 active rows out of 5
>> part[existing_dir]::deactivate marked 2 rows as inactive, leaving 1 active row out of 5
>> part[existing_dir]::purgeInactive to remove 4 out of 5 rows
>> deactivatedCount = 6
>>
>> Thanks again for your work,
>>
>> Greg
>>
>>
>> On Mon, Aug 13, 2012 at 4:10 PM, K. John Wu <[email protected]> wrote:
>>
>>> Hi, Greg,
>>>
>>> Thanks for the test case and test code. The problem should be fixed
>>> in SVN Revision 538. Please give it a try when you get the chance.
>>>
>>> There is one minor change needed in your test program for it to do
>>> what you want. The following line,
>>>
>>> ibis::part existing_part(existing_dir);
>>>
>>> needs to be changed to
>>>
>>> ibis::part existing_part(existing_dir, static_cast<const char*>(0));
>>>
>>> The version you used will create two directories hidden under .ibis,
>>> which is probably not what you want.
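>>>
>>> Roughly, the difference is as follows (a sketch of the two constructor
>>> forms, with the second argument naming the backup directory; treat the
>>> details as from memory):
>>>
>>>     // One argument: treated as a partition name; the actual data
>>>     // directories end up hidden under .ibis.
>>>     ibis::part named_part("existing_dir");
>>>
>>>     // Two arguments: the first is the data directory itself; the null
>>>     // backup directory means no second copy of the data is kept.
>>>     ibis::part dir_part("existing_dir", static_cast<const char*>(0));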
>>>
>>> John
>>>
>>>
>>>
>>> On 8/13/12 1:57 AM, Greg Barker wrote:
>>> > Hello,
>>> >
>>> > The type of my_primary_key is a long. I was able to reproduce the
>>> > error without the join. I also noticed that it does not hit the seg
>>> > fault if the category column is omitted. The following program will
>>> > hit the error.
>>> >
>>> > $ cat first_data_file.csv
>>> > 1,93.19,AAA
>>> > 2,49.14,BBB
>>> > 3,50.41,CCC
>>> > 4,58.59,AAA
>>> > 5,19.53,CCC
>>> >
>>> > $ cat second_data_file.csv
>>> > 3,49.19,DDD
>>> > 4,59.10,EEE
>>> > 5,34.48,FFF
>>> > 6,91.49,AAA
>>> > 7,19.50,BBB
>>> >
>>> > $ cat loading_error.cc
>>> > #include <memory>
>>> >
>>> > #include <ibis.h>
>>> >
>>> > int main(int argc, char **argv)
>>> > {
>>> > char existing_dir[] = "existing_dir";
>>> > char first_incoming_dir[] = "first_incoming_dir";
>>> > char second_incoming_dir[] = "second_incoming_dir";
>>> >
>>> > std::auto_ptr<ibis::tablex> firstTable(ibis::tablex::create());
>>> > firstTable->addColumn("my_primary_key", ibis::LONG);
>>> > firstTable->addColumn("my_double_value", ibis::DOUBLE);
>>> > firstTable->addColumn("my_category_value", ibis::CATEGORY);
>>> > firstTable->readCSV("first_data_file.csv", 0, first_incoming_dir, ",");
>>> > firstTable->write(first_incoming_dir, "working", NULL, NULL, NULL);
>>> > firstTable->clearData();
>>> >
>>> > ibis::part existing_part(existing_dir);
>>> > existing_part.append(first_incoming_dir);
>>> > existing_part.commit(first_incoming_dir);
>>> > existing_part.purgeIndexFiles();
>>> > existing_part.buildIndexes();
>>> > existing_part.emptyCache();
>>> >
>>> > std::auto_ptr<ibis::tablex> secondTable(ibis::tablex::create());
>>> > secondTable->addColumn("my_primary_key", ibis::LONG);
>>> > secondTable->addColumn("my_double_value", ibis::DOUBLE);
>>> > secondTable->addColumn("my_category_value", ibis::CATEGORY);
>>> > secondTable->readCSV("second_data_file.csv", 0, second_incoming_dir, ",");
>>> > secondTable->write(second_incoming_dir, "working", NULL, NULL, NULL);
>>> > secondTable->clearData();
>>> >
>>> > ibis::part second_part(second_incoming_dir);
>>> >
>>> > existing_part.deactivate("my_primary_key = 1");
>>> > existing_part.purgeInactive();
>>> >
>>> > existing_part.append(second_incoming_dir);
>>> > }
>>> >
>>> > Thank you John,
>>> >
>>> > Greg
>>> >
>>> > On Sun, Aug 12, 2012 at 3:27 PM, K. John Wu <[email protected]> wrote:
>>> >
>>> > Hi, Greg,
>>> >
>>> > Thanks for the information. Looks like we might have neglected to
>>> > close some index files or somehow mishandled them. There is one
>>> > easy thing for us to check; it is related to the handling of
>>> > categorical values (the columns of type ibis::CATEGORY). Would you
>>> > mind telling us whether my_primary_key is an integer column or a
>>> > CATEGORY column?
>>> >
>>> > If it is not a CATEGORY, then the problem might be a little more
>>> > complex. We would appreciate a small test case to replicate the
>>> > problem.
>>> >
>>> > John
>>> >
>>> >
>>> > On 8/10/12 5:32 PM, Greg Barker wrote:
>>> > > Hello -
>>> > >
>>> > > I am attempting to append some new data to some existing data,
>>> > > and I ran into some trouble. When loading, I join the new data to
>>> > > the existing data on a particular column, and then deactivate &
>>> > > purgeInactive the matching records. Then when I try to append the
>>> > > new data to the existing data, I hit a seg fault using rev 536.
>>> > > If I call purgeIndexFiles before the append, it seems to avoid
>>> > > the crash, but I wasn't sure whether that is recommended.
>>> > >
>>> > > My code is essentially:
>>> > >
>>> > > ibis::part existing_part("my_data");
>>> > > ibis::part incoming_part("new_data");
>>> > > std::auto_ptr<ibis::quaere>
>>> > >     join(ibis::quaere::create(&existing_part, &incoming_part,
>>> > >                               "my_primary_key"));
>>> > > std::auto_ptr<ibis::table> rs(join->select("my_primary_key"));
>>> > > // then build the where clause from the join result
>>> > > existing_part.deactivate("my_primary_key in (3, 4, 5)");
>>> > > existing_part.purgeInactive();
>>> > > existing_part.append("new_data");
>>> > >
>>> > >
>>> > > Which yields the following:
>>> > >
>>> > > part[my_data]::deactivate marked 9 rows as inactive, leaving 10 active rows out of 19
>>> > > part[my_data]::purgeInactive to remove 9 out of 19 rows
>>> > > Warning -- fileManager::flushDir can not remove in-memory file (my_data/my_primary_key.idx). It is in use
>>> > > Warning -- fileManager::flushDir(my_data) finished with 1 file still in memory
>>> > > Constructed a part named my_data
>>> > > filter::sift1S -- processing data partition my_data
>>> > > Segmentation fault (core dumped)
>>> > >
>>> > > Many Thanks,
>>> > > Greg
>>> >
>>> >
>>>
>>
>>
>