Hi John, Thanks. Attached please find two log files with verbose level=4, with version 2.0.2 and 1.3.8, using "ibis" included in the source code tree. (I have removed/masked sensitive info.)
Duration in 1.3.8 version is about 0.104sec; Duration in 2.0.2 version is about 0.332sec; In my observation in our real code, the performance downgrade may be also in other areas, as the we are running similar queries on same columns, all of them are slower. FYI: the data/index files were built using 1.3.8. Thanks Jilong Wang -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of K. John Wu Sent: Thursday, June 04, 2015 1:02 PM To: FastBit Users Subject: Re: [FastBit-users] FastBit 2.0.2 is 3X slower on index files built using 1.3.8? Hi, Jilong, This is might have something to how the dictionary used for keyword indexes are initialized. Since you are writing a program calling FastBit, presumably, you might be processing many queries in your program. In which case, this increased start up time might not matter too much. However, to know this for sure, would you mind turn up verboseness level to 4 or above and send us the log file. This will allow us to know for sure where the time is spent in your program. By the way, in your program, you can increase the verboseness level by calling: ibis::util::setVerboseLevel(4); Good luck. John On 5/22/15 4:45 PM, wang jilong wrote: > Hi John, > > We have compared two cases: > Case 1. running programs built with fastbit 1.3.8, on index files > built using 1.3.8. > Case 2. running programs built with fastbit 2.0.2, on index files > built using 1.3.8. > > With everything same, and no single line change on our programs. The case 1 > was run first. > The case 1 (1.3.8 program on 1.3.8 index files) is at least 3X faster. > > The comparison using "2.0.2 ibis" and "1.3.8 ibis" showed the same behavior. > > Out query is simple: 'select count(1) WHERE COL_A contains "com.google.com" > and COL_B=0'. > > There are about 60M records in index files. > > COL_A, COL_B are defined below as in "-part.txt": > > ===================================================================== > Begin Column > name = "COL_A" > description = COL_A > data_type = "TEXT" > index=keywords delimiters=";" > End Column > > Begin Column > name = "COL_B" > data_type = "INT" > minimum = 0 > maximum = 2 > index = <binning none/><encoding equality/> > End Column > ===================================================================== > > Question: Is it the expected behavior, or we did something wrong? > > Thanks > > Jilong Wang > > _______________________________________________ > FastBit-users mailing list > [email protected] > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users > _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
FastBit ibis1.3.8 Log started on Fri Jun 5 07:02:11 2015 MY_/fastbit-ibis1.3.8/examples/.libs/ibis Options summary: batch mode, log level 4, skipping estimation fileManager initialization complete -- maxBytes=16112781312, maxOpenFiles=768 util::gatherParts -- examining MYfastbitIdx/PARTITION_NAME Constructed a part named PARTITION_NAME activeDir = "MYfastbitIdx/PARTITION_NAME" part: PARTITION_NAME (MY_/fastbit-ibis1.3.8/examples/.libs/ardea -d MY_/buildIndex/0/fastbitIdx/PARTITION_NAME -M MY_BuildIndex/dim_persona_label_export_data_type.txt -t 0.csv) with A-REAL-COUNT rows, 108 columns Column list: ###### I REMOVED THEM ###### Partition[1]: PARTITION_NAME Query[1]: select count(1) WHERE COL_A contains "VALUE_A" and COL_B=0 doQuaere -- processing "SELECT count(1) WHERE COL_A contains "VALUE_A" and COL_B=0" Constructing selectClause @ 0x7fff4926de20 Freeing selectClause @ 0x7fff4926de20 Constructing selectClause @ 0x7fff4926e0c0 selectClause::fillNames -- select clause internal details: low-level expressions (names_[1], aggr_[1], atms_[1]): 0: __0, COUNT(*) high-level expressions (xnames_[1], xtms_[1]): 0: _0, COUNT(*) filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) -- start timer ... countQuery::setWhereClause accepted new query conditions "(0x6554d0 AND 0x6566b0)" Constructed in-memory data partition _xlYZ0 -- filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) -- with 1 column filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) -- processing a select clause with 1 term, 0 of which is plain filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) -- processing data partition PARTITION_NAME countQuery assigned data partition PARTITION_NAME countQuery::evaluate -- start timer ... index::create -- attempt to read index type #21 from MYfastbitIdx/PARTITION_NAME/COL_B.idx for column PARTITION_NAME.COL_B direkte[COL_B]::read(0x657030) finished reading index header with nrows=A-REAL-COUNT and bits.size()=3 index::create(PARTITION_NAME.COL_B) reading the existing index took 7.84341e-05 sec index::create(PARTITION_NAME.COL_B) -- the direct index for column PARTITION_NAME.COL_B was read from MYfastbitIdx/PARTITION_NAME in 0 sec(CPU), 0.000216035 sec(elapsed) The direct bitmap index for COL_B contains 3 bit vectors ###### I removed THEM ###### index::create -- attempt to read index type #18 from MYfastbitIdx/PARTITION_NAME/COL_A.idx for column PARTITION_NAME.COL_A dictionary::read(MYfastbitIdx/PARTITION_NAME/COL_A.terms) -- duration: 0 sec(CPU), 0.000135055 sec(elapsed) index::create(PARTITION_NAME.COL_A) reading the existing index took 0.000239864 sec index::create(PARTITION_NAME.COL_A) -- the keywords index for column PARTITION_NAME.COL_A was read from MYfastbitIdx/PARTITION_NAME in 0 sec(CPU), 0.000316342 sec(elapsed) The keywords index for column PARTITION_NAME.COL_A contains 1580 terms (optionally followed by term frequencies) (printing 1 out of every 50) ###### I REMOVED THEM ###### countQuery::doEvaluate(0x6554d0: COL_A CONTAINS 'VALUE_A', mask.cnt()=A_READ_COUNT2) --> 20449348, ierr = 1 column[PARTITION_NAME.COL_B]::evaluateRange(COL_B == 0, mask(20449348, A-REAL-COUNT)) completed with ierr = 2 countQuery::doEvaluate(0x6566b0: COL_B == 0, mask.cnt()=20449348) --> 8468024, ierr = 2 countQuery::doEvaluate(0x65b310: (0x6554d0 AND 0x6566b0), mask.cnt()=A_READ_COUNT2) --> 8468024, ierr = 2 countQuery::evaluate -- Select count(*) From PARTITION_NAME Where (0x6554d0 AND 0x6566b0) --> 8468024 countQuery::evaluate -- duration: 0.060003 sec(CPU), 0.0612946 sec(elapsed) bord[_xlYZ0]::append added 8468024 rows to make a total of 8468024 filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) creates an in-memory data partition with 8468024 rows and 1 column bord::groupbya -- processing aggregations for "Select COUNT(*) From _xlYZ0" Constructed in-memory data partition JSGw02 -- Select COUNT(*) From _xlYZ0 -- with 1 row and 1 column table::freeBuffers to free buf[1] and typ[1] bord::groupby -- completed Table (in memory) JSGw02 (Select COUNT(*) From _xlYZ0) contsists of 1 column and 1 row _0 UINT COUNT(*) filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) produces an in-memory data partition with 1 row and 1 column clearing data partition _xlYZ0 filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) -- duration: 0.104006 sec(CPU), 0.105114 sec(elapsed) Freeing selectClause @ 0x7fff4926e0c0 doQuaere -- "SELECT count(1) WHERE COL_A contains "VALUE_A" and COL_B=0" produced a table with 1 row and 1 column, took 0.108006 CPU seconds, 0.108806 elapsed seconds doQuaere -- the result table (1 x 1) for "SELECT count(1) WHERE COL_A contains "VALUE_A" and COL_B=0" 8468024 clearing data partition JSGw02 MY_/fastbit-ibis1.3.8/examples/.libs/ibis -- total CPU time 0.112006 s, total elapsed time 0.113214 s clearing data partition PARTITION_NAME Cleaning up the file manager Total pages accessed through read(unistd.h) is estimated to be 24 fileManager::clear -- starting ... --- Fri Jun 5 07:02:11 2015 The number of memory mapped files is 1. (max = 768) file name: MYfastbitIdx/PARTITION_NAME/COL_A.idx storage @ 0x657350, 0x7f4dca7ff000, 1st 32 bits = 49424923, 1st 64 bits = 4125349424923 file descriptor 3 fmap size 1897828708 base address 0x7f4dca7ff000 mapped y opened at Fri Jun 5 07:02:11 2015 last used at Fri Jun 5 07:02:11 2015 # of bytes 1897828708 # of past acc 4 # of active acc 0 Size of all mapped files is 1,897,828,708 The number of files read into memory is 2 file name: MYfastbitIdx/PARTITION_NAME/COL_B.idx storage @ 0x657030, 0x65b490, 1st 32 bits = 49424923, 1st 64 bits = 4155349424923 storage @ 0x657030, 0x65b490, 1st 32 bits = 49424923, 1st 64 bits = 4155349424923 mapped n opened at Fri Jun 5 07:02:11 2015 last used at Fri Jun 5 07:02:11 2015 # of bytes 92300 # of past acc 8 # of active acc 0 file name: MYfastbitIdx/PARTITION_NAME/COL_B.msk storage @ 0x656990, 0x655a40, 1st 32 bits = c0005bdf, 1st 64 bits = 7ffbffffc0005bdf mapped n opened at Fri Jun 5 07:02:11 2015 last used at Fri Jun 5 07:02:11 2015 # of bytes 60 # of past acc 2 # of active acc 0 The total size of all files read into memory is 92,360 Size of all named storages is 1,897,921,068 Size of all unnamed storages is 0 The total size of all named and unnamed storages is 1,897,921,068 The prescribed maximum size is 16,112,781,312 Number of pages accessed (recorded so far) is 24 (page size = 4096) fileManager::clear -- completed with 0 byte of storage remain in memory after removing all managed objects --- Fri Jun 5 07:02:11 2015 The number of memory mapped files is 0. (max = 768) Size of all mapped files is 0 The number of files read into memory is 0 The total size of all files read into memory is 0 Size of all named storages is 0 Size of all unnamed storages is 0 The total size of all named and unnamed storages is 0 The prescribed maximum size is 16,112,781,312 Number of pages accessed (recorded so far) is 24 (page size = 4096) Report from getrusage: maxrss = 57044 pages (4096 bytes/page), majflt = 1, minflt = 20020, inblock = 50544, outblock = 48 fileManager decommissioned
FastBit 2.0.2 Log started on Fri Jun 5 07:00:04 2015 MY/install-fastbit-2.0.2/bin/ibis Options summary: batch mode, log level 4, skipping estimation current data limit: 18446744073709551615 (soft), 18446744073709551615 (hard) fileManager initialization complete -- maxBytes=16112781312, maxOpenFiles=768 util::gatherParts -- examining MY/fastbitIdx/PARTITION_NAME Constructed a part named PARTITION_NAME activeDir = "MY/fastbitIdx/PARTITION_NAME" part: PARTITION_NAME (MY/fastbit-ibis1.3.8/examples/.libs/ardea -d MY/buildIndex/0/fastbitIdx/PARTITION_NAME -M MY_BuildIndex/dim_persona_label_export_data_type.txt -t 0.csv) with A_REAL_COUNT rows, 108 columns Column list: ###### I REMOVED the column list ###### Partition[1]: PARTITION_NAME Query[1]: select count(1) WHERE COL_A contains "VALUE_A" and COL_B=0 doQuaere -- processing "SELECT count(1) WHERE COL_A contains "VALUE_A" and COL_B=0" Constructing selectClause @ 0x7ffffb631960 Freeing selectClause @ 0x7ffffb631960 Constructing selectClause @ 0x7ffffb631ac0 selectClause::fillNames -- select clause internal details: low-level expressions (names_[1], aggr_[1], atms_[1]): 0: __0, COUNT(*) high-level expressions (xnames_[1], xtms_[1]): 0: __0, COUNT(*) filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) -- start timer ... countQuery::setWhereClause accepted new query conditions "(0x654a50 AND 0x642330)" Constructed in-memory data partition _xCYZ1 -- filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) -- with 1 column filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) -- processing a select clause with 1 term, 0 of which is plain filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) -- processing data partition PARTITION_NAME countQuery assigned data partition PARTITION_NAME countQuery::evaluate -- start timer ... index::create -- attempt to read index type #21 from MY/fastbitIdx/PARTITION_NAME/COL_B.idx for column PARTITION_NAME.COL_B index::ctor reconstituted an index for PARTITION_NAME.COL_B from storage object @ 0x655890 direkte[COL_B]::read(0x655890) finished reading index header with nrows=A_REAL_COUNT and bits.size()=3 index::create(PARTITION_NAME.COL_B) reading the existing index took 9.42145e-05 sec index::create -- attempt to read index type #18 from MY/fastbitIdx/PARTITION_NAME/COL_A.idx for column PARTITION_NAME.COL_A index::ctor reconstituted an index for PARTITION_NAME.COL_A from storage object @ 0x66c4d0 dictionary::read(MY/fastbitIdx/PARTITION_NAME/COL_A.terms) -- duration: 0 sec(CPU), 0.000521151 sec(elapsed) index::create(PARTITION_NAME.COL_A) reading the existing index took 0.000638694 sec text[PARTITION_NAME.COL_A]::keywordSearch(VALUE_A) -- duration: 0.020001 sec(CPU), 0.022487 sec(elapsed) countQuery::doEvaluate(COL_A CONTAINS 'VALUE_A', mask.cnt()= A_REAL_COUNT2) --> A_REAL_COUNT3, ierr = 1 column[PARTITION_NAME.COL_B]::evaluateRange(COL_B == 0, mask(A_REAL_COUNT3, A_REAL_COUNT)) -- estimated cost with index = 45880, with sequential scan = 2.83044e+08 column[PARTITION_NAME.COL_B]::evaluateRange(COL_B == 0, mask(A_REAL_COUNT3, A_REAL_COUNT)) completed with ierr = 2 column[PARTITION_NAME.COL_B]::evaluateRange(COL_B == 0, mask(A_REAL_COUNT3, A_REAL_COUNT)) -- duration: 0.240015 sec(CPU), 0.244723 sec(elapsed) countQuery::doEvaluate(COL_B == 0, mask.cnt()=A_REAL_COUNT3) --> 8468024, ierr = 2 countQuery::doEvaluate((0x654a50 AND 0x642330), mask.cnt()= A_REAL_COUNT2) --> 8468024, ierr = 2 From PARTITION_NAME Where (0x654a50 AND 0x642330) --> 8468024 countQuery::evaluate -- duration: 0.288018 sec(CPU), 0.290324 sec(elapsed) bord[_xCYZ1]::append added 8468024 rows to make a total of 8468024 filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) creates an in-memory data partition with 8468024 rows and 1 column bord::groupbya -- processing aggregations for "Select COUNT(*) From _xCYZ1" Constructed in-memory data partition zxGI01 -- Select COUNT(*) From _xCYZ1 -- with 1 row and 1 column table::freeBuffers to free buf[1] and typ[1] bord::groupby -- completed Table (in memory) zxGI01 (Select COUNT(*) From _xCYZ1) contsists of 1 column and 1 row __0 UINT COUNT(*) filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) produces an in-memory data partition with 1 row and 1 column clearing data partition _xCYZ1 filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) -- duration: 0.332021 sec(CPU), 0.332519 sec(elapsed) Freeing selectClause @ 0x7ffffb631ac0 doQuaere -- "SELECT count(1) WHERE COL_A contains "VALUE_A" and COL_B=0" produced a table with 1 row and 1 column, took 0.336021 CPU seconds, 0.336029 elapsed seconds doQuaere -- the result table (1 x 1) for "SELECT count(1) WHERE COL_A contains "VALUE_A" and COL_B=0" 8468024 clearing data partition zxGI01 MY/install-fastbit-2.0.2/bin/ibis -- total CPU time 0.340021 s, total elapsed time 0.340736 s clearing data partition PARTITION_NAME Cleaning up the file manager Total pages accessed through read(unistd.h) is estimated to be 24 fileManager::clear -- starting ... --- Fri Jun 5 07:00:05 2015 The number of memory mapped files is 1. (max = 768) file name: MY/fastbitIdx/PARTITION_NAME/COL_A.idx storage @ 0x66c4d0, 0x7fd2f984c000, 1st 32 bits = 49424923, 1st 64 bits = 4125349424923 file descriptor 3 fmap size 1897828708 base address 0x7fd2f984c000 mapped y opened at Fri Jun 5 07:00:04 2015 last used at Fri Jun 5 07:00:04 2015 # of bytes 1897828708 # of past acc 4 # of active acc 0 Size of all mapped files is 1,897,828,708 The number of files read into memory is 2 file name: MY/fastbitIdx/PARTITION_NAME/COL_B.idx storage @ 0x655890, 0x6558f0, 1st 32 bits = 49424923, 1st 64 bits = 4155349424923 file descriptor 3 fmap size 1897828708 base address 0x7fd2f984c000 mapped y opened at Fri Jun 5 07:00:04 2015 last used at Fri Jun 5 07:00:04 2015 # of bytes 1897828708 # of past acc 4 # of active acc 0 Size of all mapped files is 1,897,828,708 The number of files read into memory is 2 file name: MY/fastbitIdx/PARTITION_NAME/COL_B.idx storage @ 0x655890, 0x6558f0, 1st 32 bits = 49424923, 1st 64 bits = 4155349424923 mapped n opened at Fri Jun 5 07:00:04 2015 last used at Fri Jun 5 07:00:04 2015 # of bytes 92300 # of past acc 8 # of active acc 0 file name: MY/fastbitIdx/PARTITION_NAME/COL_B.msk storage @ 0x655060, 0x654fb0, 1st 32 bits = c0005bdf, 1st 64 bits = 7ffbffffc0005bdf mapped n opened at Fri Jun 5 07:00:04 2015 last used at Fri Jun 5 07:00:04 2015 # of bytes 60 # of past acc 2 # of active acc 0 The total size of all files read into memory is 92,360 Size of all named storages is 1,897,921,068 Size of all unnamed storages is 0 The total size of all named and unnamed storages is 1,897,921,068 The prescribed maximum size is 16,112,781,312 Number of pages accessed (recorded so far) is 24 (page size = 4096) fileManager::clear -- completed with 0 byte of storage remain in memory after removing all managed objects --- Fri Jun 5 07:00:05 2015 The number of memory mapped files is 0. (max = 768) Size of all mapped files is 0 The total size of all files read into memory is 0 Size of all named storages is 0 Size of all unnamed storages is 0 The total size of all named and unnamed storages is 0 The prescribed maximum size is 16,112,781,312 Number of pages accessed (recorded so far) is 24 (page size = 4096) Report from getrusage: maxrss = 55924 pages (4096 bytes/page), majflt = 1, minflt = 16409, inblock = 52016, outblock = 40 fileManager decommissioned
_______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
