Hi John,

Thanks. Attached please find two log files with verbose level=4, with version 
2.0.2 and 1.3.8, 
using "ibis" included in the source code tree. (I have removed/masked sensitive 
info.)

Duration in 1.3.8 version is about 0.104sec;
Duration in 2.0.2 version is about 0.332sec;

In my observation in our real code, the performance downgrade may be also in 
other areas, 
as the we are running similar queries on same columns, all of them are slower.

FYI: the data/index files were built using 1.3.8.

Thanks

Jilong Wang 


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of K. John Wu
Sent: Thursday, June 04, 2015 1:02 PM
To: FastBit Users
Subject: Re: [FastBit-users] FastBit 2.0.2 is 3X slower on index files built 
using 1.3.8?

Hi, Jilong,

This is might have something to how the dictionary used for keyword
indexes are initialized.  Since you are writing a program calling
FastBit, presumably, you might be processing many queries in your
program.  In which case, this increased start up time might not matter
too much.  However, to know this for sure, would you mind turn up
verboseness level to 4 or above and send us the log file.  This will
allow us to know for sure where the time is spent in your program.

By the way, in your program, you can increase the verboseness level by
calling:

ibis::util::setVerboseLevel(4);


Good luck.

John


On 5/22/15 4:45 PM, wang jilong wrote:
> Hi John,
> 
> We have compared two cases:
>       Case 1. running programs built with fastbit 1.3.8,  on index files 
> built using 1.3.8.
>       Case 2. running programs built with fastbit 2.0.2,  on index files 
> built using 1.3.8.
> 
> With everything same, and no single line change on our programs. The case 1 
> was run first.
> The case 1 (1.3.8 program on 1.3.8 index files) is at least 3X faster.
> 
> The comparison using "2.0.2 ibis" and "1.3.8 ibis" showed the same behavior.
> 
> Out query is simple: 'select count(1) WHERE COL_A contains "com.google.com" 
> and COL_B=0'.
> 
> There are about 60M records in index files.
> 
> COL_A, COL_B are defined below as in "-part.txt":
> 
> =====================================================================
> Begin Column
> name = "COL_A"
> description = COL_A
> data_type = "TEXT"
> index=keywords delimiters=";"
> End Column 
> 
> Begin Column
> name = "COL_B"
> data_type = "INT"
> minimum = 0
> maximum = 2
> index = <binning none/><encoding equality/>
> End Column 
> =====================================================================
> 
> Question: Is it the expected behavior, or we did something wrong?
> 
> Thanks
> 
> Jilong Wang
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
FastBit ibis1.3.8
Log started on Fri Jun  5 07:02:11 2015

MY_/fastbit-ibis1.3.8/examples/.libs/ibis
Options summary: batch mode, log level 4, skipping estimation

fileManager initialization complete -- maxBytes=16112781312, maxOpenFiles=768
util::gatherParts -- examining MYfastbitIdx/PARTITION_NAME
Constructed a part named PARTITION_NAME
activeDir = "MYfastbitIdx/PARTITION_NAME"
part: PARTITION_NAME (MY_/fastbit-ibis1.3.8/examples/.libs/ardea -d 
MY_/buildIndex/0/fastbitIdx/PARTITION_NAME -M 
MY_BuildIndex/dim_persona_label_export_data_type.txt -t 0.csv) with 
A-REAL-COUNT rows, 108 columns
Column list:
###### I REMOVED THEM ######

Partition[1]:
  PARTITION_NAME
Query[1]:
  select count(1) WHERE COL_A contains "VALUE_A" and COL_B=0

doQuaere -- processing "SELECT count(1) WHERE COL_A contains "VALUE_A" and 
COL_B=0"
Constructing selectClause @ 0x7fff4926de20
Freeing selectClause @ 0x7fff4926de20
Constructing selectClause @ 0x7fff4926e0c0
selectClause::fillNames -- select clause internal details:
 low-level expressions (names_[1], aggr_[1], atms_[1]):
  0:    __0,    COUNT(*)
 high-level expressions (xnames_[1], xtms_[1]):
  0:    _0,     COUNT(*)
filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) -- start 
timer ...
countQuery::setWhereClause accepted new query conditions "(0x6554d0 AND 
0x6566b0)"
Constructed in-memory data partition _xlYZ0 -- filter::sift2(SELECT count(1) 
FROM 1 data partition WHERE (0x644 ...) -- with 1 column
filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) -- 
processing a select clause with 1 term, 0 of which is plain
filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) -- 
processing data partition PARTITION_NAME
countQuery assigned data partition PARTITION_NAME

countQuery::evaluate -- start timer ...
index::create -- attempt to read index type #21 from 
MYfastbitIdx/PARTITION_NAME/COL_B.idx for column PARTITION_NAME.COL_B
direkte[COL_B]::read(0x657030) finished reading index header with 
nrows=A-REAL-COUNT and bits.size()=3
index::create(PARTITION_NAME.COL_B) reading the existing index took 7.84341e-05 
sec
index::create(PARTITION_NAME.COL_B) -- the direct index for column 
PARTITION_NAME.COL_B was read from MYfastbitIdx/PARTITION_NAME in 0 sec(CPU), 
0.000216035 sec(elapsed)
The direct bitmap index for COL_B contains 3 bit vectors

###### I removed THEM ######


index::create -- attempt to read index type #18 from 
MYfastbitIdx/PARTITION_NAME/COL_A.idx for column PARTITION_NAME.COL_A
dictionary::read(MYfastbitIdx/PARTITION_NAME/COL_A.terms) -- duration: 0 
sec(CPU), 0.000135055 sec(elapsed)
index::create(PARTITION_NAME.COL_A) reading the existing index took 0.000239864 
sec
index::create(PARTITION_NAME.COL_A) -- the keywords index for column 
PARTITION_NAME.COL_A was read from MYfastbitIdx/PARTITION_NAME in 0 sec(CPU), 
0.000316342 sec(elapsed)
The keywords index for column PARTITION_NAME.COL_A contains 1580 terms 
(optionally followed by term frequencies)
 (printing 1 out of every 50)
###### I REMOVED THEM ######


countQuery::doEvaluate(0x6554d0: COL_A CONTAINS 'VALUE_A', 
mask.cnt()=A_READ_COUNT2) --> 20449348, ierr = 1
column[PARTITION_NAME.COL_B]::evaluateRange(COL_B == 0, mask(20449348, 
A-REAL-COUNT)) completed with ierr = 2
countQuery::doEvaluate(0x6566b0: COL_B == 0, mask.cnt()=20449348) --> 8468024, 
ierr = 2
countQuery::doEvaluate(0x65b310: (0x6554d0 AND 0x6566b0), 
mask.cnt()=A_READ_COUNT2) --> 8468024, ierr = 2
countQuery::evaluate -- Select count(*) From PARTITION_NAME Where (0x6554d0 AND 
0x6566b0) --> 8468024
countQuery::evaluate -- duration: 0.060003 sec(CPU), 0.0612946 sec(elapsed)
bord[_xlYZ0]::append added 8468024 rows to make a total of 8468024
filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) creates 
an in-memory data partition with 8468024 rows and 1 column
bord::groupbya -- processing aggregations for "Select COUNT(*) From _xlYZ0"

Constructed in-memory data partition JSGw02 -- Select COUNT(*) From _xlYZ0 -- 
with 1 row and 1 column
table::freeBuffers to free buf[1] and typ[1]
bord::groupby -- completed Table (in memory) JSGw02 (Select COUNT(*) From 
_xlYZ0) contsists of 1 column and 1 row
_0      UINT    COUNT(*)

filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) produces 
an in-memory data partition with 1 row and 1 column
clearing data partition _xlYZ0
filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x644 ...) -- 
duration: 0.104006 sec(CPU), 0.105114 sec(elapsed)
Freeing selectClause @ 0x7fff4926e0c0
doQuaere -- "SELECT count(1) WHERE COL_A contains "VALUE_A" and COL_B=0" 
produced a table with 1 row and 1 column, took 0.108006 CPU seconds, 0.108806 
elapsed seconds
doQuaere -- the result table (1 x 1) for "SELECT count(1) WHERE COL_A contains 
"VALUE_A" and COL_B=0"
8468024

clearing data partition JSGw02
MY_/fastbit-ibis1.3.8/examples/.libs/ibis -- total CPU time 0.112006 s, total 
elapsed time 0.113214 s
clearing data partition PARTITION_NAME
Cleaning up the file manager
Total pages accessed through read(unistd.h) is estimated to be 24
fileManager::clear -- starting ...
--- Fri Jun  5 07:02:11 2015
The number of memory mapped files is 1. (max = 768)
file name: MYfastbitIdx/PARTITION_NAME/COL_A.idx
storage @ 0x657350, 0x7f4dca7ff000, 1st 32 bits = 49424923, 1st 64 bits = 
4125349424923
file descriptor 3
fmap size       1897828708
base address    0x7f4dca7ff000
mapped          y       opened at       Fri Jun  5 07:02:11 2015        last 
used at    Fri Jun  5 07:02:11 2015
# of bytes      1897828708      # of past acc   4       # of active acc 0
Size of all mapped files is 1,897,828,708

The number of files read into memory is 2
file name: MYfastbitIdx/PARTITION_NAME/COL_B.idx
storage @ 0x657030, 0x65b490, 1st 32 bits = 49424923, 1st 64 bits = 
4155349424923

storage @ 0x657030, 0x65b490, 1st 32 bits = 49424923, 1st 64 bits = 
4155349424923
mapped          n       opened at       Fri Jun  5 07:02:11 2015        last 
used at    Fri Jun  5 07:02:11 2015
# of bytes      92300   # of past acc   8       # of active acc 0
file name: MYfastbitIdx/PARTITION_NAME/COL_B.msk
storage @ 0x656990, 0x655a40, 1st 32 bits = c0005bdf, 1st 64 bits = 
7ffbffffc0005bdf
mapped          n       opened at       Fri Jun  5 07:02:11 2015        last 
used at    Fri Jun  5 07:02:11 2015
# of bytes      60      # of past acc   2       # of active acc 0

The total size of all files read into memory is 92,360
Size of all named storages is 1,897,921,068
Size of all unnamed storages is 0
The total size of all named and unnamed storages is 1,897,921,068
The prescribed maximum size is 16,112,781,312
Number of pages accessed (recorded so far) is 24 (page size = 4096)


fileManager::clear -- completed with 0 byte of storage remain in memory after 
removing all managed objects

--- Fri Jun  5 07:02:11 2015
The number of memory mapped files is 0. (max = 768)
Size of all mapped files is 0

The number of files read into memory is 0

The total size of all files read into memory is 0
Size of all named storages is 0
Size of all unnamed storages is 0
The total size of all named and unnamed storages is 0
The prescribed maximum size is 16,112,781,312
Number of pages accessed (recorded so far) is 24 (page size = 4096)


Report from getrusage: maxrss = 57044 pages (4096 bytes/page), majflt = 1, 
minflt = 20020, inblock = 50544, outblock = 48
fileManager decommissioned
FastBit 2.0.2
Log started on Fri Jun  5 07:00:04 2015

MY/install-fastbit-2.0.2/bin/ibis
Options summary: batch mode, log level 4, skipping estimation

  current data limit: 18446744073709551615 (soft), 18446744073709551615 (hard)
fileManager initialization complete -- maxBytes=16112781312, maxOpenFiles=768
util::gatherParts -- examining MY/fastbitIdx/PARTITION_NAME
Constructed a part named PARTITION_NAME
activeDir = "MY/fastbitIdx/PARTITION_NAME"
part: PARTITION_NAME (MY/fastbit-ibis1.3.8/examples/.libs/ardea -d 
MY/buildIndex/0/fastbitIdx/PARTITION_NAME -M 
MY_BuildIndex/dim_persona_label_export_data_type.txt -t 0.csv) with 
A_REAL_COUNT rows, 108 columns
Column list:

###### I REMOVED the column list ######

Partition[1]:
  PARTITION_NAME
Query[1]:
  select count(1) WHERE COL_A contains "VALUE_A" and COL_B=0

doQuaere -- processing "SELECT count(1) WHERE COL_A contains "VALUE_A" and 
COL_B=0"
Constructing selectClause @ 0x7ffffb631960
Freeing selectClause @ 0x7ffffb631960
Constructing selectClause @ 0x7ffffb631ac0
selectClause::fillNames -- select clause internal details:
 low-level expressions (names_[1], aggr_[1], atms_[1]):
  0:    __0,    COUNT(*)
 high-level expressions (xnames_[1], xtms_[1]):
  0:    __0,    COUNT(*)
filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) -- start 
timer ...
countQuery::setWhereClause accepted new query conditions "(0x654a50 AND 
0x642330)"
Constructed in-memory data partition _xCYZ1 -- filter::sift2(SELECT count(1) 
FROM 1 data partition WHERE (0x654 ...) -- with 1 column
filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) -- 
processing a select clause with 1 term, 0 of which is plain

filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) -- 
processing data partition PARTITION_NAME
countQuery assigned data partition PARTITION_NAME
countQuery::evaluate -- start timer ...
index::create -- attempt to read index type #21 from 
MY/fastbitIdx/PARTITION_NAME/COL_B.idx for column PARTITION_NAME.COL_B
index::ctor reconstituted an index for PARTITION_NAME.COL_B from storage object 
@ 0x655890
direkte[COL_B]::read(0x655890) finished reading index header with 
nrows=A_REAL_COUNT and bits.size()=3
index::create(PARTITION_NAME.COL_B) reading the existing index took 9.42145e-05 
sec
index::create -- attempt to read index type #18 from 
MY/fastbitIdx/PARTITION_NAME/COL_A.idx for column PARTITION_NAME.COL_A
index::ctor reconstituted an index for PARTITION_NAME.COL_A from storage object 
@ 0x66c4d0
dictionary::read(MY/fastbitIdx/PARTITION_NAME/COL_A.terms) -- duration: 0 
sec(CPU), 0.000521151 sec(elapsed)
index::create(PARTITION_NAME.COL_A) reading the existing index took 0.000638694 
sec
text[PARTITION_NAME.COL_A]::keywordSearch(VALUE_A) -- duration: 0.020001 
sec(CPU), 0.022487 sec(elapsed)
countQuery::doEvaluate(COL_A CONTAINS 'VALUE_A', mask.cnt()= A_REAL_COUNT2) --> 
A_REAL_COUNT3, ierr = 1
column[PARTITION_NAME.COL_B]::evaluateRange(COL_B == 0, mask(A_REAL_COUNT3, 
A_REAL_COUNT)) -- estimated cost with index = 45880, with sequential scan = 
2.83044e+08
column[PARTITION_NAME.COL_B]::evaluateRange(COL_B == 0, mask(A_REAL_COUNT3, 
A_REAL_COUNT)) completed with ierr = 2
column[PARTITION_NAME.COL_B]::evaluateRange(COL_B == 0, mask(A_REAL_COUNT3, 
A_REAL_COUNT)) -- duration: 0.240015 sec(CPU), 0.244723 sec(elapsed)
countQuery::doEvaluate(COL_B == 0, mask.cnt()=A_REAL_COUNT3) --> 8468024, ierr 
= 2
countQuery::doEvaluate((0x654a50 AND 0x642330), mask.cnt()= A_REAL_COUNT2) --> 
8468024, ierr = 2
From PARTITION_NAME Where (0x654a50 AND 0x642330) --> 8468024
countQuery::evaluate -- duration: 0.288018 sec(CPU), 0.290324 sec(elapsed)
bord[_xCYZ1]::append added 8468024 rows to make a total of 8468024
filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) creates 
an in-memory data partition with 8468024 rows and 1 column
bord::groupbya -- processing aggregations for "Select COUNT(*) From _xCYZ1"
Constructed in-memory data partition zxGI01 -- Select COUNT(*) From _xCYZ1 -- 
with 1 row and 1 column
table::freeBuffers to free buf[1] and typ[1]
bord::groupby -- completed Table (in memory) zxGI01 (Select COUNT(*) From 
_xCYZ1) contsists of 1 column and 1 row
__0     UINT    COUNT(*)

filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) produces 
an in-memory data partition with 1 row and 1 column
clearing data partition _xCYZ1
filter::sift2(SELECT count(1) FROM 1 data partition WHERE (0x654 ...) -- 
duration: 0.332021 sec(CPU), 0.332519 sec(elapsed)
Freeing selectClause @ 0x7ffffb631ac0
doQuaere -- "SELECT count(1) WHERE COL_A contains "VALUE_A" and COL_B=0" 
produced a table with 1 row and 1 column, took 0.336021 CPU seconds, 0.336029 
elapsed seconds
doQuaere -- the result table (1 x 1) for "SELECT count(1) WHERE COL_A contains 
"VALUE_A" and COL_B=0"
8468024

clearing data partition zxGI01
MY/install-fastbit-2.0.2/bin/ibis -- total CPU time 0.340021 s, total elapsed 
time 0.340736 s
clearing data partition PARTITION_NAME
Cleaning up the file manager
Total pages accessed through read(unistd.h) is estimated to be 24
fileManager::clear -- starting ...
--- Fri Jun  5 07:00:05 2015
The number of memory mapped files is 1. (max = 768)
file name: MY/fastbitIdx/PARTITION_NAME/COL_A.idx
storage @ 0x66c4d0, 0x7fd2f984c000, 1st 32 bits = 49424923, 1st 64 bits = 
4125349424923
file descriptor 3
fmap size       1897828708
base address    0x7fd2f984c000
mapped          y       opened at       Fri Jun  5 07:00:04 2015        last 
used at    Fri Jun  5 07:00:04 2015
# of bytes      1897828708      # of past acc   4       # of active acc 0
Size of all mapped files is 1,897,828,708

The number of files read into memory is 2
file name: MY/fastbitIdx/PARTITION_NAME/COL_B.idx
storage @ 0x655890, 0x6558f0, 1st 32 bits = 49424923, 1st 64 bits = 
4155349424923

file descriptor 3
fmap size       1897828708
base address    0x7fd2f984c000
mapped          y       opened at       Fri Jun  5 07:00:04 2015        last 
used at    Fri Jun  5 07:00:04 2015
# of bytes      1897828708      # of past acc   4       # of active acc 0
Size of all mapped files is 1,897,828,708

The number of files read into memory is 2
file name: MY/fastbitIdx/PARTITION_NAME/COL_B.idx
storage @ 0x655890, 0x6558f0, 1st 32 bits = 49424923, 1st 64 bits = 
4155349424923
mapped          n       opened at       Fri Jun  5 07:00:04 2015        last 
used at    Fri Jun  5 07:00:04 2015
# of bytes      92300   # of past acc   8       # of active acc 0
file name: MY/fastbitIdx/PARTITION_NAME/COL_B.msk
storage @ 0x655060, 0x654fb0, 1st 32 bits = c0005bdf, 1st 64 bits = 
7ffbffffc0005bdf
mapped          n       opened at       Fri Jun  5 07:00:04 2015        last 
used at    Fri Jun  5 07:00:04 2015
# of bytes      60      # of past acc   2       # of active acc 0

The total size of all files read into memory is 92,360
Size of all named storages is 1,897,921,068
Size of all unnamed storages is 0
The total size of all named and unnamed storages is 1,897,921,068
The prescribed maximum size is 16,112,781,312
Number of pages accessed (recorded so far) is 24 (page size = 4096)


fileManager::clear -- completed with 0 byte of storage remain in memory after 
removing all managed objects

--- Fri Jun  5 07:00:05 2015
The number of memory mapped files is 0. (max = 768)
Size of all mapped files is 0


The total size of all files read into memory is 0
Size of all named storages is 0
Size of all unnamed storages is 0
The total size of all named and unnamed storages is 0
The prescribed maximum size is 16,112,781,312
Number of pages accessed (recorded so far) is 24 (page size = 4096)


Report from getrusage: maxrss = 55924 pages (4096 bytes/page), majflt = 1, 
minflt = 16409, inblock = 52016, outblock = 40
fileManager decommissioned







_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to