Hi, Justo,

Thanks for your interest in FastBit.  The keyword index from FastBit is 
not easy to use and we really appreciate the effort you've made in 
trying it.  We will study your test case carefully in the next couple 
of days and get back to you as soon as we have something useful to say.

Regards,
John


On 12/2/2010 1:47 PM, Justo Ruiz Ferrer wrote:
> Hi,
>
> I’m trying to make use of keyword indexes and I am finding that the
> results are not consistent, despite the fact that no errors were
> reported whilst indexing the columns. It could be that I am not using
> the tools correctly, but I’ve checked and rechecked; however, I may
> have constructed the data files incorrectly. I hope that is not the
> case :)
>
> I’ve generated a synthetic load (see attached data.zip) which
> simulates a measure (column creditrisk) against a fictitious trade
> (tradeid column); rows have an identifier called rowid (integer) which
> is used to build the keyword index later on (see riskkeys column and
> attached zip file for the tdlist).
>
> In total there are 20000 rows of randomly generated data. There are
> three columns of interest, rsk3, rsk2 and rsk1, which hold the values
> of a fake three-level taxonomy representing risk classifications.
> The fake taxonomy looks as follows:
>
> rsk3  rsk2                         rsk1
> A     Strong                       Good
> A-    Strong                       Good
> A+    Strong                       Good
> AA    Very Strong                  Good
> AA-   Very Strong                  Good
> AA+   Very Strong                  Good
> AAA   Extremely Strong             Good
> B     More Vulnerable              Not so good
> B-    More Vulnerable              Not so good
> B+    More Vulnerable              Not so good
> BB    Less Vulnerable              Not so good
> BB-   Less Vulnerable              Not so good
> BB+   Less Vulnerable              Not so good
> BBB   Adequate                     FiftyFifty
> BBB-  Adequate                     FiftyFifty
> BBB+  Adequate                     FiftyFifty
> C     Currently Highly Vulnerable  Run Away
> CC    Currently Highly Vulnerable  Run Away
> CCC   Currently Vulnerable         Run Away
> D     Failed                       Run Away
>
> This taxonomy is later “flattened out” by using a keyword index.
>
> After importing the data and indexing, queries over rsk3, rsk2 and
> rsk1 work as expected, with all the counts adding up to 20000:
>
> ibis > select rsk1
> rsk1 (with counts)
> "Not so good", 6066
> "Good", 7084
> "Run Away", 3926
> "FiftyFifty", 2924
>
> ibis > select rsk2
> rsk2 (with counts)
> "Less Vulnerable", 3005
> "Extremely Strong", 1051
> "More Vulnerable", 3061
> "Currently Highly Vulnerable", 1963
> "Very Strong", 3026
> "Failed", 961
> "Strong", 3007
> "Currently Vulnerable", 1002
> "Adequate", 2924
>
> ibis > select rsk3
> rsk3 (with counts)
> "BB", 981
> "AAA", 1051
> "B-", 969
> "BB-", 1017
> "C", 1004
> "B", 1067
> "B+", 1025
> "CC", 959
> "AA+", 994
> "D", 961
> "A+", 1017
> "AA", 1018
> "AA-", 1014
> "CCC", 1002
> "BBB", 986
> "A", 1005
> "BB+", 1007
> "A-", 985
> "BBB+", 963
> "BBB-", 975
>
> The column riskkeys is the keyword index that associates every rowid
> with its three levels in the “risk” taxonomy. The indexing process
> doesn’t report any errors, but we see inconsistencies when running
> queries that use the riskkeys column in the where clause. For example:
>
> ibis > select rsk1 where riskkeys="Run_Away"
> rsk1 (with counts)
> "Run Away", 1005
>
> It should really report a count of 3926.
>
> ibis > select rsk2 where riskkeys="Run_Away"
> rsk2 (with counts)
> "Currently Highly Vulnerable", 1004
> "Failed", 1
>
> It is missing “Currently Vulnerable” and the counts don’t match.
>
> ibis > select rsk3 where riskkeys="Run_Away"
> rsk3 (with counts)
> "C", 1004
> "D", 1
>
> CC and CCC are missing and the count for D is wrong.
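
The expected figures in the quoted report can be cross-checked from the taxonomy and the per-rsk3 counts alone. What follows is a minimal Python sketch of that arithmetic, not FastBit code: the names TAXONOMY, RSK3_COUNTS and expected_counts are made up for illustration, and the numbers are copied from the "select rsk3" output in the quoted message. It shows only what a correct query on riskkeys would have to reproduce.

```python
# Taxonomy copied from the quoted message: rsk3 -> (rsk2, rsk1).
TAXONOMY = {
    "A": ("Strong", "Good"), "A-": ("Strong", "Good"), "A+": ("Strong", "Good"),
    "AA": ("Very Strong", "Good"), "AA-": ("Very Strong", "Good"),
    "AA+": ("Very Strong", "Good"), "AAA": ("Extremely Strong", "Good"),
    "B": ("More Vulnerable", "Not so good"),
    "B-": ("More Vulnerable", "Not so good"),
    "B+": ("More Vulnerable", "Not so good"),
    "BB": ("Less Vulnerable", "Not so good"),
    "BB-": ("Less Vulnerable", "Not so good"),
    "BB+": ("Less Vulnerable", "Not so good"),
    "BBB": ("Adequate", "FiftyFifty"), "BBB-": ("Adequate", "FiftyFifty"),
    "BBB+": ("Adequate", "FiftyFifty"),
    "C": ("Currently Highly Vulnerable", "Run Away"),
    "CC": ("Currently Highly Vulnerable", "Run Away"),
    "CCC": ("Currently Vulnerable", "Run Away"),
    "D": ("Failed", "Run Away"),
}

# Row counts per rsk3 value, copied from the "select rsk3" output.
RSK3_COUNTS = {
    "BB": 981, "AAA": 1051, "B-": 969, "BB-": 1017, "C": 1004,
    "B": 1067, "B+": 1025, "CC": 959, "AA+": 994, "D": 961,
    "A+": 1017, "AA": 1018, "AA-": 1014, "CCC": 1002, "BBB": 986,
    "A": 1005, "BB+": 1007, "A-": 985, "BBB+": 963, "BBB-": 975,
}


def expected_counts(rsk1_value):
    """Return (counts by rsk3, counts by rsk2, total) for one rsk1 class."""
    by_rsk3 = {
        rsk3: RSK3_COUNTS[rsk3]
        for rsk3, (_rsk2, rsk1) in TAXONOMY.items()
        if rsk1 == rsk1_value
    }
    by_rsk2 = {}
    for rsk3, count in by_rsk3.items():
        rsk2 = TAXONOMY[rsk3][0]
        by_rsk2[rsk2] = by_rsk2.get(rsk2, 0) + count
    return by_rsk3, by_rsk2, sum(by_rsk3.values())


if __name__ == "__main__":
    by_rsk3, by_rsk2, total = expected_counts("Run Away")
    print(by_rsk3)
    print(by_rsk2)
    print(total)
```

For "Run Away" the four rsk3 classes C, CC, CCC and D add up to 3926, which agrees with the unfiltered "select rsk1" output. The 1005 rows returned through riskkeys equal the 1004 rows of class C plus a single row of class D.
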
>
> Is there anything wrong (a bug?) with keyword indexes? I am using the
> head of the code on Windows XP. The code was compiled with MS VS 2010
> in release mode.
>
> Thanks a lot for your help.
>
> Justo.
>
> Endelec LLP is a company registered in England and Wales.
> Registered number: OC356543
> Registered office: 30 City Road, London EC1Y 2AB
>
>
>
>
>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users