Hi there,

Many thanks for your support. Sorry it has taken me a few days to respond. I had a quick run today with the new head of the code and everything seems to be fine, but I need to dig a little more into the functionality you documented in your last email.
BTW, please feel free to use the example taxonomy. Some bits were taken directly from Wikipedia, and the top level of the hierarchy makes no business sense whatsoever! However, it is a good test case for quickly checking the consistency of the indexing by simply counting rows. I am glad you find it interesting enough, and I am thankful for your trouble in putting together a C++ test for the case.

I should be able to give you more feedback in a few days, once my agenda has cleared a little.

Thanks again for your support!

Justo.

-----Original Message-----
From: Kesheng Wu [mailto:[email protected]]
Sent: 08 December 2010 17:39
To: FastBit Users
Cc: Justo Ruiz Ferrer
Subject: Re: [FastBit-users] Keyword indexes

Hi, Justo,

A set of test functions has been added to FastBit's testing suite to exercise the keyword indexes based on the output jrf.cpp attached to the previous message. I presume we have your blessing in using the risk categories. If that is a problem, please let us know soon so we can replace the list with something else.

A small set of functions has been added to extract keywords from text without the need for an externally provided term-document list. This allows one to specify an indexing option of "keywords" without an explicit docidname. In this case, the new parser is used to generate the keyword index. This additional feature should make the keyword index more usable than before.

In the case of your risk categories, because many keywords contain embedded spaces, one option to allow these keywords to be recognized is to place commas between the keywords. Such a comma-separated-values format is frequently used and may be a reasonable option for your data.

Hope this is useful for you. When you get a chance to test the new code, please let us know how it works for you.

Thanks.
John

PS: You can check out the latest source code from the SVN repository using the following command:

svn checkout https://codeforge.lbl.gov/anonscm/fastbit
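
The comma-separated keyword format suggested in the quoted message, together with the "count rows per keyword" consistency check, can be sketched roughly as follows. This is only an illustration of the idea and not FastBit's actual parser or API: the names splitKeywords, buildIndex and KeywordIndex are hypothetical, and trimming whitespace around each keyword is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of FastBit): split one text value on commas
// and trim surrounding blanks, so that keywords with embedded spaces such as
// "credit risk" are kept as a single keyword. Empty fields are skipped.
std::vector<std::string> splitKeywords(const std::string& text) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(',', start);
        if (end == std::string::npos) end = text.size();
        // First non-blank character at or after the start of this field.
        std::size_t b = text.find_first_not_of(" \t", start);
        if (b != std::string::npos && b < end) {
            // Last non-blank character before the comma (or end of text).
            std::size_t e = text.find_last_not_of(" \t", end - 1);
            out.push_back(text.substr(b, e - b + 1));
        }
        start = end + 1;
    }
    return out;
}

// Hypothetical in-memory keyword index: keyword -> sorted list of row numbers.
using KeywordIndex = std::map<std::string, std::vector<std::size_t>>;

// Build the index from one text value per row. A row is recorded at most once
// per keyword, so ids.size() is the number of rows containing that keyword.
KeywordIndex buildIndex(const std::vector<std::string>& rows) {
    KeywordIndex index;
    for (std::size_t r = 0; r < rows.size(); ++r) {
        for (const std::string& kw : splitKeywords(rows[r])) {
            std::vector<std::size_t>& ids = index[kw];
            if (ids.empty() || ids.back() != r) ids.push_back(r);
        }
    }
    return index;
}
```

With an index built this way, the number of rows for a keyword is the size of its row list, which can be compared against the hit count a real keyword index reports for the same keyword.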
