We get this question once in a while, so here is a summary.

- TEXT is meant for free-form text fields, while CATEGORY is meant for
fields with values taken from a well-defined vocabulary.  For example,
TEXT might be for a custom comment field including arbitrary text.
CATEGORY might be for email addresses, first names, last names.

- The operator IN, as in "names IN ('A', 'AAA', ...)", can be expressed
as "names = 'A' or names = 'AAA' or ...".  A simple way to evaluate
this expression is to match the string literals against each of the
names.  The key thing to remember here is that the whole string of
names in a particular row has to match one of the string
literals in order for the expression to be satisfied.  This operation
is much better supported by the CATEGORY column.

- FastBit supports the keyword index only on TEXT columns.  It is meant
for Google-style searches.  When you type a couple of words into the
Google search box, your request to Google is effectively "find the
documents containing the words I just typed".  In the context of
FastBit, each document is a row of a TEXT column.
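
To make the two kinds of matching above concrete, here is a small
sketch in plain Python.  It does not use the FastBit API; the sample
rows and the helper names in_match and keyword_match are made up for
illustration only.  It shows that IN needs the whole value of a row to
equal one of the literals, while a keyword search only needs the words
to appear somewhere in the row.

```python
# Illustration only: plain Python, not the FastBit API.
# The rows and both helper functions are hypothetical.
rows = [
    "AAA",
    "A",
    "A quick note about AAA batteries",
    "B",
]

def in_match(values, literals):
    """Row numbers whose entire value equals one of the literals (IN semantics)."""
    wanted = set(literals)
    return [i for i, v in enumerate(values) if v in wanted]

def keyword_match(values, words):
    """Row numbers whose value contains every given word (keyword search)."""
    wanted = {w.lower() for w in words}
    return [i for i, v in enumerate(values) if wanted <= set(v.lower().split())]

print(in_match(rows, ["A", "AAA"]))   # whole-value match
print(keyword_match(rows, ["aaa"]))   # word-containment match
```

The third row contains both "A" and "AAA" as words, so a keyword search
finds it, but it does not satisfy the IN expression because the whole
string is not equal to either literal.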

In practice, the difference is this.  When you tell FastBit a column
is CATEGORY, it will attempt to read a dictionary associated with the
column.  If the dictionary is large, then the startup cost will be
high.  If you tell FastBit a particular column is TEXT, it won't
need to do much at the startup phase, but processing a query of the
form "names in (...)" will take much more work.
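
The trade-off can be pictured with a toy model in plain Python.  This
is only a sketch under my own assumptions, not how FastBit stores its
dictionary or indexes internally; the data and the function names
in_query_category and in_query_text are hypothetical.  The
CATEGORY-like path pays once to build a dictionary and then compares
small integers, while the TEXT-like path has no setup but compares
strings for every row.

```python
# Illustration only: a toy model of the trade-off, not FastBit internals.
names = ["A", "B", "AAA", "A", "C", "AAA"]

# CATEGORY-like: pay once up front to build (or load) a dictionary
# and keep each row as a small integer code.
dictionary = {}
codes = []
for value in names:
    # setdefault returns the existing code, or assigns the next free one.
    codes.append(dictionary.setdefault(value, len(dictionary)))

def in_query_category(literals):
    # Look up each literal once in the dictionary, then compare integers per row.
    wanted = {dictionary[s] for s in literals if s in dictionary}
    return [i for i, c in enumerate(codes) if c in wanted]

def in_query_text(literals):
    # TEXT-like: no setup, but each row's string is compared against the literals.
    return [i for i, v in enumerate(names) if any(v == s for s in literals)]

print(in_query_category(["A", "AAA"]))
print(in_query_text(["A", "AAA"]))
```

Both functions return the same rows; the difference is where the work
is done.  With many distinct values the dictionary itself becomes
large, which corresponds to the high startup cost mentioned above.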

Hope this helps.

John
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
