Hi, Teryl,

FastBit supports strings in two different ways: as categorical values
(keys) or as text.  For categorical values, the string in each row is
treated as an atomic value; FastBit builds a dictionary of the distinct
values and then an index over them.  Suppose the column is named
'mykey'; in the data directory you should then see four files:

mykey, mykey.sp, mykey.dic, mykey.idx.

These are, respectively, the original string values (raw bytes, each
string followed by a nil terminator), the starting positions of the
strings, the dictionary, and the index.
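To make the relationship between the raw string file, the starting
positions, and the dictionary concrete, here is a minimal in-memory
sketch.  It assumes the raw file is simply every row's string followed
by a nil byte, and that the starting positions are byte offsets into
that buffer.  The helper names (startingPositions, buildDictionary),
the integer widths, and the first-seen code assignment are hypothetical
illustrations, not FastBit's actual on-disk format or dictionary code.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: given the bytes of a raw string file in which every
// row's string is followed by a '\0' terminator, return the byte offset at
// which each row's string starts (the role described for mykey.sp).
std::vector<std::int64_t> startingPositions(const std::string& raw) {
    std::vector<std::int64_t> starts;
    std::int64_t begin = 0;
    for (std::int64_t i = 0; i < static_cast<std::int64_t>(raw.size()); ++i) {
        if (raw[i] == '\0') {   // end of the current row's string
            starts.push_back(begin);
            begin = i + 1;      // the next row starts after the terminator
        }
    }
    return starts;
}

// Hypothetical helper: map each distinct string to a small integer code, the
// way a dictionary lets categorical strings be indexed as integers.
// Codes are assigned in first-seen order starting at 1 (an assumption).
std::map<std::string, std::uint32_t> buildDictionary(
        const std::string& raw, const std::vector<std::int64_t>& starts) {
    std::map<std::string, std::uint32_t> dict;
    for (std::int64_t s : starts) {
        std::string value(raw.c_str() + s);  // reads up to the next '\0'
        if (dict.find(value) == dict.end()) {
            std::uint32_t code = static_cast<std::uint32_t>(dict.size()) + 1;
            dict.emplace(value, code);
        }
    }
    return dict;
}
```

For example, the three rows "red", "blue", "red" stored as
"red\0blue\0red\0" start at offsets 0, 4 and 9, and need only two
dictionary entries.  Indexing the small integer codes rather than the
strings is what keeps an index over a high-cardinality key column
manageable.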

For text values, no index is built by default, and the only supported
index type is a keyword index.  In this case, the text in each row is
broken into words, and the index records the rows in which each word
(also called a keyword) appears.  To tell FastBit to build a keyword
index, modify the indexing specification of that column in the
metadata file -part.txt to say something like

index=keywords

For example, the full specification of such a column in a -part.txt
file might look like the following:

Begin Column
name = mytext
description = a free-form text describing the column content
type = TEXT
index = keywords
End Column
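The idea behind a keyword index can be sketched as follows: each row's
text is split into words, and for every word the index remembers which
rows contain it.  This is a conceptual illustration only.  It assumes
words are runs of letters and digits compared case-sensitively, and the
names KeywordIndex and buildKeywordIndex are hypothetical; FastBit's
own tokenizer and its bitmap storage are not reproduced here.

```cpp
#include <cctype>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical keyword index: each word maps to the set of rows containing it.
using KeywordIndex = std::map<std::string, std::set<std::uint32_t>>;

KeywordIndex buildKeywordIndex(const std::vector<std::string>& rows) {
    KeywordIndex index;
    for (std::uint32_t row = 0; row < rows.size(); ++row) {
        std::string word;
        // Append a sentinel space so the last word of the row is flushed.
        for (char c : rows[row] + ' ') {
            if (std::isalnum(static_cast<unsigned char>(c))) {
                word.push_back(c);          // still inside a word
            } else if (!word.empty()) {
                index[word].insert(row);    // word ended: record this row
                word.clear();
            }
        }
    }
    return index;
}
```

With rows "fast bitmap index", "bitmap, keyword index" and "plain text",
the word "bitmap" maps to rows 0 and 1, so finding the rows that contain
a given keyword is a single lookup instead of a scan over all the text.
A bitmap index would store each of these row sets as a bit vector.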


Please feel free to let us know if you have more questions.

John




On 7/28/11 12:14 PM, Teryl Taylor wrote:
> Hi John,
>
> I've created a column in my database that is a column of strings -
> high cardinality (500,000 to 1,000,000).   I call buildIndexes on the
> partition, and it seems to build indices for all columns except for
> the string column.  Are string indices not supported?  Or do I have to
> do something special to get them to generate?
>
>
> Best Regards,
>
> Teryl
>
>
>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users