Hi Teryl,

FastBit supports strings in two different ways: as categorical values (keys) or as text.

For categorical values, the string in each row is treated as an atomic value. FastBit builds a dictionary of the values and then builds an index. If the column is named 'mykey', you should see four files in the data directory:

  mykey      the original string values (raw binary with null terminators)
  mykey.sp   the starting positions of the strings
  mykey.dic  the dictionary
  mykey.idx  the index

For text values, no index is built by default. The only index supported is a keyword index. In this case, the text in each row is broken into words, and the index records which rows each word (also called a keyword) appears in. To tell FastBit to build a keyword index, modify the indexing specification of that column in the metadata file -part.txt to say something like

  index=keywords

In context, the specification of the column in the -part.txt file might look like the following:

  Begin Column
  name = mytext
  description = a free-form text describing the column content
  type = TEXT
  index = keywords
  End Column

Please feel free to let us know if you have more questions.

John

On 7/28/11 12:14 PM, Teryl Taylor wrote:
> Hi John,
>
> I've created a column in my database that is a column of strings -
> high cardinality (500,000 to 1,000,000). I call buildIndexes on the
> partition, and it seems to build indices for all columns except for
> the string column. Are string indices not supported? Or do I have to
> do something special to get them to generate?
>
>
> Best Regards,
>
> Teryl
>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
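
As a rough illustration of the keyword index idea described in the reply above (text broken into words, with the index recording which rows each word appears in), here is a minimal Python sketch. It is not FastBit code and does not reflect how FastBit tokenizes text or stores the index on disk; the function name build_keyword_index, the lowercasing, the simple word-splitting rule, and the sample rows are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_keyword_index(rows):
    """Map each lowercased word to the sorted list of row numbers containing it.

    Hypothetical helper for illustration only; not part of FastBit.
    """
    index = defaultdict(set)
    for row_number, text in enumerate(rows):
        # Assumed tokenization: runs of word characters, case-insensitive.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(row_number)
    return {word: sorted(row_set) for word, row_set in index.items()}


# Hypothetical sample data standing in for a TEXT column.
rows = [
    "connection refused by host",
    "host unreachable",
    "connection reset",
]
index = build_keyword_index(rows)
print(index["connection"])  # [0, 2]
print(index["host"])        # [0, 1]
```

A lookup for a keyword then only touches the row list for that word instead of scanning every string, which is why a keyword index helps with free-form text columns.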
