[ http://issues.apache.org/jira/browse/DERBY-472?page=comments#action_12355900 ]
Rick Hillegas commented on DERBY-472:
-------------------------------------

The following wiki page tracks this evolving proposal: http://wiki.apache.org/db-derby/LuceneIntegration.

> Full Text Indexing / Full Text Search
> -------------------------------------
>
>          Key: DERBY-472
>          URL: http://issues.apache.org/jira/browse/DERBY-472
>      Project: Derby
>         Type: New Feature
>   Components: SQL
>     Versions: 10.0.2.0
>  Environment: All environments
>     Reporter: Rick Hillegas
>
> Efficiently support full text search of string datatyped columns. Mag Gam
> raised this issue on the user's mailing list on 24 July 2005; the email
> thread is titled 'Full Text Indexing'. Mag wants to see something akin to the
> functionality in tsearch2
> (http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/). Dan points out
> that we may be able to re-use index building technology exposed by the Apache
> Lucene project (http://lucene.apache.org/).
> Presumably we want to build inverted indexes on all string datatyped columns:
> CHAR, VARCHAR, LONG VARCHAR, CLOB, and their national variants (when they
> are implemented). We should consider the following additional issues when
> specifying this feature:
> 1) Do we also want to support text search on XML columns?
> 2) Which human languages do we support initially? Each language has its own
> rules for lexing words and its own list of "noise" words which should not be
> indexed. Hopefully, we can plug in some existing packages of lexers and noise
> filters. We should encourage users to donate additional lexers/filters.
> 3) The CREATE INDEX syntax (for these new inverted indexes) should let us
> bind a lexing human language to a string-datatyped column.
> 4) How do we express the search condition? For case-sensitive searches we can
> get away with boolean expressions built out of standard LIKE clauses.
> However, in my opinion, case-sensitive searches are an edge case. The more
> useful situation is a case-insensitive search. Can we get away with
> introducing a non-standard function here or do we need to push a proposal
> through the standards committees? Even more useful and non-standard are fuzzy
> searches, which tolerate bad spellers.
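
For readers unfamiliar with the terms in the issue description, here is a rough sketch of what an inverted index with a lexer and a noise-word filter might look like. It is an illustration only, not Derby or Lucene code: the class name InvertedIndexSketch, its methods, and the naive split-on-non-letters lexer are all assumptions made for this example. A real lexer would be language-specific, as item 2 notes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; none of these names come from Derby or Lucene.
class InvertedIndexSketch {
    // Words that are never indexed; expected to be supplied in lower case.
    private final Set<String> noiseWords;
    // Maps each indexed term to the sorted set of row ids containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    InvertedIndexSketch(Set<String> noiseWords) {
        this.noiseWords = noiseWords;
    }

    // Naive lexer (an assumption): lower-case the text, then split on
    // runs of characters that are neither letters nor digits.
    private static String[] lex(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    // Index one row's string value, skipping empty tokens and noise words.
    void add(int rowId, String text) {
        for (String term : lex(text)) {
            if (term.isEmpty() || noiseWords.contains(term)) {
                continue;
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(rowId);
        }
    }

    // Case-insensitive single-word lookup; returns an empty set if absent.
    Set<Integer> search(String word) {
        return postings.getOrDefault(
                word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

Because terms are lower-cased both at indexing time and at lookup time, the search is case-insensitive without scanning the column, which is the case that LIKE clauses handle poorly. Fuzzy matching is not attempted in this sketch.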
