[
https://issues.apache.org/jira/browse/DERBY-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753211#action_12753211
]
Rick Hillegas commented on DERBY-472:
-------------------------------------
A little more information on standards which apply to this work. The ANSI/ISO
SQL standard does address full-text search in part 2 of the section titled
SQL/MM (SQL Multimedia and Application Packages). The approach centers on a
special full-text datatype, which is an ADT. The ADT has many methods defined
on it, some of which return BOOLEAN values. SQL/MM is part of the 1999 version
of the standard. I asked the ANSI SQL committee if anyone on the committee had
implemented the spec. No-one replied. I think that means that no major vendor
has implemented this part of the SQL spec.
You can get a copy of part 2 of SQL/MM here:
http://www.wiscorp.com/SQLStandards.html
At this time, I don't recommend building Derby text search capabilities on top
of the SQL standard--for these reasons:
1) I question the usefulness of a spec which has not been implemented by any
major vendor.
2) Derby does not currently support ADTs.
3) Derby does not currently support a BOOLEAN datatype.
> Full Text Indexing / Full Text Search
> -------------------------------------
>
> Key: DERBY-472
> URL: https://issues.apache.org/jira/browse/DERBY-472
> Project: Derby
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 10.0.2.0
> Environment: All environments
> Reporter: Rick Hillegas
>
> Efficiently support full text search of string datatyped columns. Mag Gam
> raised this issue on the user's mailing list on 24 July 2005; the email
> thread is titled 'Full Text Indexing'. Mag wants to see something akin to the
> functionality in tsearch2
> (http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/). Dan points out
> that we may be able to re-use index building technology exposed by the apache
> Lucene project (http://lucene.apache.org/).
> Presumably we want to build inverted indexes on all string datatyped columns:
> CHAR, VARCHAR, LONG VARCHAR, CLOB,, and their national variants (when they
> are implemented). We should consider the following additional issues when
> specifying this feature:
> 1) Do we also want to support text search on XML columns?
> 2) Which human languages do we support initially? Each language has its own
> rules for lexing words and its own list of "noise" words which should not be
> indexed. Hopefully, we can plug-in some existing packages of lexers and noise
> filters. We should encourage users to donate additional lexers/fitlers.
> 3) The CREATE INDEX syntax (for these new inverted indexes) should let us
> bind a lexing human language to a string-datatyped column.
> 4) How do we express the search condition? For case-sensitive searches we can
> get away with boolean expressions built out of standard LIKE clauses.
> However, in my opinion, case-sensitive searches are an edge case. The more
> useful situation is a case-insensitive search. Can we get away with
> introducing a non-standard function here or do we need to push a proposal
> through the standards commitees? Even more useful and non-standard are fuzzy
> searches, which tolerate bad spellers.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.