The ever-growing mix of structured and unstructured data is a fact of life that modern systems have to deal with. Clearly, full-text indexing is moving towards DB functionality, i.e. <attribute,value> fields for projection/filtering, sorting, faceted queries, transactional CRUD operations, etc. Though set manipulation is not Lucene's or Solr's forte, the document-object model maps very well to rows of relational sets or tables, even more so since CLOB and TEXT fields were introduced.
On the other hand, relational databases with XML and OO extensions, as well as native XML repositories, still have to deal with the problem of RANKING unstructured text and of combining text fragments with structured conditions. They are thus no longer dealing with just a set/relational model that yields binary answers, but are extending their query languages to handle concepts such as fuzziness and relevance (e.g. SQL/MM, XQuery Full-Text).

I would like to open this can of worms once again, and perhaps think outside the box, without classifying DB and full-text as simply different, as we analyze these concepts to better understand the real path for the evolution of Lucene/Solr.

Here is a very interesting attempt by Marcelo Ochoa to create a special type of "index", called a Domain Index, to query unstructured data within Oracle:
https://issues.apache.org/jira/browse/LUCENE-724

Other interesting articles:

XQuery 1.0 - Full-Text:
http://www.w3.org/TR/xquery-full-text/

SQL/MM Full-Text:
http://www.wiscorp.com/2CD1R1-02-fulltext-2001-12.pdf

Discussions on *XML data model vs. relational model*:
http://www.xml.com/cs/user/view/cs_msg/2645
http://www.w3.org/TR/xpath-datamodel/
http://en.wikipedia.org/wiki/Relational_model

-- Joaquin Delgado
