Re: "Advanced" query language

JOAQUIN . DELGADO Sat, 17 Dec 2005 17:43:28 -0800

Paul and Wolfgang,

Thank you very much for your input. I think there are two distinct problems 
that have emerged from this thread:
1) The ability to create efficient structures to index and query XML documents
(elements, attributes and their corresponding values) with a full-text query
language and operators. After all, XML is text. As Paul pointed out, people
have already tried this with Lucene.
2) The need for a standard query language like XQuery, aimed at system
interoperability in the now XMLized world, that would have the same effect
that SQL had in the relational world.


While I can see how in the SQL case extension functions can be used to
implement full-text capabilities, in the XML case full-text is required both to
query and retrieve XML (sub-document) elements and attributes based on their
free-text (natural language) values AND to query the strings that represent
the structure itself. In simple SQL queries, for example, the names of the
tables and columns must be known in advance in order to project the
corresponding values, and they are not part of the search conditions (WHERE
clauses evaluate only the values held in those tables and columns).

In XQuery both the structure and the content are searchable, thus requiring
full-text operators. That is why XQuery Full-Text requires the unification and
standardization of both the XQuery and Full-Text "languages". Needless to say,
the implementation will differ from system to system.
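
To make the "structure and content are both searchable" point concrete, here is
a minimal sketch in plain Python (standard library only, no Lucene or XQuery
involved). The function names flatten and search and the sample document are
hypothetical, made up for this illustration; a substring test stands in for
real full-text operators.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = ('<article lang="en"><title>Query languages</title>'
       '<note>See the title page</note></article>')

def flatten(xml_text):
    """Return (path, kind, text) records for names and values in the document.

    kind is "name" for element/attribute names (structure) and "value" for
    attribute values and element text (content). Tail text is ignored.
    """
    records = []

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        records.append((path, "name", elem.tag))
        for attr_name, attr_value in elem.attrib.items():
            attr_path = path + "/@" + attr_name
            records.append((attr_path, "name", attr_name))
            records.append((attr_path, "value", attr_value))
        text = (elem.text or "").strip()
        if text:
            records.append((path, "value", text))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return records

def search(records, term):
    """Case-insensitive substring match against names and values alike."""
    term = term.lower()
    return [(path, kind) for path, kind, text in records
            if term in text.lower()]

# "title" matches once as structure (the element name) and once as content.
print(search(flatten(DOC), "title"))
# [('/article/title', 'name'), ('/article/note', 'value')]
```

In the SQL analogy, only the "value" records would be reachable from a WHERE
clause; here the same search term reaches the "name" records as well.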

I do agree, though, that the abstraction of full-text capabilities through
functional extensions is a great first step. Check out Oracle's XML Query
Service (http://www.oracle.com/technology/tech/xml/xds/index.html and
http://www.oracle.com/technology/oramag/oracle/05-mar/o25xml.html), a
Java-based XQuery engine that has abstracted "data sources" such as Web
Services, RDBMS, etc. as functions that return XML, can receive parameters
and can supply full-text capabilities. If Mark's implementation of Lucene
query and output in XML comes to fruition, a Lucene data source will become
yet another stream of XML that can be queried, processed and rendered by the
mid-tier XQuery engine.

-- Joaquin

--- Begin Message ---

Gentlemen,

While maintaining my bookmarks I ran into this:
"Case Study: Enabling Low-Cost XML-Aware Searching
Capable of Complex Querying":
http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html

Some loose thoughts:

In the system described there, a Lucene document is used for each
low-level XML construct, even when it contains very few characters of text.
The resulting Lucene indexes are at least 2.5 times the size of the
original document, which is no surprise given this document structure.
Normal index size is about one third of the indexed text.
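
Taking the two figures above at face value (they are rough numbers, not
measurements of mine), the relative blow-up can be worked out as follows. The
variable names are only illustrative.

```python
from fractions import Fraction

# Figures quoted above; exact fractions avoid floating-point rounding.
case_study_ratio = Fraction(5, 2)  # index size / source size, "at least 2.5 times"
typical_ratio = Fraction(1, 3)     # index size / source size, "about one third"

# How much larger the per-construct index is than a typical one.
relative_blowup = case_study_ratio / typical_ratio
print(float(relative_blowup))  # 7.5
```

So, under those assumptions, indexing one Lucene document per XML construct
costs at least about 7.5 times the index space of a typical index.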

I don't know about the XQuery standard, but I was wondering
whether this unusual document structure and the non-straightforward
fit between Lucene queries and XQuery queries are related.

As for the joins and iterations over items from the stream of XML
results: iteration over matching XML constructs should be no problem
in Lucene. Joins in Lucene are normally done via boolean filters,
so I was wondering how XQuery joins fit these.
The case study above has a note at the end of par. 5.3:
"The Search Result list that comes back could then be organized
by document id to group together all the results for a single XML
document. This is not provided by default, but has been done with
extension to this code."
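
A rough sketch of what such grouping, combined with a filter-style join, could
look like. This is plain Python and not the Lucene API or the case study's
code: the hit tuples, the function names and the set of allowed ids are all
hypothetical, with the set standing in for a boolean filter over documents.

```python
from collections import defaultdict

def filter_join(hits, allowed_doc_ids):
    """Keep only hits whose XML document id passes the filter (a set here)."""
    return [hit for hit in hits if hit[0] in allowed_doc_ids]

def group_by_document(hits):
    """Group (doc_id, path, score) hits per XML document.

    Constructs inside a document are sorted by descending score, and
    documents are ordered by the score of their best construct.
    """
    grouped = defaultdict(list)
    for doc_id, path, score in hits:
        grouped[doc_id].append((path, score))
    for constructs in grouped.values():
        constructs.sort(key=lambda c: c[1], reverse=True)
    return sorted(grouped.items(), key=lambda item: item[1][0][1], reverse=True)

# Hypothetical hits: one entry per matching low-level XML construct.
hits = [
    ("doc2", "/a/b", 0.4),
    ("doc1", "/a/c", 0.9),
    ("doc2", "/a/d", 0.7),
    ("doc3", "/a/b", 0.5),
]

# The "join": only documents that also matched some other condition survive.
joined = filter_join(hits, {"doc1", "doc2"})
print(group_by_document(joined))
# [('doc1', [('/a/c', 0.9)]), ('doc2', [('/a/d', 0.7), ('/a/b', 0.4)])]
```

Whether XQuery joins can be expressed this way in general is exactly the open
question; the sketch only covers the simple case of restricting one result
set by membership in another.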

Regards,
Paul Elschot

On Friday 16 December 2005 03:45, Wolfgang Hoschek wrote:
> I think implementing an XQuery Full-Text engine is far beyond the  
> scope of Lucene.
> 
> Implementing a building block for the fulltext aspect of it would be  
> more manageable. Unfortunately, the W3C fulltext drafts  
> indiscriminately mix and mingle two completely different languages  
> into a single language, without clear boundaries. That's why most  
> practical folks implement XQuery fulltext search via extension  
> functions rather than within XQuery itself. This also allows for much  
> more detailed tokenization, configuration and extensibility than what  
> would be possible with the W3C draft.
> 
> Wolfgang.
> 
> On Dec 15, 2005, at 4:20 PM, [EMAIL PROTECTED] wrote:
> 
> > Mark,
> >
> > This is very cool. When I was at TripleHop we did something very  
> > similar where both query and results conformed to an XML Schema and  
> > we used XML over HTTP as our main vehicle to do remote/federated  
> > searches with quick rendering with stylesheets.
> >
> > That however is the first piece of the puzzle. If you really want  
> > to go beyond search (in the traditional sense) and be able to  
> > perform more complex operations such as joins and iterations over  
> > items from the stream of XML results you are getting you should  
> > consider implementing an XQuery Full-Text engine with Lucene  
> > adopting the now standard XQuery language.
> >
> > Here is the pointer to the W3C working draft on XQuery 1.0 and  
> > XPath 2.0 Full-Text:
> > http://www.w3.org/TR/xquery-full-text/
> >
> > Now I'm part of the task force editing this draft so your comments  
> > are very much welcomed.
> >
> > -- J.D.
> >
> >
> > http://www.inperspective.com/lucene/LXQueryV0_1.zip
> >
> > I've implemented just a few queries (Boolean, Term, FilteredQuery,
> > BoostingQuery ...) but other queries are fairly trivial to add.
> > At this stage I am more interested in feedback on parser design/ 
> > approach
> > rather than trying to achieve complete coverage of all the Lucene  
> > Query
> > types or debating the choice of tag names.
> >
> > Please see the readme.txt in the package for more details.
> >
> > Cheers
> > Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--- End Message ---

