Re: "Advanced" query language

mark harwood Tue, 20 Dec 2005 01:25:24 -0800

>However the moment you are promoting INTEROPERABILITY with other
>search/retrieval systems by XMLizing the query input and the result
>output, like Mark is, then it makes sense to adhere to standards


I think this is hijacking my original intentions to
some extent. I may be accused of being short-sighted,
but I wasn't proposing a language for interoperability
with other search systems or query standards. That
approach suggests a constraining "lowest common
denominator" effect, which is at odds with my original
intentions.

What I was looking for was simply a way to fill the
gap between the current QueryParser syntax and the
growing list of powerful Lucene features that can't be
represented in its syntax (spans, regexes,
FilteredQuery, MoreLikeThis...).

I outlined why a String representation of queries was
desirable in my original post (fundamentally:
persistence, distribution and language independence). 

My use of XML was intended to meet the above
objectives and give FULL coverage of all Lucene
features. Search system interoperability wasn't on my
list and (correct me if I'm wrong here) adding it
would preclude some of the more exotic Lucene features,
e.g. "MoreLikeThis" or "BoostingQuery".


I'm all for standards/interoperability in the new
query language if it:
a) Doesn't become a nightmare to implement
b) Allows all of the Lucene query functionality to be
exposed
c) Is a real requirement for enough Lucene users

I'm just not sure that any/all of these conditions are
true.

Maybe there needs to be a separate "interoperability"
language development?

Cheers
Mark

--- Joaquin Delgado <[EMAIL PROTECTED]> wrote:

> Comments in-line
> 
> Wolfgang Hoschek wrote:
> 
> > Yes, there are interesting impls out there. I've myself implemented
> > XQuery fulltext search via extension functions built on Lucene. See
> > http://dsd.lbl.gov/nux/index.html#Google-like%20realtime%20fulltext%20search%20via%20Apache%20Lucene%20engine
> >
> > However, rather than targeting fulltext search of infrequent queries
> > over huge persistent data archives (historic search), Nux targets
> > streaming fulltext search of huge numbers of queries over
> > comparatively small transient realtime data (prospective search),
> > e.g. 100000 queries/sec ballpark. Think XML router. That's probably
> > distinctly different from what many (most?) other folks would like to
> > do, and requires a different, somewhat non-standard, architecture.
> >
> > [The underlying Lucene code lives in Lucene SVN in the
> > lucene/contrib/memory module; the remainder is in Nux.]
> >
> > Implementing XQuery in full compliance with the spec is a rather
> > gigantic undertaking. Separating the XQuery language and the fulltext
> > language greatly simplified the system design, and made it more
> > flexible and extensible.
> 
> [JOAQUIN] One of the arguable advantages of this new XQuery FT draft is
> that the semantics (http://www.w3.org/TR/xquery-full-text/#tq-semantics)
> are defined using XQuery functions, so it is relatively easy to build
> a "dumb" XQuery-FT compliant engine using these definitions :-) Here is
> a Java based XQuery engine developed at Cornell that satisfies most of
> the working draft's requirements:
> http://www.cs.cornell.edu/database/Quark/quark_main.html
> 
> > Further, consider that fulltext search capabilities are typically
> > quite open ended and context/application specific. Seems to me that
> > that's one of the reasons why Lucene is more a set of interfaces and
> > diverse building blocks than a complete end user system. I find it
> > difficult to believe that making the fulltext language an *integral
> > part of XQuery* will enable sufficient "extension points" to prove
> > meaningful to end users and implementors. Standards evolve at a
> > glacial pace; it effectively means that most or all flexibility is
> > lost. I tend to think that the W3C is jumping the gun and attempting
> > to standardize what is more an R&D concept than a well understood set
> > of capabilities across a wide range of actual real world use cases,
> > and it does so in a non-modular manner.
> 
> Full-text search remains open ended and context/app specific, so it
> makes sense to leave Lucene as is and still have, for example, Nutch.
> However the moment you are promoting INTEROPERABILITY with other
> search/retrieval systems by XMLizing the query input and the result
> output, like Mark is, then it makes sense to adhere to standards, and
> the standard to query XML is XQuery. Because of the nature of the data
> (XML), full-text becomes a *must* requirement of the standard. If Mark
> comes up with yet another query language with some custom tags, it
> would be denying the fact that search systems need to communicate among
> themselves, and thus re-inventing the wheel. Besides, almost 80% of all
> full-text operators (Boolean, wildcards, proximity, etc.) differ only in
> syntax from one search engine to another. Just look at another "Common
> Query Language" now being used by the Library of Congress
> (http://www.loc.gov/standards/sru/cql/) for federated search.
> 
> Maybe I'm being too ambitious here, but if we have an implementation of
> an XQuery-FT compliant XQuery engine on top of Lucene indices, or at the
> minimum _Lucene could interpret XPath queries_ where element node labels
> are equivalent to Lucene fields, we can begin thinking of exposing Lucene
> sources to more sophisticated and distributed XQuery engines, thus
> providing full XML support on any Lucene based system. Unfortunately
> Lucene does not support nested fields, but that is OK for now.
> 
> -- Joaquin
> 
> >
> > On Dec 17, 2005, at 5:43 PM, [EMAIL PROTECTED] wrote:
> >
> >> Paul and Wolfgang,
> >>
> >> Thank you very much for your input. I think there are two distinct
> >> problems that have emerged from this thread:
> >> 1) The ability to create efficient structures to index and query XML
> >> documents (elements, attributes and corresponding values) with a
> >> full-text query language and operators. After all, XML is text. As
> >> Paul pointed out, people have already tried this with Lucene.
> >> 2) The need for a standard query language like XQuery aiming at
> >> system interoperability in the now XMLized world, having the same
> >> effect that SQL had in the relational world.
> >>
> >> While I can see how in the SQL case extension functions can be used
> >> to implement full-text capabilities, in the XML case full-text is
> >> required to query and retrieve XML (sub-document) elements and
> >> attributes based on the free text (natural language) values AND
> >> also to query the strings that represent the structure itself. For
> >> example, in simple SQL queries the names of the tables and columns
> >> need to be known to project corresponding values and are not part of
> >> the search conditions (in WHERE clauses only values corresponding to
> >> table/columns are evaluated).
> >>
> >> In XQuery both the structure and the content are searchable, thus
> >> requiring full-text operators. That is why XQuery Full-Text requires
> >> the unification and standardization of both the XQuery and Full-Text
> >> "languages". Needless to say, the implementation will differ from
> >> system to system.
> >>
> >> I do agree, though, that the abstraction of full-text capabilities
> >> through functional extensions is a great first step. Check out
> >> Oracle's XML Query Service
> >> (http://www.oracle.com/technology/tech/xml/xds/index.html and
> >> http://www.oracle.com/technology/oramag/oracle/05-mar/o25xml.html),
> >> a Java based XQuery engine that has abstracted "data sources" such as
> >> Web Services, RDBMS, etc. as functions that, while returning XML, can
> >> receive parameters and supply full-text capabilities. If Mark's
> >> implementation of Lucene query and output in XML comes to fruition, a
> >> Lucene data source will become yet another stream of XML that can be
> >> queried, processed and rendered by the mid-tier XQuery engine.
> >>
> >> -- Joaquin
> >>
> >>
> >>
> >> While maintaining my bookmarks I ran into this:
> 
=== message truncated ===