Re: "Advanced" query language

Wolfgang Hoschek Sun, 18 Dec 2005 00:04:56 -0800

On Dec 17, 2005, at 2:36 PM, Paul Elschot wrote:

Gentlemen,


While maintaining my bookmarks I ran into this:
"Case Study: Enabling Low-Cost XML-Aware Searching
Capable of Complex Querying":

http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html


Some loose thoughts:

In the system described there a Lucene document is used for each
low level xml construct, even when it contains very few characters of text.
The resulting Lucene indexes are at least 2.5 times the size of the
original document, which is not a surprise given this document structure.
Normal index size is about one third of the indexed text.

I don't know about the XQuery standard, but I was wondering
whether this unusual document structure and the non straightforward
fit between Lucene queries and XQuery queries are related.

Seems that a lot of metadata beyond the actual text is stored. For
example, node type, ancestors, parent, number of children, etc., for
each element and attribute. If the fulltext is relatively small, as
is often the case in quite structured XML such as the Shakespeare
collection, that should significantly increase storage space.


For example, Romeo and Juliet goes along the following lines:

<SPEECH>
<SPEAKER>FRIAR LAURENCE</SPEAKER>
<LINE>Not in a grave,</LINE>
<LINE>To lay one in, another out to have.</LINE>
</SPEECH>

<SPEECH>
<SPEAKER>ROMEO</SPEAKER>
<LINE>I pray thee, chide not; she whom I love now</LINE>
<LINE>Doth grace for grace and love for love allow;</LINE>
<LINE>The other did not so.</LINE>
</SPEECH>

<SPEECH>
<SPEAKER>FRIAR LAURENCE</SPEAKER>
<LINE>O, she knew well</LINE>
<LINE>Thy love did read by rote and could not spell.</LINE>
<LINE>But come, young waverer, come, go with me,</LINE>
<LINE>In one respect I'll thy assistant be;</LINE>
<LINE>For this alliance may so happy prove,</LINE>
<LINE>To turn your households' rancour to pure love.</LINE>
</SPEECH>
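
To make the metadata-versus-text point concrete, here is a rough sketch that walks one of the speeches above and builds one record per element, carrying the kinds of fields mentioned (node type, parent, ancestors, number of children). The record layout and field names are my own assumptions for illustration, not the schema used in the paper, and the character counts say nothing precise about actual Lucene index size.

```python
import xml.etree.ElementTree as ET

# One speech from the sample above.
SPEECH = """<SPEECH>
<SPEAKER>ROMEO</SPEAKER>
<LINE>I pray thee, chide not; she whom I love now</LINE>
<LINE>Doth grace for grace and love for love allow;</LINE>
<LINE>The other did not so.</LINE>
</SPEECH>"""


def node_records(xml_text):
    """Return one record per element (hypothetical one-document-per-node layout)."""
    root = ET.fromstring(xml_text)
    records = []

    def visit(elem, ancestors):
        records.append({
            "node_type": "element",
            "name": elem.tag,
            "parent": ancestors[-1] if ancestors else "",
            "ancestors": "/".join(ancestors),
            "num_children": len(elem),
            "text": (elem.text or "").strip(),
        })
        for child in elem:
            visit(child, ancestors + [elem.tag])

    visit(root, [])
    return records


def sizes(records):
    """Return (characters of text, characters of metadata) over all records."""
    text_chars = sum(len(r["text"]) for r in records)
    meta_chars = sum(
        len(str(value))
        for r in records
        for field, value in r.items()
        if field != "text"
    )
    return text_chars, meta_chars


if __name__ == "__main__":
    recs = node_records(SPEECH)
    text_chars, meta_chars = sizes(recs)
    print(len(recs), "records;", text_chars, "text chars;", meta_chars, "metadata chars")
```

Even in this toy form every short LINE drags along its own copy of the structural fields, which is the effect described above.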


As for the joins and iterations over items from the stream of XML
results: iteration over matching XML constructs should be no problem
in Lucene. Joins in Lucene are normally done via boolean filters,
so I was wondering how XQuery joins fit these.

Similar to SQL. The engine constructs a logical execution plan for
the query, and rewrites it into an optimized physical plan as deemed
appropriate, perhaps guided by statistics, using a nested loop, hash
join, or any other more sophisticated strategy.
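
As a minimal sketch of two of the physical strategies named above, the following joins two small in-memory lists on a shared key, once with a nested loop and once with a hash join. The function names and the sample records are made up for illustration; a real engine would choose between such strategies based on input sizes and statistics, and this is not how any particular XQuery engine implements them.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left item with every right item: O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Index the right side by key, then probe once per left item."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]


# Illustrative sample data only.
SPEECHES = [
    {"speaker": "FRIAR LAURENCE", "line": "Not in a grave,"},
    {"speaker": "ROMEO", "line": "The other did not so."},
    {"speaker": "FRIAR LAURENCE", "line": "O, she knew well"},
]
CAST = [
    {"speaker": "ROMEO", "house": "Montague"},
]

if __name__ == "__main__":
    for speech, member in hash_join(SPEECHES, CAST, "speaker"):
        print(speech["line"], "/", member["house"])
```

Both functions return the same pairs; only the amount of work differs as the inputs grow.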


Wolfgang.


