Bugs item #2144639, was opened at 2008-10-03 18:25
Message generated for change (Settings changed) made by lsidir
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2144639&group_id=56967
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: PF general
Group: Pathfinder "stable"
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Loredana Afanasiev (lafanasi)
Assigned to: Lefteris Sidirourgos (lsidir)
Summary: XQ: fn:collection() in algebra version
Initial Comment:
Hi Lefteris,
As discussed on Thursday, I am adding this as a bug so that you can close it soon :)
[EMAIL PROTECTED] xq]$ mclient -lx -s 'fn:collection("MotiesTweedeKamer")'
MAPI = [EMAIL PROTECTED]:50000
QUERY = fn:collection("MotiesTweedeKamer")
ERROR = !fatal error: Algebra implementation for function `fn:collection' unknown.
Thanks,
l.
----------------------------------------------------------------------
Comment By: Lefteris Sidirourgos (lsidir)
Date: 2008-11-14 12:22
Message:
Closing this bug since the nightly testing was successful.
----------------------------------------------------------------------
Comment By: Lefteris Sidirourgos (lsidir)
Date: 2008-11-13 16:11
Message:
Hi,
I checked a fix for pf:collection into the development branch. pf:collection
now uses the doc_tbl operator, which allows better and more correct
optimization, as Jan suggested. In addition, fn:collection is now implemented
as pf:collection followed by a child step. The checkin is tagged with
fn_collection_before and fn_collection_after for easier backporting to the
stable branch, if Sjoerd chooses to do so.
I changed the resolution to fixed, but I am not closing the bug yet. Loredana
will run some tests of her own and also check the new execution times, and I
will also wait for the nightly testing.
lefteris
----------------------------------------------------------------------
Comment By: Loredana Afanasiev (lafanasi)
Date: 2008-11-03 17:43
Message:
Hi Jan,
I reinstalled and ran the q9 queries again. The running time is now
around 4644.881 msec for both the algebra (pf:collection()) and the MPS
(fn:collection()) versions. Thanks for the fix!!!
> For pf:collection and fn:collection probably a new algebra operator like
> the doc_tbl operator is needed as some rewrite rules rely on the fragment
> information. Perhaps a variant of fn:doc would do the job...
I don't follow this fully, so I take it this is not for my ears. :)
l.
----------------------------------------------------------------------
Comment By: Jan Rittinger (tsheyar)
Date: 2008-11-02 11:35
Message:
Hi Loredana,
I analyzed the query and it turned out that the incorrect implementation
of pf:collection() causes an important rewrite rule to fail. I checked in a
fix that makes the rewrite less restrictive and thus leads to a plan whose
performance should be acceptable.
For pf:collection and fn:collection probably a new algebra operator like
the doc_tbl operator is needed as some rewrite rules rely on the fragment
information. Perhaps a variant of fn:doc would do the job...
----------------------------------------------------------------------
Comment By: Loredana Afanasiev (lafanasi)
Date: 2008-11-01 20:03
Message:
Hi Jan,
sorry for taking so long..
> pf:collection(...)/*
of course! "*" doesn't cover document nodes. thx!
> performance difference between alg and MPS version
I tried compiling with pf
($pf q9-pf.xq)
but I didn't get any warning message. If you know a way to make this query
run faster on the alg version, please let me know.
thanks!
l.
----------------------------------------------------------------------
Comment By: Jan Rittinger (tsheyar)
Date: 2008-10-08 23:41
Message:
Answer to 'why does count(pf:collection(...)/*) return 0':
I don't know the behavior of pf:collection, but I assume it returns a
magical node that sits on top of all root nodes in the collection. If the
collection consists only of documents, as in Loredana's case, the children
of the magical node are naturally all document nodes. The name test *
matches only element nodes, so it does not return any results.
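The explanation above can be illustrated with a small toy model. This is only
a sketch under the assumption stated in the answer (a single collection node
whose children are document nodes); the Node class and the helper names are
hypothetical and are not Pathfinder's implementation. It shows why a name test
such as * finds nothing directly below the collection node while node() does.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                      # e.g. "collection", "document", "element"
    name: str = ""                 # tag name, only meaningful for elements
    children: List["Node"] = field(default_factory=list)


def child_any(nodes):
    """Child step with the node() kind test: every child matches."""
    return [c for n in nodes for c in n.children]


def child_elements(nodes, name="*"):
    """Child step with a name test: only element children can match."""
    return [c for n in nodes for c in n.children
            if c.kind == "element" and name in ("*", c.name)]


def toy_collection(num_docs):
    """Assumed shape: collection node -> document nodes -> <document> element."""
    docs = [Node("document", children=[Node("element", "document")])
            for _ in range(num_docs)]
    return Node("collection", children=docs)


col = [toy_collection(3)]
print(len(child_elements(col)))                          # .../*               -> 0
print(len(child_elements(child_elements(col))))          # .../*/*             -> 0
print(len(child_any(col)))                               # .../node()          -> 3
print(len(child_elements(child_any(col), "document")))   # .../node()/document -> 3
```

With 3 toy documents the counts follow the same pattern as the reported ones
(0 for /* and /*/*, the number of documents for /node() and /node()/document).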
Answer to 'why is there a performance difference between algebra and MPS
version':
As always, if there is a big difference, a join is not detected in one
variant or the other. In the stable branch we had to avoid an error message
(about running out of column names) and are therefore not able to apply all
optimizations necessary to detect the value-based join. If you use pf to
compile the query, you'll see a warning that some optimizations could not be
applied.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-10-08 16:31
Message:
and here the "proof" that pf:collection() does not cause any performance
degradation --- the problem is indeed the ALG translation ...
$ cat q9-fn.xq
let $col := fn:collection("MotiesTweedeKamer")
for $y in distinct-values( for $y in $col//hiddendatum return
substring-before(fn:string($y),'.') )
let $thisyear :=
$col//document[substring-before(fn:string(.//hiddendatum[1]),'.')=$y]
let $partij := distinct-values($thisyear//partij)
for $p in $partij
let $aantalingediendemoties :=
count($thisyear[.//indienergnlod//partij=$p])
let $aantalmedeingediendemoties :=
count($thisyear[.//medeindienergnlod//partij=$p])
order by $y descending, $aantalingediendemoties
descending, $aantalmedeingediendemoties descending
return
<aantal jaar='{$y}'
partij='{$p}'
aantalingediendemoties='{$aantalingediendemoties}'
aantalmedeingediendemoties='{$aantalmedeingediendemoties}'
/>
$ diff q9-{fn,pf}.xq
1c1
< let $col := fn:collection("MotiesTweedeKamer")
---
> let $col := pf:collection("MotiesTweedeKamer")
$ mclient -t -lx -g q9-fn.xq | grep -v '^<aantal .*/>'
Trans 32.156 msec
Shred 0.000 msec
Query 3272.690 msec
Print 1.598 msec
Timer 3339.119 msec
$ mclient -t -lx -g q9-pf.xq | grep -v '^<aantal .*/>'
Trans 27.816 msec
Shred 0.000 msec
Query 3247.053 msec
Print 1.594 msec
Timer 3308.713 msec
$ mclient -t -lx q9-pf.xq | grep -v '^<aantal .*/>'
[makes Mserver grow >> 13 GB and (hence) runs very long ...]
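For anyone who wants to compare such runs mechanically, here is a minimal
sketch that parses the timing lines shown above and computes the relative
difference in Query time. It assumes the "<label> <number> msec" line format
seen in this thread; the function name parse_timings and the variable names
are hypothetical, and the numbers are copied from the two -g runs above.

```python
import re


def parse_timings(output: str) -> dict:
    """Collect '<label> <number> msec' lines into a {label: msec} dict."""
    timings = {}
    for line in output.splitlines():
        m = re.match(r"^\s*(\w+)\s+([0-9]+(?:\.[0-9]+)?)\s+msec\s*$", line)
        if m:
            timings[m.group(1)] = float(m.group(2))
    return timings


# Timing output of the two -g runs, copied from this comment.
FN_RUN = """\
Trans 32.156 msec
Shred 0.000 msec
Query 3272.690 msec
Print 1.598 msec
Timer 3339.119 msec
"""

PF_RUN = """\
Trans 27.816 msec
Shred 0.000 msec
Query 3247.053 msec
Print 1.594 msec
Timer 3308.713 msec
"""

fn = parse_timings(FN_RUN)
pf = parse_timings(PF_RUN)

# Relative difference in Query time between the fn: and pf: variants.
rel_diff = abs(fn["Query"] - pf["Query"]) / fn["Query"]
print(f"Query time differs by {rel_diff:.2%}")
```

For these two runs the difference is well under 1%, which matches the point
made above: with -g, the choice between fn:collection() and pf:collection()
does not matter.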
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-10-08 16:26
Message:
$ mclient -lx -s'count(pf:collection("MotiesTweedeKamer"))'
1
and
$ mclient -lx -g -s'count(pf:collection("MotiesTweedeKamer"))'
1
are indeed correct. no point.
But
$ mclient -lx -s'count(pf:collection("MotiesTweedeKamer")/*)'
0
and
$ mclient -lx -g -s'count(pf:collection("MotiesTweedeKamer")/*)'
0
are (at least) "unexpected".
And since
$ mclient -lx -s'count(pf:collection("MotiesTweedeKamer")/node())'
27946
and
$ mclient -lx -g -s'count(pf:collection("MotiesTweedeKamer")/node())'
27946
(mind the "/node()" instead of "/*"!)
appear to work correctly, too,
I tried to find out why the "/*" yields "unexpected" results;
hence, my "layman's" attempt to check the type returned by
"pf:collection(<colname>)/*":
$ mclient -lx -s'doc(pf:collection("MotiesTweedeKamer")/*)'
MAPI = [EMAIL PROTECTED]:50000
QUERY = doc(pf:collection("MotiesTweedeKamer")/*)
ERROR = !type error: no variant of function fn:doc accepts the given argument type(s): string?
!type error: maybe you meant:
!type error: fn:doc (string?) as document { node }?
!type error: illegal arguments for function fn:doc
and
$ mclient -lx -g -s'doc(pf:collection("MotiesTweedeKamer")/*)'
MAPI = [EMAIL PROTECTED]:50000
QUERY = doc(pf:collection("MotiesTweedeKamer")/*)
ERROR = !type error: no variant of function fn:doc accepts the given argument type(s): string?
!type error: maybe you meant:
!type error: fn:doc (string?) as document { node }?
!type error: illegal arguments for function fn:doc
...
----------------------------------------------------------------------
Comment By: Lefteris Sidirourgos (lsidir)
Date: 2008-10-08 16:11
Message:
pf:collection correctly returns 1 node. The (string?) in the error message is
misleading here: it says that fn:doc wants a string?, not that pf:collection
is giving one. I think what it does not like is the cardinality, since you are
giving it a *.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-10-08 15:54
Message:
It seems as if the return type of pf:collection() is not quite correct:
string? instead of node:
========
15:50:43 [EMAIL PROTECTED]:/tmp $ mclient -lx -s'pf:collections()'
<collection updatable="false" size="120 MiB" numDocs="27946">MotiesTweedeKamer</collection>,
<collection updatable="true" size="237 MiB" numDocs="27946">MotiesTweedeKamer_Updatable</collection>
15:50:46 [EMAIL PROTECTED]:/tmp $ mclient -lx -g -s'pf:collections()'
<collection updatable="false" size="120 MiB" numDocs="27946">MotiesTweedeKamer</collection>,
<collection updatable="true" size="237 MiB" numDocs="27946">MotiesTweedeKamer_Updatable</collection>
15:50:47 [EMAIL PROTECTED]:/tmp $ mclient -lx -s'count(pf:collection("MotiesTweedeKamer"))'
1
15:51:01 [EMAIL PROTECTED]:/tmp $ mclient -lx -g -s'count(pf:collection("MotiesTweedeKamer"))'
1
15:51:06 [EMAIL PROTECTED]:/tmp $ mclient -lx -s'count(pf:collection("MotiesTweedeKamer")//document)'
27946
15:51:20 [EMAIL PROTECTED]:/tmp $ mclient -lx -g -s'count(pf:collection("MotiesTweedeKamer")//document)'
27946
15:51:26 [EMAIL PROTECTED]:/tmp $ mclient -lx -s'count(pf:collection("MotiesTweedeKamer")/*)'
0
15:51:34 [EMAIL PROTECTED]:/tmp $ mclient -lx -g -s'count(pf:collection("MotiesTweedeKamer")/*)'
0
15:51:39 [EMAIL PROTECTED]:/tmp $ mclient -lx -s'count(pf:collection("MotiesTweedeKamer")/*/*)'
0
15:51:44 [EMAIL PROTECTED]:/tmp $ mclient -lx -g -s'count(pf:collection("MotiesTweedeKamer")/*/*)'
0
15:51:47 [EMAIL PROTECTED]:/tmp $ mclient -lx -s'doc(pf:collection("MotiesTweedeKamer")/*)'
MAPI = [EMAIL PROTECTED]:50000
QUERY = doc(pf:collection("MotiesTweedeKamer")/*)
ERROR = !type error: no variant of function fn:doc accepts the given argument type(s): string?
!type error: maybe you meant:
!type error: fn:doc (string?) as document { node }?
!type error: illegal arguments for function fn:doc
15:51:57 [EMAIL PROTECTED]:/tmp $ mclient -lx -g -s'doc(pf:collection("MotiesTweedeKamer")/*)'
MAPI = [EMAIL PROTECTED]:50000
QUERY = doc(pf:collection("MotiesTweedeKamer")/*)
ERROR = !type error: no variant of function fn:doc accepts the given argument type(s): string?
!type error: maybe you meant:
!type error: fn:doc (string?) as document { node }?
!type error: illegal arguments for function fn:doc
15:52:04 [EMAIL PROTECTED]:/tmp $ mclient -lx -s'count(pf:collection("MotiesTweedeKamer")/node())'
27946
15:52:13 [EMAIL PROTECTED]:/tmp $ mclient -lx -g -s'count(pf:collection("MotiesTweedeKamer")/node())'
27946
15:52:18 [EMAIL PROTECTED]:/tmp $ mclient -lx -s'count(pf:collection("MotiesTweedeKamer")/node()/document)'
27946
15:52:25 [EMAIL PROTECTED]:/tmp $ mclient -lx -g -s'count(pf:collection("MotiesTweedeKamer")/node()/document)'
27946
========
!??
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-10-08 15:31
Message:
Loredana,
(1)
The performance differences you experience are most probably not caused by
fn:collection() vs. pf:collection() but by ALG vs. MPS (see also (2)) ---
just try MPS with pf:collection() and it should show the same performance
as MPS with fn:collection().
(2)
Since MPS and ALG use quite different techniques to translate and
optimize queries, it is expected that they show performance differences
(sometimes severe ones) on the same query. We (plus the Pathfinder folks in
Tübingen) have to check what goes wrong with ALG in the case of your query
...
Thanks for reporting it.
(3)
We also have to check the unexpected(?) behaviour of pf:collection().
Thanks for reporting it.
----------------------------------------------------------------------
Comment By: Loredana Afanasiev (lafanasi)
Date: 2008-10-08 15:13
Message:
Subject: fn:collection() vs pf:collection()
Hi Lefteris and all,
I think this is relevant to this bug report.
I get a huge difference in running times when switching from
fn:collection() to pf:collection(). Can you please advise me on how to avoid
the long running times on the algebra version?
thanks in advance,
l.
[EMAIL PROTECTED] xq]$ more q9-fn.xq
let $col := fn:collection("MotiesTweedeKamer")
for $y in distinct-values(for $y in $col//hiddendatum
return substring-before(fn:string($y),'.'))
let $thisyear := $col//document
[substring-before(fn:string(.//hiddendatum[1]),'.')=$y]
let $partij := distinct-values($thisyear//partij)
for $p in $partij
let $aantalingediendemoties :=
count($thisyear[.//indienergnlod//partij=$p])
let $aantalmedeingediendemoties :=
count($thisyear[.//medeindienergnlod//partij=$p])
order by $y descending,
$aantalingediendemoties descending,
$aantalmedeingediendemoties descending
return
<aantal jaar='{$y}'
partij='{$p}'
aantalingediendemoties='{$aantalingediendemoties}'
aantalmedeingediendemoties='{$aantalmedeingediendemoties}'
/>
[EMAIL PROTECTED] xq]$ mclient -lx -g -t q9-fn.xq
Trans 35.174 msec
Shred 0.000 msec
Query 4401.659 msec
Print 2.680 msec
Timer 4487.610 msec
[EMAIL PROTECTED] xq]$ more q9-pf.xq
let $col := pf:collection("MotiesTweedeKamer")
for $y in distinct-values(for $y in $col//hiddendatum
return substring-before(fn:string($y),'.'))
let $thisyear := $col//document
[substring-before(fn:string(.//hiddendatum[1]),'.')=$y]
let $partij := distinct-values($thisyear//partij)
for $p in $partij
let $aantalingediendemoties :=
count($thisyear[.//indienergnlod//partij=$p])
let $aantalmedeingediendemoties :=
count($thisyear[.//medeindienergnlod//partij=$p])
order by $y descending,
$aantalingediendemoties descending,
$aantalmedeingediendemoties descending
return
<aantal jaar='{$y}'
partij='{$p}'
aantalingediendemoties='{$aantalingediendemoties}'
aantalmedeingediendemoties='{$aantalmedeingediendemoties}'
/>
[EMAIL PROTECTED] xq]$ mclient -lx -t q9-pf.xq
Timer 584185.070 msec
[EMAIL PROTECTED] xq]$ diff q9-fn.xq q9-pf.xq
1c1
< let $col := fn:collection("MotiesTweedeKamer")
---
> let $col := pf:collection("MotiesTweedeKamer")
Besides this there is something strange happening with pf:collection().
The website says:
"
pf:collection() returns a single special collection node, whose immediate
children are the document nodes. Therefore, fn:collection("my-collection")
is roughly equivalent to pf:collection("my-collection")/*.
"
While I get:
[EMAIL PROTECTED] xq]$ mclient -lx -g -s 'count(fn:collection("MotiesTweedeKamer"))'
27946
[EMAIL PROTECTED] xq]$ mclient -lx -g -s 'count(pf:collection("MotiesTweedeKamer"))'
1
[EMAIL PROTECTED] xq]$ mclient -lx -g -s 'count(pf:collection("MotiesTweedeKamer")/*)'
0
[EMAIL PROTECTED] xq]$ mclient -lx -s 'count(pf:collection("MotiesTweedeKamer")/*)'
0
[EMAIL PROTECTED] xq]$ mclient -lx -s 'count(pf:collection("MotiesTweedeKamer")/*/*)'
0
[EMAIL PROTECTED] xq]$ mclient -lx -s 'count(pf:collection("MotiesTweedeKamer")/*/document)'
0
[EMAIL PROTECTED] xq]$ mclient -lx -s 'count(pf:collection("MotiesTweedeKamer")//document)'
27946
----------------------------------------------------------------------