Bugs item #2806488, was opened at 2009-06-15 11:37
Message generated for change (Comment added) made by boncz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2806488&group_id=56967
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: PF/runtime
Group: XML
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John van Schie (johnvanschie)
Assigned to: Nobody/Anonymous (nobody)
Summary: Some XML attributes not selectable.
Initial Comment:
MonetDB/XQuery Feb2009-SP2, Fedora Core 10 x86_64, 64 bit OID's, RPM install.
We have a MonetDB server that contains a fairly large (678 docs) collection of
small documents. When selecting attributes from elements within the documents,
some strange behaviour is noticed. It can be explained best with a query.
----
let $tasks := pf:collection('c2_log.xml')//success/ta...@duration]
for $task in $tasks
(: tasks should always have a xid and duration attribute when successful :)
where count($task/@*) != 2
return element {$task/name()} {$task/@*, $task}
----
<task duration="49506"><task duration="49506" xid="3"/></task>,
<task duration="24503"><task duration="24503" xid="799"/></task>,
<task duration="1531"><task duration="1531" xid="822"/></task>,
<task duration="5724"><task duration="5724" xid="7971"/></task>,
<task duration="439"><task duration="439" xid="8591"/></task>,
<task duration="510"><task duration="510" xid="8592"/></task>,
<task duration="1351"><task duration="1351" xid="8600"/></task>,
<task duration="749"><task duration="749" xid="8602"/></task>,
<task duration="2100"><task duration="2100" xid="22742"/></task>,
----
Thus we select all tasks that do not have two attributes and we see tasks
elements that have their xid attribute missing. But if we print the whole task
element, the missing xid attribute magically re-appears.
When we run the same query on a single document of the collection, we see the
same strange behaviour. We are not able to reproduce this behaviour by
exporting a problematic document and shredding it.
----------------------------------------------------------------------
>Comment By: Peter Boncz (boncz)
Date: 2009-07-09 20:15
Message:
This is most likely a collection with updatable documents. Most likely some
xid attributes were deleted? Is that correct? Or were they inserted? I
also think that you mean that the correct behavior would be that the xid
attributes would not appear when printed.
The problem is probably in the interaction between the shared master
ATTR_OWN and the deltas in the query working set. Most likely in the
serialization code. Maybe it does not properly handle deleted attributes.
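To illustrate the kind of mismatch suspected here, below is a toy model in Python. It is not MonetDB/XQuery code: the table layout, the `deleted` delta and both function names are invented for illustration, and it only assumes that one code path applies the deltas while another reads the shared master table directly.

```python
# Toy model only: none of these names are MonetDB/XQuery internals.
# A shared "master" attribute table plus a per-query delta of deleted rows.
master_attrs = [
    ("task1", "xid", "3"),
    ("task1", "duration", "49506"),
]
deleted = {("task1", "xid")}  # hypothetical delta: xid marked as deleted


def attribute_step(owner):
    """Attribute lookup that honours the delta."""
    return [(name, value) for o, name, value in master_attrs
            if o == owner and (o, name) not in deleted]


def serialize(owner):
    """Serializer that (wrongly, in this toy) reads the master table only."""
    attrs = "".join(f' {name}="{value}"'
                    for o, name, value in master_attrs if o == owner)
    return f"<task{attrs}/>"


print(len(attribute_step("task1")))  # 1
print(serialize("task1"))            # <task xid="3" duration="49506"/>
```

In this toy the attribute step sees one attribute while the printed element shows two, which is the same shape of symptom as in the report. Whether the real cause lies there is still only a guess.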
I fear we can conclude from your recent batch of bugs that there are still
a number of bugs in the update code. We will try to locate them, but this
is often complicated. Also by the fact that this month I am on holiday in
Argentina...
----------------------------------------------------------------------
Comment By: Jan Rittinger (tsheyar)
Date: 2009-07-09 17:44
Message:
Hi John,
I guess something is wrong in the document representation (and I'm not the
M/XQ runtime guy).
I however attached a MIL script (debug.mil) that runs the simplified
query
``count(pf:collection('c2_log.xml')//success/task/@*)'' and gives some
additional information about the inputs to the attribute step. I hope the
result will give a better indication who needs to fix the problem :)
----------------------------------------------------------------------
Comment By: John van Schie (johnvanschie)
Date: 2009-07-09 13:28
Message:
Jan,
Is there any way I can provide you with more information to find a
solution? Or should this bug be re-assigned to somebody else (as it could
be non-pathfinder related)
Thanks,
John
----------------------------------------------------------------------
Comment By: John van Schie (johnvanschie)
Date: 2009-06-23 14:43
Message:
Hi Jan,
I can provide all required data, if you give me pointers how to obtain it.
The data is not confidential, and thus I could provide a non-anonymized
data set.
----------------------------------------------------------------------
Comment By: Jan Rittinger (tsheyar)
Date: 2009-06-18 18:35
Message:
Hi John,
already the simplified query
``count(pf:collection('c2_log.xml')//success/task/@*)'' provides an
incorrect result as it should return (1104491 xid + 135 Duration =) 1104626
attributes.
Without more information I however cannot help further.
(With the anonymized working set information in ATTR_OWN,
ATTR_OWN_PRIVATE, ATTR_OWN_SHARED, and the NID values of the tasks somebody
at CWI could perhaps get an idea.)
----------------------------------------------------------------------
Comment By: John van Schie (johnvanschie)
Date: 2009-06-17 13:20
Message:
Jan,
I've tried to execute the query listed below with the MIL code generated
by the Feb2009-SP2 release and the May2009 release. Although the MIL code
generated differs, the resulting XML output does not. Both return 135 task
items, where I expect 0.
The query:
let $tasks := pf:collection('c2_log.xml')//success/ta...@duration]
for $task in $tasks
where count($task/@xid) != 1
return element {$task/name()} {$task/@*, $task}
----------------------------------------------------------------------
Comment By: John van Schie (johnvanschie)
Date: 2009-06-17 10:31
Message:
> How many attributes do the task elements have?
> count(pf:collection('c2_log.xml')//success/task/@*)
>
> How many xid attributes do the task elements have?
> count(pf:collection('c2_log.xml')//success/task/@xid)
xquery>count(pf:collection('c2_log.xml')//success/task/@*)
more>1104491
xquery>count(pf:collection('c2_log.xml')//success/task/@xid)
more>1104356
> If I understand your setup correctly the first query should return
> 1104626 and the second 1104491... (Otherwise I assume we have a
> simplified version of the problem.)
I don't really get how you've deduced those numbers. The problem, as I see
it, is that for all successful task elements that have a duration
attribute, the xid attribute is not available in the index. So the query on
@xid returns empty for those elements, but printing the elements does yield
their xid attribute. An example can be seen in the first post of the bug
report.
Thus we have 1104491 successful tasks that have any attribute, and all of
them should have an xid attribute. There are 135 elements that have an xid
and a duration attribute, and for those, the xid attribute seems
ignored/unqueryable by MonetDB. So a count on all successful tasks with an
xid attribute yields 1104491 - 135 = 1104356.
I hope this explanation describes the problem better.
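The two sets of numbers can be reconciled with a little arithmetic. The Python sketch below only restates the counts quoted in this thread; the variable names are made up for illustration and nothing here queries the database.

```python
# Counts quoted in this thread (variable names are illustrative only).
successful_tasks = 1104491     # count(//success/task)
tasks_with_duration = 135      # successful tasks that also carry @duration

# Expected: every successful task has @xid, and 135 also have @duration.
expected_xid = successful_tasks
expected_all_attrs = successful_tasks + tasks_with_duration

# Observed: @xid is not found on the 135 tasks that have @duration.
observed_xid = successful_tasks - tasks_with_duration
observed_all_attrs = observed_xid + tasks_with_duration

print(expected_all_attrs, expected_xid)  # 1104626 1104491
print(observed_all_attrs, observed_xid)  # 1104491 1104356
```

The first line matches the numbers Jan expected, the second the numbers the server actually returned.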
Jan, I'll try to generate the MIL for the query listed in your first post
on the May2009 release next and post the result here.
----------------------------------------------------------------------
Comment By: John van Schie (johnvanschie)
Date: 2009-06-17 10:21
Message:
I see that I've made a typo in the results of the queries. Query 7 returns
135.
> To what extent do the results match your expectation, e.g., do you
> expect only 135 of the 1104491 successful tasks to have a duration
> attribute?
I expect a total of 1104491 successful tasks, 1104356 with only an xid
attribute and 135 with an xid attribute and a duration attribute. So I
expected query 4 to return 0, query 5 to return 135 and query 7 to return
0.
> Am I right that the 9 tasks listed as result of your original query are
> just a sample of the complete result? (From your results below, I'd
> expect 135 tasks, right?)
You're right. It was just to make a small report, explaining the problem.
----------------------------------------------------------------------
Comment By: Jan Rittinger (tsheyar)
Date: 2009-06-16 17:37
Message:
How many attributes do the task elements have?
count(pf:collection('c2_log.xml')//success/task/@*)
How many xid attributes do the task elements have?
count(pf:collection('c2_log.xml')//success/task/@xid)
If I understand your setup correctly the first query should return 1104626
and the second 1104491... (Otherwise I assume we have a simplified version
of the problem.)
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-06-16 16:35
Message:
To what extent do the results match your expectation, e.g., do you
expect only 135 of the 1104491 successful tasks to have a duration
attribute?
Am I right that the 9 tasks listed as result of your original query are
just a sample of the complete result?
(From your results below, I'd expect 135 tasks, right?)
----------------------------------------------------------------------
Comment By: John van Schie (johnvanschie)
Date: 2009-06-16 16:25
Message:
Stefan,
The requested queries:
1)
Query:
let $tasks := pf:collection('c2_log.xml')//success/ta...@xid and
@duration]
for $task in $tasks
where count($task/@*) != 2
return element {$task/name()} {$task/@*, $task}
Result:
No output
2)
Query:
count(pf:collection('c2_log.xml')//success/task)
Result:
1104491
3)
Query:
count(pf:collection('c2_log.xml')//success/ta...@* and @duration])
Result:
135
4)
Query:
count(pf:collection('c2_log.xml')//success/task[count(@*) < 2 and
@duration])
Result:
135
5)
Query:
count(pf:collection('c2_log.xml')//success/task[count(@*) = 2 and
@duration])
Result:
0
6)
Query:
count(pf:collection('c2_log.xml')//success/task[count(@*) > 2 and
@duration])
Result:
0
7)
Query:
count(pf:collection('c2_log.xml')//success/task[count(@*) != 2 and
@duration])
Result:
0
8) (bonus)
Query:
count(pf:collection('c2_log.xml')//success/ta...@* and not(@duration)])
Result:
1104356
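The results above can be cross-checked against each other. The Python sketch below is only a consistency check on the posted numbers; it does not query MonetDB, and the dictionary and variable names are made up here. It assumes that queries 4, 5 and 6 split the tasks counted by query 3 by attribute count, and that queries 3 and 8 together cover every task counted by query 2.

```python
# Results as posted above, keyed by query number (query 1 gave no output).
reported = {2: 1104491, 3: 135, 4: 135, 5: 0, 6: 0, 7: 0, 8: 1104356}

# Queries 4, 5 and 6 split the @duration tasks by count(@*) < 2, = 2, > 2,
# so together they should add up to query 3.
partition_ok = reported[4] + reported[5] + reported[6] == reported[3]

# count(@*) != 2 is the union of "< 2" and "> 2".
implied_q7 = reported[4] + reported[6]

# Queries 3 and 8 split the tasks by the presence of @duration. They add up
# to query 2 only if every task has at least one attribute.
split_ok = reported[3] + reported[8] == reported[2]

print(partition_ok, implied_q7, split_ok)  # True 135 True
```

The sums are consistent with each other, except that the 0 posted for query 7 does not match the 135 implied by queries 4 and 6.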
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-06-16 16:06
Message:
John,
sounds fine with me.
Stefan
----------------------------------------------------------------------
Comment By: John van Schie (johnvanschie)
Date: 2009-06-16 15:42
Message:
Stefan,
I'm sorry, but when I inspect our code, I see that it is possible for
successful tasks to have only an xid attribute. Valid combinations are
either an xid attribute or an xid and duration attribute. So successful
tasks without an xid attribute are not valid.
The changed assumption changes the expected output for your requested
queries. Am I correct to change them by adding a @duration filter to your
first, third, fourth, fifth, sixth and seventh requested query?
Jan,
I'll generate the MIL and execute it as soon as I have a May2009 install
ready.
Regards,
John
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-06-16 14:36
Message:
John,
could you please run the following queries on the same database and report
their results:
let $tasks := pf:collection('c2_log.xml')//success/ta...@xid]
for $task in $tasks
where count($task/@*) != 2
return element {$task/name()} {$task/@*, $task}
count(pf:collection('c2_log.xml')//success/task)
count(pf:collection('c2_log.xml')//success/ta...@*])
count(pf:collection('c2_log.xml')//success/task[count(@*) < 2])
count(pf:collection('c2_log.xml')//success/task[count(@*) = 2])
count(pf:collection('c2_log.xml')//success/task[count(@*) > 2])
count(pf:collection('c2_log.xml')//success/task[count(@*) != 2])
----------------------------------------------------------------------
Comment By: Jan Rittinger (tsheyar)
Date: 2009-06-16 13:56
Message:
Hi John,
I checked the compile time part of the query which looks fine. So it most
certainly is a runtime problem.
Out of curiosity, what happens if you modify your query to check for the
xid attributes directly?
let $tasks := pf:collection('c2_log.xml')//success/ta...@duration]
for $task in $tasks
where count($task/@xid) != 1
return element {$task/name()} {$task/@*, $task}
And what happens if you choose a newer compiler version to run the query?
To do that just compile your query with pf and feed the resulting MIL code
to the old runtime.
----------------------------------------------------------------------
Comment By: John van Schie (johnvanschie)
Date: 2009-06-16 11:58
Message:
Stefan,
The 678 documents are shredded incrementally into the same collection.
After each tool run, an XML log file is created and shredded into MonetDB,
into the log collection.
The only other operation performed on the log collection is copy: a
selection on the collection ( pf:collection("c2_log.xml")//run ) is
executed and the XML that is returned is stored to file. This file is then
shredded read-only into the database, but *not* into the log collection.
No XQuery updates are performed on the log document, although the
documents are flagged 'updateable'.
If you need more information (the server is still running), please let me
know.
John
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-06-16 10:34
Message:
John,
how has your collection initially been created?
a) shredding all 678 docs in one go?
b) shredding the 678 docs incrementally into the same collection (e.g., in
small batches or one at a time)?
Have there been updates on the collection during (incremental) and/or
after shredding (all documents)?
If so, have those updates touched/altered the data that now appears to be
inconsistent, i.e., the task elements with two attributes of which only one
appears to be directly addressable?
Thanks!
Stefan
----------------------------------------------------------------------
_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs