Bugs item #2728133, was opened at 2009-04-03 11:52
Message generated for change (Settings changed) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2728133&group_id=56967
Category: Core
Group: MonetDB4 "stable"
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Wouter Alink (vzzzbx)
Assigned to: Nobody/Anonymous (nobody)
Summary: M4: count() returns int
Initial Comment:
To scale PF/TIJAH (and the rest of pathfinder/M4) beyond the 31-bit
boundary, the count() function needs to return a wrd instead of an int.
PF/TIJAH hits the 2G limit earlier because it counts each word in an XML
element, whereas pathfinder counts a text element as a single node.
In M5 this was apparently designed correctly from the start; in M4 it seems
to be a legacy issue.
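To make the boundary concrete, here is a minimal C sketch (not MonetDB code).
It assumes that MIL's int behaves like a 32-bit signed integer and that wrd
behaves like a machine-word-sized signed integer, modelled here by int32_t and
intptr_t. The names fits_in_int, fits_in_wrd and boundary_self_check are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: MIL "int" is modelled by int32_t (32-bit signed).       */
/* Assumption: MIL "wrd" is modelled by intptr_t (machine-word-sized). */

/* Hypothetical helper: 1 if a count of n elements fits in a 32-bit signed int. */
int fits_in_int(uint64_t n)
{
    return n <= (uint64_t)INT32_MAX;    /* INT32_MAX is 2^31 - 1 */
}

/* Hypothetical helper: 1 if a count of n elements fits in a word-sized signed integer. */
int fits_in_wrd(uint64_t n)
{
    return n <= (uint64_t)INTPTR_MAX;   /* platform dependent */
}

/* Hypothetical self-check of the boundary described above. */
void boundary_self_check(void)
{
    /* 2^31 - 1 elements can still be reported as an int. */
    assert(fits_in_int(UINT64_C(2147483647)));
    /* One element more can no longer be reported as an int. */
    assert(!fits_in_int(UINT64_C(2147483648)));
    /* A word-sized count holds 2^31 only when the word is wider than 32 bits. */
    assert(fits_in_wrd(UINT64_C(2147483648)) == (sizeof(intptr_t) > 4));
}
```

Under these assumptions, a 64-bit build can still represent a count of 2^31
elements as a wrd but not as an int, which is the case PF/TIJAH runs into.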
A simple analysis showed that the pathfinder code contains approximately 584
occurrences of the count() function (C and MIL invocations); PF/TIJAH
contains approximately 155. Most of them seem easy to replace. Stefan
estimated the change to be a weekend's work. He came up with a simple
replacement function:
PROC count_wrd(BAT b) : wrd {
    return wrd(count(int(b)));
}
It is debatable whether to actually implement this fix, as the M4 code is
basically end-of-life, and M5 does not exhibit this bug. A plan needs to be
made. To be continued.
----------------------------------------------------------------------
>Comment By: SourceForge Robot (sf-robot)
Date: 2010-04-25 02:20
Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 365 days (the time period specified by
the administrator of this Tracker).
----------------------------------------------------------------------
Comment By: Wouter Alink (vzzzbx)
Date: 2009-04-24 14:17
Message:
Short update:
This bug was mainly filed because of scalability issues in PF/TIJAH. Today I
went through the PF/TIJAH code and changed it to use count_wrd instead of
count. Still awaiting (local) test results. More news (and perhaps a commit)
after the weekend.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-04-03 12:28
Message:
- GDK C function BATcount() does return a BUN, not an int.
- in MIL, we have

    COMMAND: count(BAT[any,any]) : int
    MODULE: algebra
    COMPILED: by adm on Wed Apr 1 23:29:46 2009
    Returns the number of elements currently in a BAT.

    COMMAND: count(int) : lng
    MODULE: bat
    COMPILED: by adm on Wed Apr 1 23:29:46 2009
    Returns the current size (in number of elements) of a BAT.

- my correct quote was "If you have a weekend, you're welcome to do it
[i.e., change count(BAT[any,any]) to return a wrd instead of an int
--- and fix all MIL code that uses count(BAT[any,any]) accordingly --- and
for consistency change all MIL functions that return or expect a count or
index of BUN(s) (think of grouped counts for aggregation, slice, etc.)
likewise]" --- I don't know, though, whether a weekend is sufficient to do it
all correctly & consistently ...
----------------------------------------------------------------------