Bugs item #2728133, was opened at 2009-04-03 11:52
Message generated for change (Settings changed) made by sf-robot
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2728133&group_id=56967

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core
Group: MonetDB4 "stable"
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Wouter Alink (vzzzbx)
Assigned to: Nobody/Anonymous (nobody)
Summary: M4: count() returns int

Initial Comment:
To scale PF/TIJAH (and also the rest of pathfinder/M4) beyond the 31-bit 
boundary, the count() function needs to return a wrd instead of an int. 
PF/TIJAH hits the 2G border earlier because it counts each word in 
an XML element, whereas pathfinder counts a text element as only a single node. 
M5 apparently was designed this way right from the start. In M4 it seems to be a 
legacy issue. 
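
To make the boundary concrete: a signed 32-bit int tops out at
2^31 - 1 = 2147483647 elements. Below is a minimal C sketch, not taken
from GDK or the MIL sources, of checking whether a count still fits. The
helper names count_fits_int32 and count_as_int32_saturating are
hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of GDK or MIL): reports whether an
 * element count can still be represented in a signed 32-bit int. */
static bool count_fits_int32(uint64_t n)
{
    return n <= (uint64_t)INT32_MAX;
}

/* Hypothetical helper: narrows a count to int32_t, clamping at
 * INT32_MAX instead of silently wrapping when the count is too large. */
static int32_t count_as_int32_saturating(uint64_t n)
{
    return count_fits_int32(n) ? (int32_t)n : INT32_MAX;
}
```

A count of 3000000000 words, for example, does not pass the first check,
which is the situation a wrd-sized return type would avoid on 64-bit
systems.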

A quick analysis showed that the pathfinder code contains roughly 584 
occurrences of the 'count()' function (C and MIL invocations), and PF/TIJAH 
roughly 155. Most of them seem easy to replace. Stefan estimated 
the change to be a 'weekend's work' and came up with a simple replacement 
function:

    # count(int) (module bat) returns lng, unlike count(BAT[any,any]) : int
    PROC count_wrd(BAT b) : wrd {
        return wrd(count(int(b)));
    }

It is debatable whether to actually implement this fix, as the M4 code is 
basically end-of-life, and M5 does not exhibit this bug. A plan needs to be 
made. To be continued.

----------------------------------------------------------------------

>Comment By: SourceForge Robot (sf-robot)
Date: 2010-04-25 02:20

Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 365 days (the time period specified by
the administrator of this Tracker).

----------------------------------------------------------------------

Comment By: Wouter Alink (vzzzbx)
Date: 2009-04-24 14:17

Message:
Short update: 

This bug was mainly filed for scalability issues in pf-tijah. Today I went
through the pftijah code and changed the code so that it uses count_wrd
instead of count. Still awaiting (local) test-results. More news (and
perhaps a commit) after the weekend. 

----------------------------------------------------------------------

Comment By: Stefan Manegold (stmane)
Date: 2009-04-03 12:28

Message:
- GDK C function BATcount() does return a BUN, not an int.

- in MIL, we have
COMMAND:   count(BAT[any,any]) : int
MODULE:    algebra
COMPILED:  by adm on Wed Apr  1 23:29:46 2009
Returns the number of elements currently in a BAT.

COMMAND:   count(int) : lng
MODULE:    bat
COMPILED:  by adm on Wed Apr  1 23:29:46 2009
Returns the current size (in number of elements) of a BAT.

- my correct quote was "If you have a weekend, you're welcome to do it
[i.e., change count(BAT[any,any]) to return a wrd instead of an int
--- and fix all MIL code that uses count(BAT[any,any]) accordingly --- and
for consistency change all MIL functions that return or expect a count or
index of BUN(s) (think of grouped counts for aggregation, slice, etc.)
likewise]" --- don't know, though, whether a weekend is sufficient to do it
all correctly & consistently ...


----------------------------------------------------------------------

------------------------------------------------------------------------------
_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs
