Using a worker as an alternative to recursion

David Adams via 4D_Tech Mon, 20 Mar 2017 18:33:04 -0700

Okay, this next post is almost certainly of narrow use and interest but
someone may find it useful.


I've been thinking a lot lately about CALL WORKER as a control structure. It's
not exactly a control structure, but it's easy to see it as more like a
control structure than as a message. 4D hasn't introduced new control
structures since ever, as far as I know. We've got the standard collection
(with variations on implementation) that come down from Dijkstra's
Structured Programming and Pascal. Plus a couple of sort-of interrupts (ON
ERR CALL, etc.). We don't, for example, have Try/Catch or more OO-oriented
flow-of-control mechanisms. We do have the ability to inject/expand code
in-line through all forms of EXECUTE. But CALL WORKER is a bit different.
It's not injecting code at the point of execution, like EXECUTE METHOD or
EXECUTE FORMULA. Instead, it appends code to the end of the current thread
of execution where it will be reached and processed _in order_. I don't
know if there is a proper name for this style of self-modifying code, but
I'm going to call it "chaining." Your worker is running some code:

// MyWorker
1: Doing my thang!
2: CALL WORKER("MyWorker";"DoWork";"DoTheOtherThang")
3: Open document
4: Process document
5: Close document
....all done with the code that you see in the method editor
6: DoWork("DoTheOtherThang") // The call appended up at 2.

So, the code order is
1
2
3
4
5
6

The execution order is (effectively)
1
3
4
5
2 (that is, the call queued at 2, which shows up as line 6 above)
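
The ordering above can be reproduced with a toy model of a worker: a plain FIFO queue of pending calls that is only consulted once the running method has returned. This is a Python sketch rather than 4D, and it assumes nothing about how 4D implements workers internally; `call_worker`, `run_worker`, and `do_work` are hypothetical stand-ins for CALL WORKER, the worker's message loop, and the DoWork method.

```python
from collections import deque

pending = deque()   # stand-in for the worker's message queue (assumption)
log = []            # records the order in which steps actually run


def call_worker(method, *args):
    """Append a call to the end of the queue; nothing runs yet."""
    pending.append((method, args))


def do_work(task):
    # Step 6: the work that was requested back at step 2.
    log.append("6: " + task)


def my_worker():
    log.append("1: doing my thang")
    call_worker(do_work, "DoTheOtherThang")   # step 2: only queues the call
    log.append("3: open document")
    log.append("4: process document")
    log.append("5: close document")


def run_worker():
    """Drain the queue, running one queued call to completion at a time."""
    while pending:
        method, args = pending.popleft()
        method(*args)


call_worker(my_worker)
run_worker()
print("\n".join(log))
```

In this model the printed steps come out as 1, 3, 4, 5 and then 6, because the call queued at step 2 cannot start until `my_worker` has returned.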

This can lead to some pretty twisted results, but it can also be exploited.
Where? As an alternative to recursion. Recursion is an expensive way to
manage a processing stack, but an easy way to implement a
root-to-leaf, left-to-right tree scan. Say you're working your way through a
series of nested folders, a large XML tree, or some other tree structure.
It is so commonplace - even universal - to do a root-to-leaves walk that
the approach is synonymous with "tree walk." But there are several other
ways to walk trees. You might want to start at the leaves, you might want
to scan from right to left, or you might want to do a "level order" scan.
This last one is cheap and easy to implement using CALL WORKER.

For those that don't remember, a "level order" scan starts at the root (your
top level folder, for example), then goes to the next level and scans it
completely (folders in the top folder), then down to the next level, and so
on. So, you always check every child before checking their
children/grandchildren, etc. A recursive scan, by contrast, follows one
path all the way to the bottom of the tree before backing up to try the
next branch.

Level order scans are handy when you need to make sure that parents all
exist first. This is sometimes useful, for example, when importing data
where someone has structured it so that you've got two parent tables and a
joining table in the same JSON or XML file. To create the child records,
you may need both parents in place already due to cardinality constraints.
Hard to do with a root-to-leaves scan, easy to do with a level-order scan.
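
The difference between the two scan orders is easiest to see on a tiny tree. The following is a minimal Python sketch, not 4D; the `TREE` dictionary and both function names are made up for illustration, and the level-order version uses an explicit queue in place of CALL WORKER.

```python
from collections import deque

# Hypothetical folder tree: each key maps to its child folders, left to right.
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}


def scan_recursive(node, visited):
    """Root-to-leaf, left-to-right: finish one branch before starting the next."""
    visited.append(node)
    for child in TREE[node]:
        scan_recursive(child, visited)
    return visited


def scan_level_order(root):
    """Visit every node on one level before any node on the next level."""
    visited = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        visited.append(node)
        queue.extend(TREE[node])   # children wait behind everything already queued
    return visited


print(scan_recursive("root", []))   # ['root', 'a', 'a1', 'a2', 'b', 'b1']
print(scan_level_order("root"))     # ['root', 'a', 'b', 'a1', 'a2', 'b1']
```

Note that in the level-order result both parents `a` and `b` appear before any of their children, which is the property the import example relies on.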

Recursion has the obvious disadvantages of 1) being a bit bewildering at
first and 2) being an expensive way to handle a stack of pending processing
values. Instead of just having a stack of things to do, you create full
copies of the working method with different parameters. With CALL WORKER,
you can _chain_ the requests instead. I thought of this last night, figured
that it would generate a level-order scan and just tried it out. Yes,
that's what it does. With a recursive scan, your peak memory consumption is
going to be something like:

Method size + (Method size * Maximum tree depth)

With a chained approach, your maximum memory consumption is going to be
more like:

Method size + Method size

So, you save around

Method size * (Maximum tree depth - 1)

Does the memory difference matter? Depends. Probably not in most cases - I
don't run into problems very often, but it's possible with a huge tree. The
first, best thing to do there is to compile. That's still the best and
easiest 4D optimization under the sun. But, hey, this is a bit of a
thinking-things-through exercise. Like I said, of narrow interest and
application.
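
As a rough check on the estimate above, the sketch below counts the two quantities involved on made-up trees: how many calls are nested at the deepest point of a recursive scan, and how many calls are active at once in a chained scan. It is a Python model, not a 4D measurement, and all names in it are hypothetical. It also tracks something the formulas leave out: the chained scan keeps a queue of pending nodes, which can get long on a wide tree. Each queued item here is only a parameter (a path, in the folder example) rather than a full method frame, so the estimate still seems reasonable as long as those parameters are small.

```python
from collections import deque


def make_tree(depth, width):
    """Build a hypothetical tree as nested lists: a node is a list of child nodes."""
    if depth == 0:
        return []
    return [make_tree(depth - 1, width) for _ in range(width)]


def peak_recursive_depth(node, depth=1):
    """Deepest nesting of active calls during a recursive scan."""
    peak = depth
    for child in node:
        peak = max(peak, peak_recursive_depth(child, depth + 1))
    return peak


def peak_chained(root):
    """Chained scan: (active calls at once, longest queue of pending nodes)."""
    queue = deque([root])
    peak_queue = 1
    while queue:
        node = queue.popleft()      # exactly one handler call is active here
        queue.extend(node)          # children are queued, not called
        peak_queue = max(peak_queue, len(queue))
    return 1, peak_queue


deep = make_tree(depth=50, width=1)    # a chain: 51 nodes, one per level
wide = make_tree(depth=3, width=10)    # 1 + 10 + 100 + 1000 nodes

print(peak_recursive_depth(deep), peak_chained(deep))   # 51 (1, 1)
print(peak_recursive_depth(wide), peak_chained(wide))   # 4 (1, 1000)
```

So in this model the deep, narrow tree is where chaining saves the most, while on the wide tree the recursion is shallow anyway and the pending queue is the thing to watch.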

As a sketch, some fragments of the recursive and chained test code I'm
using:

// FolderScan_Recursive
C_TEXT($1;$folder_path)  // The path to the folder we want to scan (should end with a directory delimiter)

If (Process_GetName ="FolderScan")  // Process_GetName is not shown here (presumably a project method)
  $folder_path:=$1
  OK:=1
  If (Test path name($folder_path)=Is a folder)
    // Handle the files in this folder first
    ARRAY TEXT($file_names;0)
    DOCUMENT LIST($folder_path;$file_names)
    C_TEXT($file_path)
    C_LONGINT($file_index)
    For ($file_index;1;Size of array($file_names))
      $file_path:=$folder_path+$file_names{$file_index}
      FolderScan_OnNodeRead ($file_path)
    End for
    // Then descend into each subfolder
    ARRAY TEXT($folder_names;0)
    FOLDER LIST($folder_path;$folder_names)
    C_LONGINT($folder_index)
    For ($folder_index;1;Size of array($folder_names))
      // Stack call and run recursively
      FolderScan_Recursive ($folder_path+$folder_names{$folder_index}+Folder separator)  // Recursively nest here
    End for
  End if
End if

// FolderScan_Chained
C_TEXT($1;$folder_path)  // The path to the folder we want to scan (should end with a directory delimiter)

If (Process_GetName ="FolderScan")  // Process_GetName is not shown here (presumably a project method)
  $folder_path:=$1
  OK:=1
  If (Test path name($folder_path)=Is a folder)
    // Handle the files in this folder first
    ARRAY TEXT($file_names;0)
    DOCUMENT LIST($folder_path;$file_names)
    C_TEXT($file_path)
    C_LONGINT($file_index)
    For ($file_index;1;Size of array($file_names))
      $file_path:=$folder_path+$file_names{$file_index}
      FolderScan_OnNodeRead ($file_path)
    End for
    // Then queue one call per subfolder instead of descending now
    ARRAY TEXT($folder_names;0)
    FOLDER LIST($folder_path;$folder_names)
    C_LONGINT($folder_index)
    For ($folder_index;1;Size of array($folder_names))
      // Append call and run when this method is exhausted
      CALL WORKER("FolderScan";"FolderScan_Chained";$folder_path+$folder_names{$folder_index}+Folder separator)
    End for
  End if
End if
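
For anyone who wants to try the chained scan without 4D at hand, here is a rough Python analogue of FolderScan_Chained. It is a sketch under assumptions, not a translation: `call_worker` and the `pending` queue stand in for CALL WORKER and the worker's message queue, `seen` stands in for whatever FolderScan_OnNodeRead does, and names are sorted only to make the output repeatable.

```python
import os
import tempfile
from collections import deque

pending = deque()   # stand-in for the "FolderScan" worker's message queue
seen = []           # stand-in for what FolderScan_OnNodeRead would record


def call_worker(method, *args):
    """Queue a call to run after the current one has finished."""
    pending.append((method, args))


def folder_scan_chained(folder_path):
    if not os.path.isdir(folder_path):
        return
    entries = sorted(os.listdir(folder_path))
    # Handle the files in this folder first.
    for name in entries:
        path = os.path.join(folder_path, name)
        if os.path.isfile(path):
            seen.append(os.path.relpath(path, ROOT).replace(os.sep, "/"))
    # Then queue one call per subfolder instead of recursing into it.
    for name in entries:
        path = os.path.join(folder_path, name)
        if os.path.isdir(path):
            call_worker(folder_scan_chained, path)


with tempfile.TemporaryDirectory() as ROOT:
    # Build a small throwaway folder tree to scan.
    for rel in ("top.txt", "a/a.txt", "a/deep/deep.txt", "b/b.txt"):
        full = os.path.join(ROOT, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as handle:
            handle.write("x")

    call_worker(folder_scan_chained, ROOT)
    while pending:                      # the worker's message loop
        method, args = pending.popleft()
        method(*args)

print(seen)   # ['top.txt', 'a/a.txt', 'b/b.txt', 'a/deep/deep.txt']
```

The file in `b` is reported before the file in `a/deep`, which is the level-order behaviour described earlier.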

P.S. For those following closely: yes, you're correct, the chaining method
only works so long as there's only one request being handled at a time. I
haven't tested it out, but I think that this is going to work automatically
so long as you have a single entry point, like this:

// FolderScan_WorkerSetup

If (Process_GetName ="FolderScan")
  C_TEXT($1;$folder_path)
  C_LONGINT($2;FolderScan_Callback_winref)
  C_TEXT($3;FolderScan_Callback_method)
  C_TEXT($4;$scan_type_name)
  // This would be a good control point to
  $folder_path:=$1
  FolderScan_Callback_winref:=$2
  FolderScan_Callback_method:=$3
  $scan_type_name:=$4
  ARRAY TEXT(FolderScanTry_NodeText_at;0)
  OK:=1
  ON ERR CALL("ErrorHandler_ForFileOperations")
  Case of
    : ($scan_type_name="Recursive")
      FolderScan_Recursive ($folder_path)
    : ($scan_type_name="Chained")
      CALL WORKER("FolderScan";"FolderScan_Chained";$folder_path)
    Else
      TRACE  // Error
  End case
End if

I'll double-check that this works if I ever put this sort of code into
production.