Dear Christian,

Thanks for your answer.
>It may be interesting to do some testing in order to find out what’s
>the actual bottleneck in your query. How man entries is your hash set
>supposed to contain?

The HashSet contains 10M strings. That's not a big deal. The problems comes
from filling the HashSet.
Well, let us be more technacal. I am using hashset for tree traversal
operations over a DataBase of 10M nodes, 1Go. This HashSet collects some
informations during the traversal, mainly a string at each node. Actually,
my code binds JAVA with something like

import module namespace set = "java.util.HashSet";

declare function init(){set:clear(),
tree_traversal($my_big_DataBase), do_some_stuff_with_hash_set()}
declare function tree_traversal($elements)
{
    for $element in $elements return (
           set:add(get_some_stuff_within($element) ),
           tree_traversal($element/element())
     )
}
and the whole traversal takes a minute.

I don't know how to achieve this with pure XQUERY. The only thing that I
tried is something like  :

declare function init(){let $my_hash_set :=
tree_traversal($my_big_DataBase, map:new()) return
do_some_stuff_with($my_hash_set) }

declare function tree_traversal($elements, $hash_set) as function(*)
(return a HashSet)
{
     for $element in $elements return (
           tree_traversal($nodes/element(), $hash_set),
           map:entry( get_some_stuff_within($element),boolean)
     )
}
that performed very badly, compared to the first version (JAVA binding). I
explained this thinking that the first version is O(N) operations, and the
second is O(N^2) operations due to in-memory copy.

>> - second : it lacks powerful libraries, as complete math modules.

> What kind of functions would you be interested in?

I need a stochastic modulus, as
http://www.boost.org/doc/libs/1_55_0/libs/math/doc/html/dist.html
also a linear Algebra modulus as UBLAS
http://www.boost.org/doc/libs/1_54_0/libs/numeric/ublas/doc/index.htm

I guess that I could find JAVA equivalent and bind them to BaseX. Such
functionalities are available directly as XQUERY modules ?

Cheers,

Jean-Marc




2013/11/12 Christian Grün <christian.gr...@gmail.com>

> Dear JohnLeM,
>
> thanks for your mail. As you already noted, XQuery is a functional
> language, and this is the reason why XQuery maps are not exactly
> comparable to maps and sets, as they are used in imperative languages:
>
> All maps in XQuery are persisent (immutable): Once a map has been
> generated, it is not possible to change its contents. Instead, a new
> map is to be created by each insertion or deletion [1]. This sounds
> like a huge memory killer, but it’s not as bad as you might guess.
> Various efficient solutions exist for persistent map, such as the
> mapped trie that has been implemented in BaseX [2]. It will only
> create copies of parts of the data structure that are to be changed.
> The following query is an example for a query which creates 100.000
> map with a single entry, and a large map containing all the entries;
> on my system, it requires 200 ms to run:
>
>   map:size(map:new(
>     for $i in 1 to 100000
>    return map { $i := true() }
>   ))
>
> In short: persistent maps may not be as efficient as mutable maps, but
> they are usually not the bottleneck in XQuery applications, because
> deleted entries (or obsolete maps) will automatically be discarded by
> the garbage collector as soon as they are not in the query scope
> anymore. If you want to enforce this, you can put your map operations
> into FLWOR expresions or user-declared functions.
>
> Back to your original questions:
>
> > 2) This module must expose a method "hashset:clear($hashset)" to
> de-allocate
> > memory dynamically. The map:module provides the function map:remove, and
> I
> > could remove all elements. [...]  It does not deallocate memory, leading
> to poor
> > overall performances.
>
> It may be interesting to do some testing in order to find out what’s
> the actual bottleneck in your query. How man entries is your hash set
> supposed to contain?
>
> > 3) must expose a method "hashset:add($hashset,$element)" to add memory
> > dynamically. Through map:module, the only possibility that I see is to
> wrap
> > it as map:new($hashset, map:entry($element,$dummyboolean)).
>
> Right, using true() is probably the best choice (booleans are only
> instantiated once in memory).
>
> > - first : its dynamic memory management (the in-memory printfoot of my
> > XQUERY executables are usually tremendous).
>
> This can in fact be a problem, and is mostly due to various decisions
> taken in the specification, and the complexity of XML nodes in
> general.
>
> > - second : it lacks powerful libraries, as complete math modules.
>
> What kind of functions would you be interested in?
>
> Christian
>
> [1] http://en.wikipedia.org/wiki/Persistent_data_structure
> [2] http://en.wikipedia.org/wiki/Hash_array_mapped_trie
>
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk

Reply via email to