Dear Christian, Thanks for your answer. >It may be interesting to do some testing in order to find out what’s >the actual bottleneck in your query. How man entries is your hash set >supposed to contain?
The HashSet contains 10M strings. That's not a big deal. The problems comes from filling the HashSet. Well, let us be more technacal. I am using hashset for tree traversal operations over a DataBase of 10M nodes, 1Go. This HashSet collects some informations during the traversal, mainly a string at each node. Actually, my code binds JAVA with something like import module namespace set = "java.util.HashSet"; declare function init(){set:clear(), tree_traversal($my_big_DataBase), do_some_stuff_with_hash_set()} declare function tree_traversal($elements) { for $element in $elements return ( set:add(get_some_stuff_within($element) ), tree_traversal($element/element()) ) } and the whole traversal takes a minute. I don't know how to achieve this with pure XQUERY. The only thing that I tried is something like : declare function init(){let $my_hash_set := tree_traversal($my_big_DataBase, map:new()) return do_some_stuff_with($my_hash_set) } declare function tree_traversal($elements, $hash_set) as function(*) (return a HashSet) { for $element in $elements return ( tree_traversal($nodes/element(), $hash_set), map:entry( get_some_stuff_within($element),boolean) ) } that performed very badly, compared to the first version (JAVA binding). I explained this thinking that the first version is O(N) operations, and the second is O(N^2) operations due to in-memory copy. >> - second : it lacks powerful libraries, as complete math modules. > What kind of functions would you be interested in? I need a stochastic modulus, as http://www.boost.org/doc/libs/1_55_0/libs/math/doc/html/dist.html also a linear Algebra modulus as UBLAS http://www.boost.org/doc/libs/1_54_0/libs/numeric/ublas/doc/index.htm I guess that I could find JAVA equivalent and bind them to BaseX. Such functionalities are available directly as XQUERY modules ? Cheers, Jean-Marc 2013/11/12 Christian Grün <christian.gr...@gmail.com> > Dear JohnLeM, > > thanks for your mail. As you already noted, XQuery is a functional > language, and this is the reason why XQuery maps are not exactly > comparable to maps and sets, as they are used in imperative languages: > > All maps in XQuery are persisent (immutable): Once a map has been > generated, it is not possible to change its contents. Instead, a new > map is to be created by each insertion or deletion [1]. This sounds > like a huge memory killer, but it’s not as bad as you might guess. > Various efficient solutions exist for persistent map, such as the > mapped trie that has been implemented in BaseX [2]. It will only > create copies of parts of the data structure that are to be changed. > The following query is an example for a query which creates 100.000 > map with a single entry, and a large map containing all the entries; > on my system, it requires 200 ms to run: > > map:size(map:new( > for $i in 1 to 100000 > return map { $i := true() } > )) > > In short: persistent maps may not be as efficient as mutable maps, but > they are usually not the bottleneck in XQuery applications, because > deleted entries (or obsolete maps) will automatically be discarded by > the garbage collector as soon as they are not in the query scope > anymore. If you want to enforce this, you can put your map operations > into FLWOR expresions or user-declared functions. > > Back to your original questions: > > > 2) This module must expose a method "hashset:clear($hashset)" to > de-allocate > > memory dynamically. The map:module provides the function map:remove, and > I > > could remove all elements. [...] It does not deallocate memory, leading > to poor > > overall performances. > > It may be interesting to do some testing in order to find out what’s > the actual bottleneck in your query. How man entries is your hash set > supposed to contain? > > > 3) must expose a method "hashset:add($hashset,$element)" to add memory > > dynamically. Through map:module, the only possibility that I see is to > wrap > > it as map:new($hashset, map:entry($element,$dummyboolean)). > > Right, using true() is probably the best choice (booleans are only > instantiated once in memory). > > > - first : its dynamic memory management (the in-memory printfoot of my > > XQUERY executables are usually tremendous). > > This can in fact be a problem, and is mostly due to various decisions > taken in the specification, and the complexity of XML nodes in > general. > > > - second : it lacks powerful libraries, as complete math modules. > > What kind of functions would you be interested in? > > Christian > > [1] http://en.wikipedia.org/wiki/Persistent_data_structure > [2] http://en.wikipedia.org/wiki/Hash_array_mapped_trie >
_______________________________________________ BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk