Hi, I quite often have to deal with complex data structures whose structure I know little about. Typically this could be data created by XML::Simple from various XML files.
So I have written a little module, Data::Traverse, that lets me either extract data from a data structure (all scalars, or all references of a certain type) or create iterators on the data structure.

Does this make sense? Is there something already available on CPAN that would do just this (I looked in the Data namespace and did not see anything)? Would it be worth releasing on CPAN?

An alpha version of the module is at http://xmltwig.com/module/data-traverse/

SYNOPSIS
    Data::Traverse supports 2 modes: a simple procedural interface and an
    object-oriented interface.

  The procedural interface
    The procedural interface can be used to retrieve parts of a complex
    data structure. It is used through "use Data::Traverse qw(:lists)"

        use Data::Traverse qw(:lists);

        my $data= ...;                             # a complex data structure
        my @values=  scalars( $data);              # all scalars in the structure
        my @objects= refs( $data, 'LWP::Simple');  # all LWP::Simple objects

  The OO interface
    The OO interface is used to write iterators that go through a data
    structure.

        my $iter= Data::Traverse->new( $data);
        $iter->traverse( sub { my( $iter, $item)= @_;
                               print "$item\n" if( $iter->item_key eq 'id');
                               return $iter->prune if( $iter->path_matches( '/foo/bar'));
                             }
                       );

        $iter->traverse( sub { $_[1]++ if( $_[0]->is_scalar); }); # changes the data

    More methods are available to get information on the current context.

DESCRIPTION
    Data::Traverse lets you traverse complex data structures without
    needing to know all about them. It can be used for example with the
    data structure created by XML::Simple.

  Procedural Interface
    refs ($data, $ref, $optional_level)
        Returns the list of references of the $ref type (as per
        "UNIVERSAL::isa( $field, $ref)") in the data structure, in the
        order of traversal (hashes are traversed in the dictionary order
        of their keys).

        The $optional_level argument can be used to limit the depth in
        the data structure, 0 being $data itself.
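
To make the description of refs above concrete, here is a minimal sketch of the behaviour it describes. It is not the module's code: sketch_refs is a hypothetical name, and the breadth-first order is an assumption taken from the BUGS/TODO note about the procedural interface. Like the alpha module, it does not guard against cycles.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical stand-in for refs(): NOT the Data::Traverse implementation.
# Breadth-first walk; hash values are visited in sorted-key order.
sub sketch_refs {
    my ($data, $type, $max_level) = @_;
    my @found;
    my @queue = ( [ $data, 0 ] );               # [item, level] pairs
    while (my $entry = shift @queue) {
        my ($item, $level) = @$entry;
        next unless ref $item;                  # plain scalars are never refs
        push @found, $item if UNIVERSAL::isa($item, $type);
        # level 0 is $data itself, so stop descending once the limit is hit
        next if defined $max_level && $level >= $max_level;
        my $kind = reftype($item);
        if ($kind eq 'HASH') {
            push @queue, [ $item->{$_}, $level + 1 ] for sort keys %$item;
        }
        elsif ($kind eq 'ARRAY') {
            push @queue, [ $_, $level + 1 ] for @$item;
        }
    }
    return @found;
}

my $data   = { b => [ 1, { x => 2 } ], a => { c => [3] } };
my @hashes = sketch_refs($data, 'HASH');        # $data, $data->{a}, $data->{b}[1]
my @top    = sketch_refs($data, 'HASH', 0);     # only $data itself
print scalar(@hashes), " ", scalar(@top), "\n"; # prints "3 1"
```

Keeping the level next to each queued item is what makes the optional depth limit cheap to check; a scalars-style function would use the same loop and collect the non-reference items instead.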
    scalars ($data, $optional_level)
        Returns the list of scalar values in the data structure, in the
        order of traversal (hashes are traversed in the dictionary order
        of their keys).

        The $optional_level argument can be used to limit the depth in
        the data structure, 0 being $data itself.

    refs_level ($data, $ref, $level)
        Returns the list of references to $ref at $level in the data
        structure.

    scalars_level ($data, $level)
        Returns the list of scalar values at $level in the data structure.

  Object-Oriented Interface
    The Object-Oriented interface provides iterators on arbitrary data
    structures.

    A handler is associated with the iterator and is called for every
    item in the data structure. Within the handler a host of methods can
    be called to get information about the current context.

    new ($data)
        Creates an iterator on $data.

    traverse ($handler)
        Traverses the data structure and applies the handler to every
        item of the data structure. An item is anything in the data
        structure: scalar, arrayref or hashref.

        The handler is called with the iterator and the current item as
        arguments. Use $_[1] if you want to update the original data
        structure.

        The handler also receives the item as $_.

        In the handler you can use the following methods:

    path
        The current path to an item in the data structure. It is built
        from the hash keys needed to get to the item, joined with "/".

    path_matches ($exp)
        $exp is a regular expression. The path is matched against that
        regexp.
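
The handler, path and pruning ideas described here can be sketched in a few lines of plain Perl. This is only an illustration under assumptions, not the module's code: sketch_traverse and the $PRUNE sentinel are hypothetical names, the handler receives the path string instead of an iterator object, and the sketch does not support updating the data through $_[1].

```perl
use strict;
use warnings;

# Hypothetical sentinel: a handler returns it to skip the children of an item.
my $PRUNE = \'prune';

# Hypothetical depth-first walker, NOT the Data::Traverse implementation.
# The handler is called as $handler->($path, $item), with the item also in $_.
sub sketch_traverse {
    my ($item, $handler, $path) = @_;
    $path = '' unless defined $path;
    my $result;
    {
        local $_ = $item;
        $result = $handler->($path, $item);
    }
    return if ref $result && $result == $PRUNE;    # children are not visited
    if (ref $item eq 'HASH') {
        for my $key (sort keys %$item) {
            # the path is built from hash keys only, joined with "/"
            sketch_traverse($item->{$key}, $handler, "$path/$key");
        }
    }
    elsif (ref $item eq 'ARRAY') {
        # array elements keep the path of the array that holds them
        sketch_traverse($_, $handler, $path) for @$item;
    }
    return;
}

my $data = { foo => { bar => { deep => 1 }, baz => 2 }, id => 'x1' };
my @seen;
sketch_traverse($data, sub {
    my ($path, $item) = @_;
    push @seen, $path unless ref $item;            # record paths of scalars
    return $PRUNE if $path =~ m{^/foo/bar$};       # do not descend below /foo/bar
    return;
});
print join(' ', @seen), "\n";                      # prints "/foo/baz /id"
```

Matching the path with a regular expression inside the handler plays the role that path_matches has in the description above.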
    parent
        The parent of the current item (a hashref or arrayref that
        includes the item).

    ancestors
        The list (root first) of ancestors of the current item.

    item_index
        If the item is an element of an array then this is the item index
        in the array.

    item_key
        If the item is a value in a hash then this is the item key in the
        hash.

    parent_item_key
        If the parent of the item is a value in a hash then this is the
        associated key.

    data_level
        The level of depth at which the item is found in the data
        structure (the size of the ancestors stack).

    path_level
        The number of steps in the item path.

    is_scalar
        Returns true if the item is a scalar (this is just "!ref" but
        reads better).

    prune
        If the handler returns prune then the children of the current
        item are not traversed.

    finish
        Ends the traversal and returns.

BUGS/TODO
    At this point cycles in the data structure are not properly processed:

    - the procedural interface will most likely enter deep recursion,

    - the OO interface will only visit each item once, so the context is
      only tested the first time the item is reached.

    The procedural interface does a breadth-first traversal of the data,
    the OO interface does a depth-first traversal. It would be nice to be
    able to choose the algorithm.

    More tests need to be written (isn't this always the case? ;--)

    It would be nice to have more generic methods to query the context
    (XPath-based?)

--
Michel Rodriguez
Perl & XML
http://www.xmltwig.com
