Hi,

I quite often have to deal with complex data structures whose layout I
know little about. Typically this is data created by XML::Simple from
various XML files.

So I have written a little module, Data::Traverse, that lets me either
extract data from a data structure (all scalars, or all references of a
certain type), or create iterators on the data structure.

Does this make sense? Is there something already available on CPAN that
would do just this (I looked in the Data namespace and did not see
anything)? Would it be worth releasing on CPAN?

An alpha version of the module is at
http://xmltwig.com/module/data-traverse/



SYNOPSIS
    Data::Traverse supports 2 modes: a simple procedural interface and an
    object-oriented interface.

  The procedural interface
    The procedural interface can be used to retrieve parts of a complex
    data structure.

    It is imported with "use Data::Traverse qw(:lists)":

      use Data::Traverse qw(:lists);

      my $data= ...;                           # a complex data structure

      my @values=  scalars( $data);             # all scalars in the structure
      my @objects= refs( $data, 'LWP::Simple'); # all LWP::Simple objects

  The OO interface
    The OO interface is used to write iterators that go through a data
    structure.

      my $iter= Data::Traverse->new( $data);
      $iter->traverse( sub { my( $iter, $item)= @_;
                             print "$item\n" if( $iter->item_key eq 'id');
                             return $iter->prune
                               if( $iter->path_matches( '/foo/bar'));
                           }
                      );

      $iter->traverse( sub { $_[1]++ if( $_[0]->is_scalar); }); # changes the data

    More methods are available to get information on the current context.

DESCRIPTION
    Data::Traverse lets you traverse complex data structures without
    needing to know all about them.

    It can be used, for example, with the data structure created by
    XML::Simple.

  Procedural Interface
    refs ($data, $ref, $optional_level)
        return the list of references of the $ref type (as per
        "UNIVERSAL::isa( $field, $ref)") in the data structure, in the
        order of traversal (hashes are traversed through the dictionary
        order of their keys).

        The $optional_level argument can be used to limit the depth in the
        data structure, 0 being $data itself.

    scalars ($data, $optional_level)
        return the list of scalar values in the data structure, in the
        order of traversal (hashes are traversed through the dictionary
        order of their keys).

        The $optional_level argument can be used to limit the depth in the
        data structure, 0 being $data itself.

    refs_level ($data, $ref, $level)
        return the list of references to $ref at $level in the data
        structure.

    scalars_level ($data, $level)
        return the list of scalar values at $level in the data structure.
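
    To make the ordering and the level argument concrete, here is a
    minimal sketch of a scalars-like function. It is not the module's
    code: collect_scalars is a hypothetical name, and the sketch goes
    depth-first for brevity, whereas the procedural interface is
    described under BUGS/TODO as breadth-first, so the order of the
    results may differ from what the module returns.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Traverse: collect the scalars
# found in a structure, visiting hash values in sorted key order.
sub collect_scalars {
    my ( $data, $max_level, $level ) = @_;
    $level = 0 unless defined $level;

    return ($data) unless ref $data;    # a plain scalar is a leaf
    return () if defined $max_level && $level >= $max_level;

    # only plain hashrefs and arrayrefs are descended into;
    # other references (code, blessed objects...) are ignored
    my @children = ref $data eq 'HASH'  ? map { $data->{$_} } sort keys %$data
                 : ref $data eq 'ARRAY' ? @$data
                 :                        ();

    return map { collect_scalars( $_, $max_level, $level + 1 ) } @children;
}

my $data = { b => [ 1, 2 ], a => 'x', c => { d => 'deep' } };

my @all = collect_scalars($data);         # x, 1, 2, deep
my @top = collect_scalars( $data, 1 );    # x (level 0 is $data itself)
```

    With a level of 1 only the scalars held directly by $data are
    returned; the nested array and hash are not entered.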

  Object-Oriented Interface
    The Object-Oriented interface provides iterators on arbitrary data
    structures. A handler is associated with the iterator and is called
    for every item in the data structure. Within the handler a host of
    methods can be called to get information about the current context.

    new ($data)
        create an iterator on $data

    traverse ($handler)
        traverse the data structure and apply the handler to every item in
        the data structure. An item is anything in the data structure:
        scalar, arrayref or hashref.

        the handler is called with the iterator and the current item as
        arguments. Use $_[1] if you want to update the original data
        structure.

        the handler also receives the item as $_

        in the handler you can use the following functions:

        path
            the current path to an item in the data structure is built
            from the hash keys to get to the item, joined with "/".

        path_matches ($exp)
            $exp is a regular expression. The path is matched against that
            regexp.

        parent
            the parent of the current item (a hashref or arrayref that
            includes the item)

        ancestors
            the list (root first) of ancestors of the current item

        item_index
            if the item is an element of an array then this is the item
            index in the array

        item_key
            if the item is a value in a hash then this is the item key in
            the hash

        parent_item_key
            if the parent of the item is a value in a hash then this is
            the associated key

        data_level
            the level of depth at which the item is found in the data
            structure (the size of the ancestors stack)

        path_level
            the number of steps in the item path

        is_scalar
            return true if the item is a scalar (this is just "!ref" but
            reads better).

        prune
            if the handler returns prune then the children of the current
            item are not traversed

        finish
            ends the traversal and returns
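
    The following sketch illustrates how a path built from hash keys and
    a prune-style return value can work together. It is not the
    Data::Traverse implementation: walk is a hypothetical function, the
    handler receives the path directly instead of an iterator object,
    the leading "/" in the path is an assumption based on the synopsis,
    and pruning is signalled by returning the string 'prune'.

```perl
use strict;
use warnings;

# Hypothetical walker, not the Data::Traverse implementation: calls
# $handler->( $path, $item ) for every item, depth-first.
sub walk {
    my ( $item, $handler, $path ) = @_;
    $path = '' unless defined $path;

    # a handler that returns the string 'prune' skips the item's children
    my $verdict = $handler->( $path, $item );
    return if defined $verdict && $verdict eq 'prune';

    if ( ref $item eq 'HASH' ) {
        # each hash key adds one step to the path
        walk( $item->{$_}, $handler, "$path/$_" ) for sort keys %$item;
    }
    elsif ( ref $item eq 'ARRAY' ) {
        # array elements keep the path of the array itself
        walk( $_, $handler, $path ) for @$item;
    }
    return;
}

my $data = { foo => { bar => [ 1, 2 ], baz => 3 }, id => 42 };

my @seen;
walk( $data, sub {
    my ( $path, $item ) = @_;
    return 'prune' if $path =~ m{^/foo/bar$};    # do not enter /foo/bar
    push @seen, "$path=$item" unless ref $item;
    return;
} );
# @seen is now ( '/foo/baz=3', '/id=42' )
```

    The values 1 and 2 never reach the handler because the array that
    holds them was pruned.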

BUGS/TODO
    At this point cycles in the data structure are not properly processed:

    - the procedural interface will most likely enter deep recursion,

    - the OO interface will only visit each item once, so its context
    will only be tested once
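
    One common way to avoid the deep recursion is to remember the address
    of every reference already visited. The sketch below assumes this
    approach; scalars_no_cycles is a hypothetical name and is not part of
    the module. refaddr comes from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical cycle-safe collector, not part of Data::Traverse.
sub scalars_no_cycles {
    my ( $data, $seen ) = @_;
    $seen ||= {};

    return ($data) unless ref $data;
    return () if $seen->{ refaddr($data) }++;    # already visited: stop here

    my @children = ref $data eq 'HASH'  ? map { $data->{$_} } sort keys %$data
                 : ref $data eq 'ARRAY' ? @$data
                 :                        ();

    return map { scalars_no_cycles( $_, $seen ) } @children;
}

my $node = { name => 'root', kids => [ 'a', 'b' ] };
$node->{self} = $node;                           # introduce a cycle

my @found = scalars_no_cycles($node);            # a, b, root
```

    Note that a "seen" table like this also skips a reference that is
    merely shared between two places without forming a cycle. Tracking
    only the ancestors of the current item would avoid that, at the cost
    of visiting shared data more than once.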

    The procedural interface does a breadth-first traversal of the data,
    while the OO interface does a depth-first traversal; it would be nice
    to be able to choose the algorithm.
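
    As a rough idea of how the choice could be offered, a single worklist
    can serve both orders: used as a queue it gives breadth-first, used
    as a stack it gives depth-first. This is only a sketch;
    scalars_in_order and its $order argument are hypothetical and do not
    exist in the module.

```perl
use strict;
use warnings;

# Hypothetical: one worklist, used as a queue (breadth-first)
# or as a stack (depth-first).
sub scalars_in_order {
    my ( $data, $order ) = @_;
    my @todo = ($data);
    my @found;

    while (@todo) {
        my $item = shift @todo;
        if ( !ref $item ) { push @found, $item; next; }

        my @children = ref $item eq 'HASH'  ? map { $item->{$_} } sort keys %$item
                     : ref $item eq 'ARRAY' ? @$item
                     :                        ();

        if ( $order eq 'depth' ) { unshift @todo, @children }    # children go first
        else                     { push    @todo, @children }    # children go last
    }
    return @found;
}

my $data = { a => [ 1, 2 ], b => 3 };

my @bfs = scalars_in_order( $data, 'breadth' );    # 3, 1, 2
my @dfs = scalars_in_order( $data, 'depth' );      # 1, 2, 3
```

    Breadth-first returns 3 before 1 and 2 because it sits one level
    closer to the root.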

    More tests need to be written (isn't this always the case? ;--)

    It would be nice to have more generic methods to query the context
    (XPath-based?)


--
Michel Rodriguez
Perl & XML
http://www.xmltwig.com
