On 02/09/2013 06:02 PM, tcb wrote:
> Hi,
>
> I am trying to read the graphml output of graph-tool's graphml using
> networkx.
>
>     https://github.com/networkx/networkx/issues/843
>
> Unfortunately this does not work with any of the vector_* type property maps
> which graph-tool uses. Have you encountered this issue before?

Yes, this is expected, because the graphml specification only defines
the following types: boolean, int, long, float, double, and string.
If you want another type, you are out of luck.

> It seems the right thing to do might be to extend your graphml to hold the
> vector_* attributes as detailed:
>
>     http://graphml.graphdrawing.org/primer/graphml-primer.html#EXT
>
> Is there some reason why it was done the way it is? How do you manage
> read/writing graphml data to other tools?

Extending it this way would be the strictly "correct" approach. However,
it has two downsides. Firstly, it is much more cumbersome to
implement: the reader must be aware of the whole XML schema extension
mechanism, which it currently simply ignores. Secondly, it does not
really fix the problem of interoperability, it only punts it. Two
pieces of software would still need to know about the extension and
agree on it for it to work. In other words, you still would not be able
to make networkx read the vector types unless they modify their
reader. It seems to me that simply adding a nonstandard type is much
more straightforward, albeit "unclean" from the point of view of XML
validity.

Regarding reading data from other tools, there is no issue, since the
standard types are fully supported. If the user wants to feed graphml
data produced with graph-tool to other programs, then only the standard
types should be used.

> In the meantime, it might be possible to hack some read support for
> graph-tool's xml into networkx. To this end, could you please advise how to
> parse the 'key1' data (should be two floats)
>
> <node id="n1">
>   <data key="key0">6</data>
>   <data key="key1">0x1.5c71d0cb8d943p+3, 0x1.70db7f4083655p+3</data>
> </node>

The delimiter is a comma, and spaces should be ignored. The individual
values are encoded according to the %a format from C99. This is to
ensure exact binary representation. From the printf manpage:

   a, A   (C99; not in SUSv2) For a conversion, the double argument is
          converted to hexadecimal notation (using the letters abcdef)
          in the style [-]0xh.hhhhp±d; for A conversion the prefix 0X,
          the letters ABCDEF, and the exponent separator P is used.
          There is one hexadecimal digit before the decimal point, and
          the number of digits after it is equal to the precision. The
          default precision suffices for an exact representation of the
          value if an exact representation in base 2 exists and
          otherwise is sufficiently large to distinguish values of type
          double. The digit before the decimal point is unspecified for
          nonnormalized numbers, and nonzero but otherwise unspecified
          for normalized numbers.

I'm not sure there is any Python function which can read this
automatically. You can do it with ctypes:

    >>> from ctypes import byref, c_double, cdll
    >>> libc = cdll.LoadLibrary("libc.so.6")
    >>> d = c_double()
    >>> # "%la" stores a double; plain "%a" would store a 4-byte float
    >>> libc.sscanf(b"0x1.5c71d0cb8d943p+3", b"%la", byref(d))
    1
    >>> print("%.4f" % d.value)
    10.8889

But this would not be the most portable approach... Otherwise you can
write a simple parser based on the format description above.
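
For what it's worth, Python's built-in float.fromhex() accepts this
notation, so a parser for the vector values may not need ctypes at
all. The following is only a minimal sketch: the function name
parse_hex_vector is made up for illustration, treating an empty string
as an empty vector is my assumption, and the XML fragment is the one
from your message without any namespace handling. A real reader would
also have to look at the <key> declarations to know which keys hold
vector types.

```python
import xml.etree.ElementTree as ET


def parse_hex_vector(text):
    """Parse a comma-separated list of C99 %a hex floats into Python floats."""
    # Split on the comma delimiter and ignore the surrounding whitespace.
    # Empty pieces are skipped, so "" gives an empty list (an assumption).
    parts = [p.strip() for p in text.split(",")]
    return [float.fromhex(p) for p in parts if p]


# The node from the message above, used here as a stand-alone fragment.
fragment = """
<node id="n1">
  <data key="key0">6</data>
  <data key="key1">0x1.5c71d0cb8d943p+3, 0x1.70db7f4083655p+3</data>
</node>
"""

node = ET.fromstring(fragment)
values = []
for data in node.findall("data"):
    # "key1" is hard-coded here; a real reader would consult the <key> table.
    if data.get("key") == "key1":
        values = parse_hex_vector(data.text)

print(values)

# Converting back with float.hex() and parsing again gives identical values.
roundtrip = [float.fromhex(v.hex()) for v in values]
print(roundtrip == values)
```

Since float.hex() writes the same kind of notation, the same approach
could be used to write values back out without losing precision.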

Please keep me informed on any progress on this. Interoperability with
other programs is important, so if there is anything I can do to help,
I'd be glad to do it. If the networkx people would like to consider a
common approach, I'm open for discussion.

Cheers,
Tiago

-- 
Tiago de Paula Peixoto <[email protected]>
