Bugs item #22531, was opened at 2008-10-23 15:06
You can respond by visiting:
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=22531&group_id=494

Category: None
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Han Holl (hanholl)
Assigned to: Nobody (None)
Summary: XML::Reader does not work with sockets

Initial Comment:
I tried with the two following programs:

A server:

#!/usr/bin/ruby -w
require 'rubygems'
require 'xml'
require 'socket'
server = TCPServer.new(22222)
while session = server.accept
  reader = XML::Reader.io(session)
  loop do
    rsl = reader.read
    puts rsl
    break if rsl != 1
    puts reader.expand
  end
  session.puts 'ok'
  sleep 1
  session.close
end
# end-of-server

and a client:

#!/usr/bin/ruby
require 'socket'
t = TCPSocket.new('localhost', 22222)
t.puts '<doc><a>k</a></doc>'
puts t.gets
# end-of-client

Output from server:

Entity: line 1: parser error : Extra content at the end of the document
^
-1

I tried different platforms (RH9, CentOS 5.1 and Fedora8).
libxml-ruby-0.8.3
Various libxml2 versions

----------------------------------------------------------------------

>Comment By: Han Holl (hanholl)
Date: 2008-11-24 23:02

Message:
Charlie,

Thanks a lot for your efforts.
I'm afraid that, as it is, XML::Reader cannot be used to read XML nodes
from a stream.
In my original attempt I used TCPSocket, which is derived from IO and
therefore implements def read(len).
I have created a wrapper around this to make clearer what happens, but
the result remains the same:
the library first wants 4 bytes and then 4096, and waits until it gets
them (or EOF).
I suppose in the wrapper I could read 1 byte at a time and try to
determine myself when I have a complete node, but the result would be
almost as cumbersome as my current SAX based implementation. Well, maybe
slightly less cumbersome.

PS I tried it, and even then it asks for 4096 bytes more. Pity.

Thanks for your efforts.

Han

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-24 20:10

Message:
Hi Han,

Look at ruby_xml_input.c.
The relevant code is this:

/* This method is called by libxml when it wants to read
   more data from a stream. We go with the duck typing
   solution to support StringIO objects. */
int rxml_read_callback(void *context, char *buffer, int len)
{
  VALUE io = (VALUE)context;
  VALUE string = rb_funcall(io, READ_METHOD, 1, INT2NUM(len));
  int size;

  if(string == Qnil)
    return 0;

  size = RSTRING_LEN(string);
  memcpy(buffer, StringValuePtr(string), size);

  return size;
}

What this means:

1. Pass an io object to libxml (reader = Reader.io(my_io))

2. When it needs more data, libxml will call:

   my_io.read(<length>)

3. my_io should implement

   def read(length)
   end

Make more sense? I agree with you - I'd hope libxml only calls read when
it needs more data (presumably after you've called reader.read).

----------------------------------------------------------------------

Comment By: Han Holl (hanholl)
Date: 2008-11-24 11:32

Message:
I don't quite understand. I hoped, reading the libxml2 docs, that I could
read a node from a stream. IOW that the reader would return as soon as it
saw that it had received a complete node.
I don't know the length of the next node; the producer knows.

Isn't this the relevant libxml2 function?

int xmlTextReaderRead (xmlTextReaderPtr reader)

Moves the position of the current instance to the next node in the
stream, exposing its properties.

reader:  the xmlTextReaderPtr used
Returns: 1 if the node was read successfully, 0 if there is no more
         nodes to read, or -1 in case of error

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-24 02:39

Message:
Yes, I would assume this would block as you read more data from the
socket. But I would hope you could incrementally process data.

One thing I missed was that the read method should take a length. Libxml
will tell you how much data it wants; you can't return more. That makes
the interface a bit harder to deal with, unless you can read x bytes off
the socket.
So does setting the length help?

----------------------------------------------------------------------

Comment By: Han Holl (hanholl)
Date: 2008-11-22 14:52

Message:
Hello Charlie,

Thanks a lot for this effort.
Unfortunately it doesn't yet work for me. I read the libxml2 C docs and
had the impression that XML::Reader.read should return as soon as it had
read a complete XML node.
If you have a look at the example I opened this discussion with, here's
a strace of the server.

read(4, "<doc><a>k</a></doc>", 4096)    = 19
sigprocmask(SIG_BLOCK, NULL, [])        = 0
sigprocmask(SIG_BLOCK, NULL, [])        = 0
read(4, "\n", 4096)                     = 1
read(4,

And there it hangs, hungry for more.
Maybe I've got the documentation wrong?

Cheers,

Han Holl

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-22 10:46

Message:
Hi Han,

Ok, upgrade to the latest libxml. Then use:

reader = Reader.io(io_object)

That io object has to respond to read. So:

class SocketIO
  def initialize(server)
    @server = server
  end

  def read
    @server.accept
  end
end

And that should do the trick.

----------------------------------------------------------------------

Comment By: Han Holl (hanholl)
Date: 2008-11-17 13:16

Message:
I wouldn't know where to begin. Before I tried, I had a look at the code,
ruby_xml_reader_new_io(int argc, VALUE *argv, VALUE self), and had the
strong impression that it _was_ implemented. I have no idea what is
missing.
C and ruby extensions are by no means my strong suit, I prefer ruby <g>.
My C is read-only, with the exception of the occasional small patch.

----------------------------------------------------------------------

Comment By: Charlie Savage (cfis)
Date: 2008-11-16 00:45

Message:
Yes, that is not going to work. Libxml does provide its own socket
implementation, but that is not exposed via the ruby bindings.

Want to put together a patch?
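
----------------------------------------------------------------------

A note on the blocking seen in the strace in this thread: IO#read(length)
on a socket waits until it has length bytes or reaches EOF, and that is
the call rxml_read_callback makes. Below is a minimal sketch of a wrapper
that instead answers with whatever has already arrived, using
IO#readpartial. PartialReadIO is a hypothetical name, not part of
libxml-ruby, and the demonstration uses a local socket pair rather than
XML::Reader. It only removes the wait inside IO#read; as Han's postscript
suggests, libxml2 may still request more input before it reports a node.

```ruby
require 'socket'

# Hypothetical wrapper (not part of libxml-ruby). It responds to
# read(length) but returns the bytes that are already available
# instead of waiting until the full length has arrived.
class PartialReadIO
  def initialize(io)
    @io = io
  end

  # Returns at most +length+ bytes, or nil at end of stream.
  # nil is what rxml_read_callback treats as "no more data".
  def read(length)
    @io.readpartial(length)
  rescue EOFError
    nil
  end
end

# Demonstration on a local socket pair; no XML parsing is involved.
writer, receiver = UNIXSocket.pair
writer.write('<doc><a>k</a></doc>')

wrapped = PartialReadIO.new(receiver)
chunk = wrapped.read(4096)   # returns once data is available, not after 4096 bytes
writer.close
at_eof = wrapped.read(4096)  # nil once the peer has closed

puts chunk
puts at_eof.inspect
```

In the server from the initial comment this would presumably be passed as
XML::Reader.io(PartialReadIO.new(session)); that combination is untested
here.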

----------------------------------------------------------------------

_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel