Hey Matt,

Okay, there is a way to do this.  libxml defines a global locator object:

/**
 * xmlSAXLocator:
 *
 * A SAX Locator.
 */
struct _xmlSAXLocator {
    const xmlChar *(*getPublicId)(void *ctx);
    const xmlChar *(*getSystemId)(void *ctx);
    int (*getLineNumber)(void *ctx);
    int (*getColumnNumber)(void *ctx);
};

There is one per thread.

The one problem: the bindings don't currently expose this object. How are your C skills? Want to put together a patch?

Charlie

Charlie Savage wrote:
Hi Charlie,

That's exactly what I mean. Is that possible?

What I'd like to do is parse an XML file and store each node in a flat search index. I want to store the string start/end position with the stored item so that when the search returns the item, I can load only that fragment of XML. Some of the XML files I'm dealing with are around 13 MB.

Does that make sense?

Yup. And you can sort of do it. XML::Reader surfaces line_no and column_number methods, but those pertain to where the parser is in the file, not where the elements are, so that won't work. However, libxml does define XML::Node#line_num, so you can figure out on what line an element starts.

To do that:

  def test_node
    # Ask libxml to record line numbers on the nodes it builds
    XML.default_line_numbers = true
    reader = XML::Reader.file(XML_FILE)

    while reader.read
      puts reader.name
      # Line on which the current node starts
      puts reader.node.line_num
    end
  end

The downsides:

* Reader#node is a new method just added, so you'll need to pull a build from trunk

* You don't get the column number - I don't see an API for that unfortunately (if you see one in libxml let me know)

* Node#line_num still returns the node's starting line number when the node ends (not the line where the node </ends>)

So this will give you a rough idea of where things are, but not an exact idea. Is that good enough?
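
If line-level precision is enough, one way to get from line numbers to "load only that fragment" is to build a table of byte offsets per line and slice the file with it. This is only a rough sketch in plain Ruby: line_offsets and fragment are hypothetical helpers, not part of libxml-ruby, and since line_num only gives the starting line, the ending line has to come from somewhere else (for example, the start line of the next node).

```ruby
# Hypothetical helpers, not part of libxml-ruby.
# Builds a table where offsets[n - 1] is the byte offset at which
# 1-based line n starts; the final entry is the total byte size.
def line_offsets(text)
  offsets = [0]                 # line 1 starts at byte 0
  pos = 0
  text.each_line do |line|
    pos += line.bytesize
    offsets << pos              # start of the following line
  end
  offsets
end

# Returns the raw bytes covering first_line..last_line (inclusive, 1-based).
def fragment(text, offsets, first_line, last_line)
  start = offsets[first_line - 1]
  stop  = offsets[last_line]
  text.byteslice(start, stop - start)
end

xml = "<root>\n  <item>\n    one\n  </item>\n  <item>two</item>\n</root>\n"
offsets = line_offsets(xml)

# Lines 2..4 hold the first <item> element in this sample.
puts fragment(xml, offsets, 2, 4)
```

For a 13 MB file you would probably store the two byte offsets in the index and later use File#seek and File#read instead of holding the whole text in memory.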

Charlie


------------------------------------------------------------------------

_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel

--
Charlie Savage
http://cfis.savagexi.com
