OK, I see the problem now.

There's a text representation:

http://code.google.com/apis/protocolbuffers/docs/overview.html

# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
 name: "John Doe"
 email: "[email protected]"
}


But this is not what actually goes over the wire. And what goes over the wire does not include the names. To quote the above overview:

XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

So I agree with John. The native PB format doesn't contain the names we need to query them. The text format does, but this is not the native format.
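To make that concrete, here is a minimal sketch of what the wire format actually carries. The field numbers (name = 1, email = 2) are assumptions for illustration; only numbers, never names, go over the wire:

```python
def encode_string_field(field_number, value):
    """Encode a length-delimited (wire type 2) string field.
    Only handles values shorter than 128 bytes, so the length
    fits in a single varint byte."""
    data = value.encode("utf-8")
    assert len(data) < 128
    key = (field_number << 3) | 2  # field number in the upper bits, wire type 2 in the low three
    return bytes([key, len(data)]) + data

# Assumed field numbers: name = 1, email = 2.
wire = encode_string_field(1, "John Doe") + encode_string_field(2, "jdoe@example.com")

# The field names never appear on the wire -- only their numbers do.
assert b"John Doe" in wire
assert b"name" not in wire and b"email" not in wire
```

So without the .proto file there is nothing in `wire` to match a name like "email" against.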

Jonathan

John O'Hara wrote:
"The exchange then opens the protocol spec file and determines the key
number of the field we want."
So the broker needs to have access to the application message definitions at
runtime?

That was at the heart of the question I asked...

Cheers
John

2009/2/22 Joshua Kramer <[email protected]>

I agree that it's preferable to have one exchange handle multiple message
types, as it reduces code maintenance.  Here are a few relevant questions
that I thought of.

Does XQilla require the entire XML document, or can its document
projection feature tell you how much of the PB message you have to XML-ify
before it can get a valid match?

How much value would a speed increase provide on structured data?  Does
this discussion have practical value, or is it just a science experiment?

If I were to implement a PB exchange, I would do so in the following
manner, after having read the documentation on message formats (
http://code.google.com/apis/protocolbuffers/docs/encoding.html).  I don't
think I would implement a new query language; that doesn't make sense for
the reasons you outlined.  I don't think it would be difficult to implement
an XPath mechanism that supported a subset of XPath.

A. On exchange subscription, the client gives the exchange an XPath query.
 The exchange then opens the protocol spec file and determines the key
number of the field we want.  If the requested element is at the top level,
the rule is simple: field n equals value y.  If the requested element is one
or more levels deep, we build a rule chain: first, get field n; then, get
field o; then, get field p.  (Field p is in the object that is field o,
which in turn is in the object that is field n).
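Step A can be sketched as follows. The descriptor here is a hypothetical stand-in for what the exchange would read out of the .proto spec file, and the field numbers are assumptions:

```python
# Hypothetical descriptor: field name -> (field number, sub-message descriptor).
# In a real implementation this would come from parsing the .proto file.
DESCRIPTOR = {
    "person": (1, {
        "name": (1, None),
        "email": (2, None),
    }),
}

def xpath_to_chain(path, descriptor):
    """Turn a simple path like /person/email into a chain of field numbers."""
    chain = []
    for step in path.strip("/").split("/"):
        number, sub = descriptor[step]
        chain.append(number)
        descriptor = sub or {}
    return chain

# /person/email -> field 1 (person), then field 2 (email) inside it.
assert xpath_to_chain("/person/email", DESCRIPTOR) == [1, 2]
```

The chain [1, 2] is exactly the "first get field n, then get field o" rule described above.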

B. When we get a message:
     1. Read the next key as a varint (the MSB of each byte is the
varint continuation bit).  The low three bits of the key give the wire
type; right-shifting by three gives the field number.
     2. Does the field number equal the one we're referencing?
           i. If yes:
                 a. Get the data and move forward the number of bytes
specified by the wire type and length.
                 b. Test the data to see if it matches the right of the =
in the XPath.  If yes, route the message as specified by the subscription
and return.
                 c. If the XPath rule is a chain, and this field is a
nested message, then repeat from step 1 on that sub-message using the
next link in the rule chain.
                 d. Goto 1.
           ii. If no:
                 a. Skip the number of bytes specified by the wire type
and length.
                 b. Goto 1.
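The scan in step B can be sketched like this, using the key layout from the encoding doc (varint key, wire type in the low three bits, field number in the rest). The message bytes and field numbers are assumptions for illustration, and group wire types are not handled:

```python
def read_varint(buf, pos):
    """Decode a base-128 varint; the MSB of each byte is the continuation bit."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

def matches(buf, chain, want):
    """Scan serialized bytes; True if the field addressed by `chain`
    (a list of field numbers, outermost first) equals `want` (bytes)."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wtype = key >> 3, key & 0x07
        payload = None
        if wtype == 0:                        # varint value
            _, pos = read_varint(buf, pos)
        elif wtype == 1:                      # 64-bit value
            pos += 8
        elif wtype == 2:                      # length-delimited: string or sub-message
            length, pos = read_varint(buf, pos)
            payload = buf[pos:pos + length]
            pos += length
        elif wtype == 5:                      # 32-bit value
            pos += 4
        else:
            return False                      # groups etc. not handled in this sketch
        if field == chain[0] and payload is not None:
            if len(chain) == 1 and payload == want:
                return True
            if len(chain) > 1 and matches(payload, chain[1:], want):
                return True
    return False

# Hand-built message: person { name: "John Doe" email: "jdoe@example.com" }
# assuming person = field 1, name = 1, email = 2.
inner = b"\x0a\x08John Doe\x12\x10jdoe@example.com"
outer = bytes([(1 << 3) | 2, len(inner)]) + inner
assert matches(outer, [1, 2], b"jdoe@example.com")
assert not matches(outer, [1, 2], b"other@example.com")
```

Nothing here ever touches XML or the field names, which is the whole point of routing on the native format.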

It seems that this would use fewer CPU cycles than XML-ifying the entire
message, and that doesn't even include the cost of running the resulting
data through XQilla.

Having said that, it may be necessary to use the full set of XPath
functionality, in which case we'd have to XML-ify the message.

Thoughts?

Cheers,
-Josh


Jonathan Robie wrote:

Joshua Kramer wrote:

Jonathan Robie wrote:

There is a reflection API for protocol buffers that would allow you to
easily create an XML representation:

Good thoughts, Jonathan.  I hadn't considered doing it that way before.
 Here's a question, though... how many CPU cycles would your method take,
compared to modifying XQilla (or creating our own query mechanism) to
directly route the messages as they exist in the wire format they enter the
broker?  One of the primary benefits of using PB with QPid is the speed with
which structured data may be processed.

I rather suspect that the difference in processing time would be much
smaller than the overhead of reading the message, but this is something best
found by trying it and measuring it, then optimizing. If we can get good
enough performance, I see a real advantage to using one exchange type for
XML, Protocol Buffers, and JSON, and using the same language to specify
criteria for all three.

If we create our own query mechanism, we wind up creating our own query
language. I've done this a few times in different settings, and it takes
work to get it right. And it would be a language used by a very small
community. If we use a standard structured query language, XQuery seems to
be the main contender.

XQilla can query many kinds of input - a Xerces DOM tree, an istream
(which requires serialized XML), a SAX stream, among others. It probably
optimizes best for an istream, because it does "document projection", which
means that it does not parse the entire document if the query clearly
requires only part of the document.  This is of most benefit when the
message content is large.

Jonathan

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]

