Re: Data flow on a wire
Jim Marino wrote:
> A couple of us have started to discuss this as well in relation to Celtix... My main concerns, on which there appears to be agreement, are:
>
> 1. We are not instituting a canonical form model similar to JBI in the runtime. I think Jeremy stated this is not the case

Having trouble parsing :-) I do not think we should have a canonical form model similar to JBI. I do think we should identify a set of common forms to reduce the number of format conversions we need to implement.

> 2. Local invokes - i.e. where semantics are pass-by-reference - honor that and do not have *parameters* mediated. This appears non-controversial.

Let's be controversial then :-) The contract is pass-by-reference, not pass-pointer-by-value - if the runtime chooses to mediate the pointer format then it is free to do so. This may enable us to support local calls across languages - e.g. converting a Java object reference to a pointer in C++. It is the runtime's responsibility to make sure the memory model is not violated.

> 3. Mediation is done using handlers and interceptors. I think we are also in agreement here.

+1

> I still have questions around how the container and bindings declare support for certain data binding formats. We need to come up with a design here. For example, would the following be a valid approach in your opinion (I haven't thought too much about it so it is kind of vague)?
>
> 1. Implementation types declare whether they support pass-by-ref. Since this is a Java runtime, pass-by-ref means object references in the VM.

I'd modify that slightly to say the service contract determines if parameters are passed by reference. Any implementation offering that service contract must be able to support it. The type of runtime must not impact the service contract, and the runtime must fail an attempt to deploy a contract that cannot be fulfilled. The type of message used by the runtime is currently unspecified - in our Java runtime we are defining the content format to be a reference to any Java Object.

> 2. Bindings register themselves and can be queried for what formats they support. They may delegate to some data binding service.

I think we need to distinguish between transport binding and data binding. Transports should delegate all serialization to data bindings.

> 3. Tuscany may want to declare some common formats such as Java--SDO, SDO--, StAX stream--Java, Java--StAX, SDO--StAX, JAXB--Java etc. and have interceptors or handlers perform those transformations. These handlers could be registered with a wire builder and inserted into an invocation chain

Declare to me implies something special. I think these are transforms that we include in our baseline profile - they are no different to ones provided by users. Yes, they register their availability and the wire builder selects them as appropriate when constructing the wire.

> 4. Implementation types and bindings declare what pass-by-value formats they support in order of preference.

This may be what you meant but I think this depends on the wiring requirements. The implementation type shouldn't specify a general preference; instead it should say which ones are supported for a particular parameter and provide a relative cost for each. The wire builder calculates cost for source and target and selects the most efficient wire.

> Dan, I know you have a bunch of thoughts on this, so it would be interesting to discuss them in this thread.

Please, this is key so the more perspectives the better.

-- Jeremy
Re: Data flow on a wire
On Mar 22, 2006, at 9:32 AM, Jeremy Boynes wrote:
> Jim Marino wrote:
> > A couple of us have started to discuss this as well in relation to Celtix... My main concerns, on which there appears to be agreement, are:
> >
> > 1. We are not instituting a canonical form model similar to JBI in the runtime. I think Jeremy stated this is not the case
>
> Having trouble parsing :-) I do not think we should have a canonical form model similar to JBI. I do think we should identify a set of common forms to reduce the number of format conversions we need to implement.

Yes. At first I read this as I *do* think.. which scared me :-) I need to check my vision again. We're in agreement here.

> > 2. Local invokes - i.e. where semantics are pass-by-reference - honor that and do not have *parameters* mediated. This appears non-controversial.
>
> Let's be controversial then :-) The contract is pass-by-reference, not pass-pointer-by-value - if the runtime chooses to mediate the pointer format then it is free to do so.

Maybe I should have been clearer - by "not mediate the parameters" I mean that the runtime cannot violate pass-by-reference. For Java--Java in particular, which I would guess will be at least 90% of local calls (which we should optimize for), the strategy for doing this should be passing references directly. For other situations I don't have an opinion other than it should be done in a handler or interceptor or extension and not in the core runtime.

> This may enable us to support local calls across languages - e.g. converting a Java object reference to a pointer in C++. It is the runtime's responsibility to make sure the memory model is not violated.

This to me is a nice-to-have sometime in the future but not something we should optimize for right now.

> > 3. Mediation is done using handlers and interceptors. I think we are also in agreement here.
>
> +1
>
> > I still have questions around how the container and bindings declare support for certain data binding formats. We need to come up with a design here. For example, would the following be a valid approach in your opinion (I haven't thought too much about it so it is kind of vague)?
> >
> > 1. Implementation types declare whether they support pass-by-ref. Since this is a Java runtime, pass-by-ref means object references in the VM.
>
> I'd modify that slightly to say the service contract determines if parameters are passed by reference. Any implementation offering that service contract must be able to support it.

I think we need to follow the spec here.

> The type of runtime must not impact the service contract, and the runtime must fail an attempt to deploy a contract that cannot be fulfilled. The type of message used by the runtime is currently unspecified - in our Java runtime we are defining the content format to be a reference to any Java Object.

Yes. What would you propose here? Also, could you provide a description of what happens when an invoke is done across two local Java services?

> > 2. Bindings register themselves and can be queried for what formats they support. They may delegate to some data binding service.
>
> I think we need to distinguish between transport binding and data binding. Transports should delegate all serialization to data bindings.

Yes, I forgot to preface the first "bindings" with "transport binding".

> > 3. Tuscany may want to declare some common formats such as Java--SDO, SDO--, StAX stream--Java, Java--StAX, SDO--StAX, JAXB--Java etc. and have interceptors or handlers perform those transformations. These handlers could be registered with a wire builder and inserted into an invocation chain
>
> Declare to me implies something special. I think these are transforms that we include in our baseline profile - they are no different to ones provided by users. Yes, they register their availability and the wire builder selects them as appropriate when constructing the wire.

Declare = register, nothing more. They are just extensions included in the baseline. We do need some way of naming them though.

> > 4. Implementation types and bindings declare what pass-by-value formats they support in order of preference.
>
> This may be what you meant but I think this depends on the wiring requirements. The implementation type shouldn't specify a general preference; instead it should say which ones are supported for a particular parameter and provide a relative cost for each. The wire builder calculates cost for source and target and selects the most efficient wire.

Relative cost is a preference, isn't it? The implementation type bases this preference on a selfish calculation since it does not know what the source type is. What's the difference?

> > Dan, I know you have a bunch of thoughts on this, so it would be interesting to discuss them in this thread.
>
> Please, this is key so the more perspectives the better.
>
> -- Jeremy
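Editorial sketch of point 4 above, to make the "general preference" versus "per-parameter relative cost" distinction concrete. It assumes each end of the wire publishes a map from format name to relative cost for one parameter, and that the wire builder sums the two costs for every format both ends support. The class name WireFormatSelector, the method select, and the cost numbers are all hypothetical; nothing here is taken from the Tuscany code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these names come from the Tuscany code base.
final class WireFormatSelector {

    /**
     * Picks the format supported by both ends with the lowest combined cost.
     * Each map goes from format name to the relative cost that end declares
     * for one particular parameter. On a tie, the first format seen wins.
     */
    static Optional<String> select(Map<String, Integer> sourceCosts,
                                   Map<String, Integer> targetCosts) {
        String best = null;
        int bestCost = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : sourceCosts.entrySet()) {
            Integer targetCost = targetCosts.get(entry.getKey());
            if (targetCost == null) {
                continue; // the target cannot accept this format at all
            }
            int total = entry.getValue() + targetCost;
            if (total < bestCost) {
                bestCost = total;
                best = entry.getKey();
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // The source on its own would rank SDO first (cost 1 versus 3).
        Map<String, Integer> source = new LinkedHashMap<>();
        source.put("SDO", 1);
        source.put("StAX", 3);
        // The target finds SDO expensive, so the combined cost favours StAX.
        Map<String, Integer> target = new LinkedHashMap<>();
        target.put("SDO", 5);
        target.put("StAX", 2);
        target.put("JAXB", 1);
        System.out.println(select(source, target).orElse("no common format"));
    }
}
```

With these made-up numbers the source's own first choice (SDO, total 6) loses to StAX (total 5), which is one way to read the difference between a one-sided preference and a cost the wire builder can combine across both ends.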
Re: Data flow on a wire
Re-posted since the previous one is missing the diagram.

Hi,

I think I have an interesting picture for this topic.

1) The data transformation capabilities for various databindings can be nicely modeled as a weighted, directed graph with the following rules (illustrated in the attached diagram).
   a. Each databinding is mapped to a vertex.
   b. If databinding A can be transformed to databinding B, then an edge is added from vertex A to vertex B.
   c. The weight of the edge is the cost of the transformation from the source to the sink.

2) If the data mediator/interceptor on the wire finds that the data needs to be transformed from databinding A to databinding E, we can apply Dijkstra's shortest path algorithm to the graph and find the best-performing (lowest-cost) path. It can be A--E or A--C--E depending on the weights. If no path can be found, then the data cannot be mediated.

Any thoughts?

Thanks,
Raymond

- Original Message -
From: "Jeremy Boynes" [EMAIL PROTECTED]
To: tuscany-dev@ws.apache.org
Sent: Wednesday, March 15, 2006 3:37 PM
Subject: Data flow on a wire

A couple of us had an offline chat about what format of data would be exchanged on the wire during an interaction between a client and a provider. The spur for this was the JSON binding Ant was working on, which has no obvious affinity to XML. Another issue related to this has been about supporting streaming types for interactions where data flows through a system rather than terminating there. This is related to Axiom and its use for precisely this purpose in Axis2. I wanted to capture thoughts whilst still current and open the discussion.

As I see it there is no single answer to this, well apart from "it depends." :-)

I think it is necessary for us to support the flow of any data type that is supported by both the client and the provider. With the ability to attach data transformation mediations to wires, this actually becomes a requirement to support any data type that can be mapped from client to provider and back again.

In any interchange there are just two things that are defined: the format of data that will be supplied by the client and the format of data that will be consumed by (delivered to) the provider. Neither client nor provider needs to be aware of the format of data on the other end or of what gyrations the fabric went through in order to make the connection.

As part of making the connection, it is the fabric's job to make the connection as efficient as possible, factoring in the semantic meaning of the data, the policies that need to be applied, and what the different containers support.

All this flexibility just about requires we use the most generic type possible to hold the data being exchanged: a java.lang.Object or a (void*) depending on the runtime. The actual instance used would depend on the actual wire, some examples from Java land being:

* POJO (for local pass by reference)
* SDO (when supplied by the application)
* Axiom OMElement (for the Axis2 binding)
* StAX XMLStreamReader (for streamed access to an XML infoset)
* ObjectInputStream (for cross-classloader serialization)

and so forth.

Each container and transport binding just needs to declare which data formats it can support for each endpoint it manages. The wiring framework needs to know about these formats and about what transformations can be engaged in the wire pipeline.

For example, the Axis2 transport may declare that it can support Axiom and StAX for a certain port and the Java container may declare that it can only handle SDOs for an implementation that expects to be passed a DataObject. The wiring framework can resolve this by adding a StAX-SDO transform into the pipeline.

The limitation here is whether a transformation can be constructed to match the formats on either end. If one exists then great, but as the number of formats increases, developing n-squared transforms becomes impractical. A better approach would be to pick the most common formats and require bindings and containers to support those at a minimum, with other point-to-point transforms being added as warranted.

Given the flow issue described above and the XML nature of many of our interactions, I would suggest that a StAX XMLStreamReader may be the most appropriate common format (at least for now). It's native to Axis2 and Raymond has posted patches already to support it in SDO.

Alternatively, we don't need all of StAX for this to work so it may be simpler to provide a basic API that is essentially the same as an XMLStreamReader but without all the other stuff.

Thanks for reading this far. The idea was to capture thinking and all input is welcome.

-- Jeremy
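Editorial sketch of the graph proposal at the top of this message: databindings as vertices, available transformations as weighted directed edges, and Dijkstra's algorithm used to find the cheapest chain of transformations. The class TransformGraph, its methods, and the example costs are hypothetical and are not taken from the Tuscany code base or from the wiki diagram; the sketch only assumes that costs are non-negative integers, which Dijkstra's algorithm requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of the graph idea; class and method names are invented.
final class TransformGraph {

    // Adjacency map: source databinding -> (target databinding -> cost).
    private final Map<String, Map<String, Integer>> edges = new HashMap<>();

    // Queue entry: a vertex together with the distance known when it was queued.
    private static final class Entry {
        final String vertex;
        final int distance;

        Entry(String vertex, int distance) {
            this.vertex = vertex;
            this.distance = distance;
        }
    }

    /** Registers a transformation between two databindings with a non-negative cost. */
    void addTransform(String from, String to, int cost) {
        if (cost < 0) {
            throw new IllegalArgumentException("cost must be non-negative");
        }
        edges.computeIfAbsent(from, k -> new HashMap<>()).put(to, cost);
    }

    /** Returns the cheapest chain of databindings from source to sink, or an empty list if none exists. */
    List<String> cheapestPath(String source, String sink) {
        Map<String, Integer> dist = new HashMap<>();
        Map<String, String> previous = new HashMap<>();
        PriorityQueue<Entry> queue =
                new PriorityQueue<>(Comparator.comparingInt((Entry e) -> e.distance));
        dist.put(source, 0);
        queue.add(new Entry(source, 0));
        while (!queue.isEmpty()) {
            Entry current = queue.poll();
            if (current.distance > dist.get(current.vertex)) {
                continue; // stale entry: a shorter route to this vertex was found later
            }
            if (current.vertex.equals(sink)) {
                break; // the sink's distance is final once it leaves the queue
            }
            Map<String, Integer> outgoing =
                    edges.getOrDefault(current.vertex, Collections.<String, Integer>emptyMap());
            for (Map.Entry<String, Integer> edge : outgoing.entrySet()) {
                int candidate = current.distance + edge.getValue();
                Integer known = dist.get(edge.getKey());
                if (known == null || candidate < known) {
                    dist.put(edge.getKey(), candidate);
                    previous.put(edge.getKey(), current.vertex);
                    queue.add(new Entry(edge.getKey(), candidate));
                }
            }
        }
        if (!dist.containsKey(sink)) {
            return Collections.emptyList(); // no path: the data cannot be mediated
        }
        List<String> path = new ArrayList<>();
        for (String v = sink; v != null; v = previous.get(v)) {
            path.add(v);
        }
        Collections.reverse(path);
        return path;
    }
}
```

For instance, with hypothetical edges A to E at cost 10, A to C at cost 3 and C to E at cost 4, cheapestPath("A", "E") yields A, C, E; lowering the direct edge to cost 5 makes the direct path the answer. An empty list corresponds to the "cannot be mediated" case.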
Re: Data flow on a wire
Sorry, the attachment cannot go through. I added it to the wiki page @ http://wiki.apache.org/ws/Tuscany/DataMediation.

Thanks,
Raymond
Re: Data flow on a wire
On Mar 22, 2006, at 10:10 AM, Jim Marino wrote:
> On Mar 22, 2006, at 9:32 AM, Jeremy Boynes wrote:
> > Jim Marino wrote:
> > > 4. Implementation types and bindings declare what pass-by-value formats they support in order of preference.
> >
> > This may be what you meant but I think this depends on the wiring requirements. The implementation type shouldn't specify a general preference; instead it should say which ones are supported for a particular parameter and provide a relative cost for each. The wire builder calculates cost for source and target and selects the most efficient wire.
>
> Relative cost is a preference, isn't it? The implementation type bases this preference on a selfish calculation since it does not know what the source type is. What's the difference?

Jeez, I'm having trouble reading today - I also didn't see per parameter vs. general. Sorry and
Re: Data flow on a wire
On Mar 15, 2006, at 3:37 PM, Jeremy Boynes wrote:
> A couple of us had an offline chat about what format of data would be exchanged on the wire during an interaction between a client and a provider. The spur for this was the JSON binding Ant was working on, which has no obvious affinity to XML. Another issue related to this has been about supporting streaming types for interactions where data flows through a system rather than terminating there. This is related to Axiom and its use for precisely this purpose in Axis2. I wanted to capture thoughts whilst still current and open the discussion.
>
> As I see it there is no single answer to this, well apart from "it depends." :-)
>
> I think it is necessary for us to support the flow of any data type that is supported by both the client and the provider. With the ability to attach data transformation mediations to wires, this actually becomes a requirement to support any data type that can be mapped from client to provider and back again.
>
> In any interchange there are just two things that are defined: the format of data that will be supplied by the client and the format of data that will be consumed by (delivered to) the provider. Neither client nor provider needs to be aware of the format of data on the other end or of what gyrations the fabric went through in order to make the connection.
>
> As part of making the connection, it is the fabric's job to make the connection as efficient as possible, factoring in the semantic meaning of the data, the policies that need to be applied, and what the different containers support.
>
> All this flexibility just about requires we use the most generic type possible to hold the data being exchanged: a java.lang.Object or a (void*) depending on the runtime. The actual instance used would depend on the actual wire, some examples from Java land being:
>
> * POJO (for local pass by reference)
> * SDO (when supplied by the application)
> * Axiom OMElement (for the Axis2 binding)
> * StAX XMLStreamReader (for streamed access to an XML infoset)
> * ObjectInputStream (for cross-classloader serialization)
>
> and so forth.
>
> Each container and transport binding just needs to declare which data formats it can support for each endpoint it manages. The wiring framework needs to know about these formats and about what transformations can be engaged in the wire pipeline.
>
> For example, the Axis2 transport may declare that it can support Axiom and StAX for a certain port and the Java container may declare that it can only handle SDOs for an implementation that expects to be passed a DataObject. The wiring framework can resolve this by adding a StAX-SDO transform into the pipeline.
>
> The limitation here is whether a transformation can be constructed to match the formats on either end. If one exists then great, but as the number of formats increases, developing n-squared transforms becomes impractical. A better approach would be to pick the most common formats and require bindings and containers to support those at a minimum, with other point-to-point transforms being added as warranted.

This seems kind of like JBI. A question here is whether a normalized form is really practical and whether it is the easiest thing to do. Also, is mediation even the concern of the runtime? Should the runtime just make it possible to do mediation and delegate to a mediator interceptor/handler, or create an implementation type that is a mediation component? Also, what about local invoke? I assume a container would have to declare support of primitives and Object? I think it may just be easier to settle on Object as the common form.

> Given the flow issue described above and the XML nature of many of our interactions, I would suggest that a StAX XMLStreamReader may be the most appropriate common format (at least for now). It's native to Axis2 and Raymond has posted patches already to support it in SDO.

Again, what about local invocations or things that just require simple serialization over a socket?

> Alternatively, we don't need all of StAX for this to work so it may be simpler to provide a basic API that is essentially the same as an XMLStreamReader but without all the other stuff.
>
> Thanks for reading this far. The idea was to capture thinking and all input is welcome.
>
> -- Jeremy
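Editorial sketch tying together two threads of this message: endpoints declaring the formats they support, and Object as the type carried on the wire. It assumes a hypothetical registry in which transforms are plain functions over Object, keyed by source and target format name. When both ends share a format (the local invoke case) the resolved handler is the identity, so the same reference is passed through untouched; otherwise a registered transform is placed on the wire. WireBuilderSketch, register and resolve are invented names, and the transform in the usage note is a string stub, not a real StAX or SDO call.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; these names are not part of the Tuscany code base.
final class WireBuilderSketch {

    // Registered transforms keyed by "from->to"; payloads travel as plain Object.
    private final Map<String, Function<Object, Object>> transforms = new LinkedHashMap<>();

    /** Registers a transform that converts a payload from one named format to another. */
    void register(String from, String to, Function<Object, Object> transform) {
        transforms.put(from + "->" + to, transform);
    }

    /**
     * Resolves the handler to place on the wire between two endpoints.
     * A shared format yields the identity function (the reference is passed as is);
     * otherwise the first registered transform bridging the two ends is used.
     * An empty result means the wire cannot be built.
     */
    Optional<Function<Object, Object>> resolve(List<String> sourceFormats,
                                               List<String> targetFormats) {
        for (String format : sourceFormats) {
            if (targetFormats.contains(format)) {
                Function<Object, Object> passThrough = Function.identity();
                return Optional.of(passThrough);
            }
        }
        for (String from : sourceFormats) {
            for (String to : targetFormats) {
                Function<Object, Object> transform = transforms.get(from + "->" + to);
                if (transform != null) {
                    return Optional.of(transform);
                }
            }
        }
        return Optional.empty();
    }
}
```

As a usage illustration, registering a stub under ("StAX", "SDO") and resolving a source that declares Axiom and StAX against a target that declares only SDO returns that stub, mirroring the Axis2-to-Java-container example quoted above. Resolving POJO against POJO returns the identity, so a local call keeps pass-by-reference semantics without any mediation.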