[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303702#comment-14303702 ]
Marcelo Vanzin edited comment on SPARK-5388 at 2/3/15 7:48 PM: --------------------------------------------------------------- Hi [~pwendell],

Let me try to write point-by-point feedback for the current spec.

h4. Public protocol or not?

If this is not supposed to be public (i.e., we don't expect anyone to try to talk to the Spark master directly; it always happens through the Spark libraries), then the underlying protocol is less important, since we only care about different versions being compatible in some way. Assuming a non-public protocol, my question would be: why implement your own RPC framework? Why not reuse something that's already there? For example, Avro has a stable serialization infrastructure that defines semantics for versioned data, and works well on top of HTTP. It handles serialization and dispatching - which would remove a lot of code from the current patch - and probably has other features that the current, cluster-mode-only protocol doesn't need but other future uses might.

h4. Non-submission uses

Similarly, in the non-public protocol scenario, a proper REST-based API would look like overkill. But a proper REST infrastructure provides interesting room for growth of the master's public-facing API. For example, you could easily expose an endpoint for listing the applications currently tracked by the master, or an endpoint to kill an application. The former could also benefit the history server, which could expose the same API to list the applications it has found.

h4. Evolvability and Versioning

The current spec does not specify the behavior of either the cluster or the client with regard to different versions of the protocol. It has a table that basically says "future versions need to be able to submit standalone cluster applications to a 1.3 master", but it doesn't explain what that means or how that happens.
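To make the "room for growth" point concrete, here is a minimal sketch of how versioned, resource-style endpoints could be dispatched. The endpoint paths and handler names are hypothetical illustrations, not anything proposed in the spec:

```python
import re

# Hypothetical versioned endpoints (illustrative only, not from the spec):
#   GET  /v1/applications            -> list applications tracked by the master
#   POST /v1/applications/{id}/kill  -> kill an application
ROUTES = [
    ("GET", re.compile(r"^/v1/applications$"), "list_applications"),
    ("POST", re.compile(r"^/v1/applications/([^/]+)/kill$"), "kill_application"),
]

def dispatch(method, path):
    """Return (handler_name, path_params), or (None, None) if unmatched."""
    for m, pattern, handler in ROUTES:
        match = pattern.match(path)
        if m == method and match:
            return handler, match.groups()
    return None, None
```

The same route table could later serve the history server, which would register the read-only endpoints against its own set of applications.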
Does it mean that, after 1.3, you can't ever change any of the messages used to launch a standalone cluster app, nor add new messages? Or, if that's allowed, what happens on the server side when it sees a field it doesn't understand? Does it ignore it, which could potentially break the application being submitted? Does it throw an error, in which case the client should make sure to submit an older version of the data structures if that's compatible with the app being submitted? If the latter, how does the client know which version to use?

As an example of how you could do this "negotiation": the client checks what features the app being submitted needs, and chooses the oldest supported API version based on that. It can then submit the request to, e.g., "/v2"; if it is submitting to a 1.3 cluster, the request will fail, because that cluster doesn't support the features the app needs.

Also, thinking about the framework: what if later you need different features than the ones provided now? What if you need to use query params, path params, or non-JSON request bodies (e.g. for uploading files)? Are you going to extend the current framework to the point where it starts looking like other existing ones? Or, if HTTP is being used mostly as a dumb pipe, what are the semantics for when something goes wrong? Should clients only bother with a response if the status is "200 OK", or should they try to interpret the body of a "500 Internal Server Error" or "400 Bad Request" response? Those things need to be specified.

h4. Others

If the suggestions above don't sound particularly interesting for this use case, I'd strongly suggest, at the very least, removing any mention of REST from the spec and the code, because this is not a REST protocol in any way. Also, a question: if it's an HTTP protocol, why not expose it through the existing HTTP port?

To reply to the questions about my suggestions for how to use REST:
* When you add a new version, you don't remove old ones.
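The negotiation described above can be sketched in a few lines: map each feature an app needs to the minimum protocol version that supports it, then pick the oldest version that covers them all. The feature names and version numbers here are invented for illustration:

```python
# Hypothetical mapping from features to the minimum protocol version
# that supports them (names and versions are illustrative only).
FEATURE_MIN_VERSION = {
    "submit": 1,       # basic cluster-mode submission
    "kill": 1,
    "file-upload": 2,  # e.g. a later non-JSON request body feature
}

def choose_endpoint(needed_features, server_max_version):
    """Pick the oldest API version covering all needed features.

    Returns a versioned path prefix such as "/v1", or None when the
    server is too old to support the app (the failure case described
    above for a 1.3 cluster).
    """
    required = max(FEATURE_MIN_VERSION[f] for f in needed_features)
    if required > server_max_version:
        return None
    return "/v%d" % required
```

Choosing the *oldest* sufficient version maximizes the range of servers the request can succeed against.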
Spark v1.4 could add "/v2", but it must still support "/v1" the way it was specified.
* As for new fields / types, that really depends on how you specify things. Personally, I like to declare a released API "frozen": you can't add new types, fields, or anything else the old release doesn't know about; anything new requires a new protocol version. But you could take a different approach, adding optional fields that don't cause breakage when submitted to an old server that doesn't know about them.

Again, these choices need to be specified up front; otherwise the implementation of v1 becomes the spec, since wherever the spec is not clear, the choices made by the implementation become a de facto specification. (BTW, especially with a v1, the implementation will invariably become a "de facto" specification; that's unavoidable. But it helps to have the spec clearly cover all areas, so that hopefully you don't need to reverse-engineer the implementation code to figure out how things work.)

Anyway, hope this is useful and clarifies some of the concerns I have about the current spec.
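The two policies in the second bullet - "frozen" versus optional fields - come down to how the server treats a field it doesn't recognize. A minimal sketch, with hypothetical field names:

```python
# Server-side handling of unrecognized request fields, sketching the two
# policies discussed above. Field names are hypothetical examples.
KNOWN_FIELDS_V1 = {"appName", "mainClass", "appArgs"}

def validate_request(body, strict):
    """Return (accepted_fields, error).

    strict=True  -> "frozen" API: any unknown field is rejected, so new
                    fields force a new protocol version.
    strict=False -> lenient: unknown optional fields are silently dropped,
                    so newer clients don't break older servers.
    """
    unknown = set(body) - KNOWN_FIELDS_V1
    if strict and unknown:
        return None, "unknown fields: %s" % sorted(unknown)
    return {k: v for k, v in body.items() if k in KNOWN_FIELDS_V1}, None
```

Note the risk called out earlier: in lenient mode the server may silently ignore a field the application actually depends on, which is exactly why the choice needs to be specified up front.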
> Provide a stable application submission gateway in standalone cluster mode
> --------------------------------------------------------------------------
>
>                 Key: SPARK-5388
>                 URL: https://issues.apache.org/jira/browse/SPARK-5388
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>            Priority: Blocker
>         Attachments: Stable Spark Standalone Submission.pdf
>
>
> The existing submission gateway in standalone mode is not compatible across
> Spark versions. If you have a newer version of Spark submitting to an older
> version of the standalone Master, it is currently not guaranteed to work. The
> goal is to provide a stable REST interface to replace this channel.
> The first cut implementation will target standalone cluster mode because
> there are very few messages exchanged. The design, however, should be general
> enough to potentially support this for other cluster managers too. Note that
> this is not necessarily required in YARN because we already use YARN's stable
> interface to submit applications there.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)