[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303702#comment-14303702 ]

Marcelo Vanzin edited comment on SPARK-5388 at 2/3/15 7:48 PM:
---------------------------------------------------------------

Hi [~pwendell],

Let me try to give some point-by-point feedback on the current spec.

h4. Public protocol or not?

If this is not supposed to be public (i.e., we don't expect anyone to talk 
directly to the Spark master; it will always happen through the Spark 
libraries), then the underlying protocol is less important, since we only care 
about different versions being compatible in some way.

Assuming a non-public protocol, my question would be: why implement your own 
RPC framework? Why not reuse something that's already there? For example, Avro 
has a stable serialization infrastructure that defines semantics for versioned 
data, and works well on top of HTTP. It handles serialization and dispatching, 
which would remove a lot of code from the current patch, and it probably has 
other features that the current, cluster-mode-only protocol doesn't need but 
other future uses might.
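
To make that concrete, here's a minimal sketch using Avro's generic API; the 
{{SubmitRequest}} record and its fields are invented for illustration, they're 
not from the spec or the patch. The point is that Avro's schema resolution 
already defines what happens when an old reader sees data from a newer writer:

{code:scala}
import java.io.ByteArrayOutputStream

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// Writer schema (newer client): carries a field the server doesn't know about.
val writerSchema = new Schema.Parser().parse(
  """{"type": "record", "name": "SubmitRequest", "fields": [
    |  {"name": "mainClass", "type": "string"},
    |  {"name": "extraOption", "type": ["null", "string"], "default": null}
    |]}""".stripMargin)

// Reader schema (older server): only knows mainClass.
val readerSchema = new Schema.Parser().parse(
  """{"type": "record", "name": "SubmitRequest", "fields": [
    |  {"name": "mainClass", "type": "string"}
    |]}""".stripMargin)

val record = new GenericData.Record(writerSchema)
record.put("mainClass", "org.example.MyApp")
record.put("extraOption", "some-new-flag")

val out = new ByteArrayOutputStream()
val encoder = EncoderFactory.get().binaryEncoder(out, null)
new GenericDatumWriter[GenericRecord](writerSchema).write(record, encoder)
encoder.flush()

// Schema resolution: the old reader skips the unknown field instead of
// blowing up, with no hand-written compatibility code on either side.
val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
val read: GenericRecord =
  new GenericDatumReader[GenericRecord](writerSchema, readerSchema).read(null, decoder)
{code}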

h4. Non-submission uses

Similarly, in the non-public protocol scenario, a proper REST-based API might 
look like overkill. But a proper REST infrastructure provides interesting room 
for growth of the master's public-facing API. For example, you could easily 
expose an endpoint for listing the applications currently tracked by the 
master, or an endpoint to kill an application. The former could also benefit 
the history server, which could expose the same API to list the applications 
it has found.
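
Purely as an illustration of that growth path (these paths are hypothetical, 
not a concrete proposal), the URL space could look something like:

{noformat}
GET    /v1/applications          list applications (master or history server)
GET    /v1/applications/<appId>  details for one application
POST   /v1/applications          submit a new application
DELETE /v1/applications/<appId>  kill a running application
{noformat}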

h4. Evolvability and Versioning

The current spec does not specify the behavior of either the cluster or the 
client with regard to different versions of the protocol. It has a table that 
basically says "future versions need to be able to submit standalone cluster 
applications to a 1.3 master", but it doesn't explain what that means or how 
that happens.

Does it mean that, after 1.3, you can't ever change any of the messages used to 
launch a standalone cluster app, nor can you add new messages? Or, if that's 
allowed, what happens on the server side if it sees a field it doesn't 
understand? Does it ignore it, which could potentially break the application 
being submitted? Does it throw an error, in which case the client should make 
sure to submit an older version of the data structures if that's compatible 
with the app being submitted? If the latter, how does it know which version to 
use?

As an example of how you could do this "negotiation": the client checks what 
features the app being submitted needs, and chooses the oldest API version 
that supports all of them. It can then submit the request to, e.g., "/v2", 
and if it's submitting to a 1.3 cluster, the request will fail, because that 
cluster doesn't support the features needed by the app.
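
In code, the client side of that could look something like this (the feature 
names and the feature-to-version table are made up, just to show the shape of 
the logic):

{code:scala}
// Hypothetical feature registry: the first protocol version supporting each feature.
sealed trait Feature
case object BasicSubmit extends Feature
case object FileUpload  extends Feature  // imagine this only arrived in v2

val firstVersionWith: Map[Feature, Int] = Map(
  BasicSubmit -> 1,
  FileUpload  -> 2)

// Oldest version that covers everything the app needs, so older masters
// stay usable whenever possible.
def chooseVersion(needed: Set[Feature]): Int = needed.map(firstVersionWith).max

// A plain app goes to /v1 and works against a 1.3 master; an app that needs
// FileUpload goes to /v2 and fails fast against a 1.3 master.
val endpoint = s"/v${chooseVersion(Set(BasicSubmit, FileUpload))}/submissions"
{code}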

Also, thinking about the framework, what if later you need different features 
than the ones provided now? What if you need to use query params, path params, 
or non-JSON request bodies (e.g. for uploading files)? Are you going to extend 
the current framework to the point where it starts looking like other existing 
ones?

Or, if HTTP is being used mostly as a dumb pipe, what are the semantics for 
when something goes wrong? Should clients only care about the response if the 
status is "200 OK", or should they try to interpret the body of a "500 Internal 
Server Error" or "400 Bad Request" response? Those things need to be specified.
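
As an example of the kind of client contract the spec could pin down (the 
status handling below is hypothetical, not what the current patch does):

{code:scala}
import java.io.InputStream
import java.net.{HttpURLConnection, URL}

import scala.io.Source

def bodyOf(s: InputStream): String =
  if (s == null) "" else Source.fromInputStream(s).mkString

// Hypothetical contract: which bodies are meaningful, and what each
// status means. Today, none of this is written down anywhere.
def submit(url: String, json: Array[Byte]): Either[String, String] = {
  val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.setRequestProperty("Content-Type", "application/json")
  conn.setDoOutput(true)
  conn.getOutputStream.write(json)
  conn.getResponseCode match {
    case 200  => Right(bodyOf(conn.getInputStream))                  // body is a submission response
    case 400  => Left("bad request: " + bodyOf(conn.getErrorStream)) // body is an error message
    case 500  => Left("server error")                                // is the body JSON? plain text? unspecified
    case code => Left(s"status $code: not covered by the spec at all")
  }
}
{code}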

h4. Others

If the suggestions above don't sound particularly interesting for this use 
case, I'd strongly suggest, at the very least, removing any mention of REST 
from the spec and the code, because this is not a REST protocol in any way.

Also, a question: if it's an HTTP protocol, why not expose it through the 
existing HTTP port?


To reply to the questions about my suggestions for how to use REST:

* when you add a new version, you don't remove old ones. Spark v1.4 could add 
"/v2", but it must still support "/v1" as originally specified.
* as for new fields / types, that really depends on how you specify things. 
Personally, I like to declare a released API "frozen": you can't add new types, 
fields, or anything else that the old release doesn't know about; any new thing 
requires a new protocol version. But you could take a different approach, by 
adding optional fields that don't cause breakage when submitted to an old 
server that doesn't know about them (see the sketch after this list). Again, 
these choices need to be specified up front; otherwise, wherever the spec is 
not clear, the choices made by the v1 implementation will become a de facto 
specification.
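
A quick sketch of the two policies, with made-up message names:

{code:scala}
// Policy 1: "frozen" -- v1 messages never change; anything new lives in a
// new protocol version with its own message types under /v2.
case class SubmitRequestV1(mainClass: String, appArgs: Seq[String])
case class SubmitRequestV2(mainClass: String, appArgs: Seq[String],
                           newThing: Option[String])

// Policy 2: additive evolution within a version -- new fields are optional,
// and the spec must say that a server ignores or defaults fields it doesn't
// know about, rather than rejecting the request.
case class SubmitRequest(mainClass: String, appArgs: Seq[String],
                         newThing: Option[String] = None)
{code}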

(BTW, especially with a v1, the implementation will invariably become a "de 
facto" specification, that's unavoidable. But it helps to have the spec clearly 
cover all areas, so that hopefully you don't need to reverse-engineer the 
implementation code to figure out how things work.)

Anyway, hope this is useful and clarifies some of the concerns I have about the 
current spec.



> Provide a stable application submission gateway in standalone cluster mode
> --------------------------------------------------------------------------
>
>                 Key: SPARK-5388
>                 URL: https://issues.apache.org/jira/browse/SPARK-5388
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>            Priority: Blocker
>         Attachments: Stable Spark Standalone Submission.pdf
>
>
> The existing submission gateway in standalone mode is not compatible across 
> Spark versions. If you have a newer version of Spark submitting to an older 
> version of the standalone Master, it is currently not guaranteed to work. The 
> goal is to provide a stable REST interface to replace this channel.
> The first cut implementation will target standalone cluster mode because 
> there are very few messages exchanged. The design, however, should be general 
> enough to potentially support this for other cluster managers too. Note that 
> this is not necessarily required in YARN because we already use YARN's stable 
> interface to submit applications there.


