[
https://issues.apache.org/jira/browse/LIVY-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gyorgy Gal updated LIVY-782:
----------------------------
Fix Version/s: 0.10.0
(was: 0.9.0)
This issue has been moved to the 0.10.0 release as part of a bulk update. If
you feel it was moved inappropriately, feel free to provide justification
and reset the Fix Version to 0.9.0.
> Idempotent Livy session creation
> --------------------------------
>
> Key: LIVY-782
> URL: https://issues.apache.org/jira/browse/LIVY-782
> Project: Livy
> Issue Type: New Feature
> Components: API, Server
> Reporter: Andrew Fogarty
> Priority: Major
> Fix For: 0.10.0
>
>
> h2. Problem description
> Livy currently has POST APIs for creating sessions:
> * To create a batch session, a client must submit a POST request to
> “/batches”.
> * To create an interactive session, a client must submit a POST request to
> “/sessions”.
> Both APIs generate a unique session ID which is returned to the client as
> part of the response payload.
> These APIs are not idempotent. That is, if either the request or the
> response is lost in transit, the client has no way to validate whether that
> job has started. The only way to retry is to submit another POST, which
> could potentially start a second job.
> For example, suppose a client submits a POST to create a new batch session.
> Livy receives the request and starts the batch session with ID=12. When Livy
> sends the response, assume it is lost in transit due to some networking
> issue. The client never receives the response, so it does not know if the
> batch started correctly and does not have an ID to query the status of the
> batch session.
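
The failure mode described above can be reproduced with a toy in-memory model. This is only a sketch, not Livy code: `ToyLivy`, `post_batch`, and the payload contents are hypothetical names chosen for illustration, and the starting ID of 12 simply mirrors the example.

```python
import itertools


class ToyLivy:
    """Hypothetical stand-in for a server with a non-idempotent POST."""

    def __init__(self, first_id=12):
        self._ids = itertools.count(first_id)
        self.batches = {}

    def post_batch(self, payload):
        # Every POST allocates a fresh ID and starts a new batch.
        batch_id = next(self._ids)
        self.batches[batch_id] = payload
        return {"id": batch_id}


server = ToyLivy()
payload = {"file": "job.jar"}  # illustrative payload

server.post_batch(payload)          # batch 12 starts, but the response is lost
retry = server.post_batch(payload)  # the client can only retry blindly

duplicate_count = len(server.batches)  # the same job is now running twice
```

The retry has no way to refer to the first attempt, so the server treats it as a brand-new batch.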
> This document contains two proposed solutions for this idempotence problem.
> These solutions introduce APIs for creating sessions in an idempotent manner.
> Neither solution makes changes to existing APIs.
> h2. Suggested solution
> This proposed solution introduces 1 new API:
> * PUT(“/\{session type}/”) -> Session
> This API is described below. *Note:* ‘->’ indicates the call “returns”.
> h3. API: PUT(“/\{session type}/”) -> Session – Create session with given
> request ID header
> This new API is a PUT to create a new session (batch or interactive) for the
> given request ID. This new API is very similar to the existing POST API to
> create a session and expects the request payload to be a CreateBatchRequest
> or CreateInteractiveRequest as appropriate.
> The difference between this PUT API and the existing session POST API is that
> requests to this API must contain a “requestId” header with a GUID value. If
> the requestId is not provided, then PUT will fail with an error. This
> requestId is saved as an optional field on the metadata object
> (BatchRecoveryMetadata or InteractiveRecoveryMetadata) stored in the
> SessionStore.
> When creating the session, before storing the metadata object in the
> SessionStore, we query the SessionStore to see if some session already exists
> with that requestId. If a session with the requestId already exists, then we
> return that session instead of creating a new one. If there is no existing
> session with that requestId, then we create the session normally.
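
The lookup-before-create logic above might look roughly like the following. This is a minimal sketch under stated assumptions: the `SessionStore` class here is a hypothetical in-memory stand-in (not Livy's real SessionStore), `put_session` and `find_by_request_id` are invented names, and headers and sessions are modelled as plain dictionaries.

```python
import itertools
import uuid


class SessionStore:
    """Hypothetical in-memory stand-in for Livy's SessionStore."""

    def __init__(self):
        self._ids = itertools.count()
        self._by_id = {}

    def next_id(self):
        return next(self._ids)

    def find_by_request_id(self, request_id):
        # Linear scan; a real store would likely index by requestId.
        for session in self._by_id.values():
            if session.get("requestId") == request_id:
                return session
        return None

    def save(self, session):
        self._by_id[session["id"]] = session


def put_session(store, headers, payload):
    request_id = headers.get("requestId")
    if request_id is None:
        raise ValueError("requestId header is required")
    uuid.UUID(request_id)  # raises ValueError if the value is not a GUID

    # Return the existing session if this requestId was already used.
    existing = store.find_by_request_id(request_id)
    if existing is not None:
        return existing

    # Otherwise create the session normally, recording the requestId.
    session = {"id": store.next_id(), "requestId": request_id, "request": payload}
    store.save(session)
    return session


store = SessionStore()
headers = {"requestId": str(uuid.uuid4())}
first = put_session(store, headers, {"file": "job.jar"})
second = put_session(store, headers, {"file": "job.jar"})  # same session back
```

The sketch ignores concurrency. Two simultaneous requests carrying the same requestId would need the lookup and the save to happen atomically.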
> h3. Example
> This solution solves the idempotence problem by ensuring that repeat calls to
> PUT with the same requestId will return the first created session. If a
> client makes a request to PUT but for some reason does not receive a
> response, then they can retry that request with the same requestId. If the
> session had not started, then it will start. Otherwise, if the session has
> already started, then its session object will be returned to the client.
> h2. Alternative solution
> Introduce 2 new APIs:
> # POST(“/\{session type}/id”) -> \{sessionId: Int, putKey: GUID}
> # PUT(“/\{session type}/\{session id}”) -> Session
> Both are described below.
> h3. API 1: POST(“/\{session type}/id”) -> \{sessionId: Int, putKey: GUID} –
> Generates a new unique sessionId
> The first API is a POST to generate a new unique session ID for the given
> session type (batch or interactive).
> This API would:
> # Increment the existing sessionId incrementor.
> # Store a new value \{“/putkey/\{session type}/\{session id}” -> putKey} in
> the SessionStore.
> # Return \{sessionId: Int, putKey: GUID} payload.
> The returned payload contains the session ID as well as the “putKey”, which
> is a GUID used in the second API to validate the session ID. We call this the
> “putKey” because it represents a unique key used to identify the PUT request.
> We store the mapping from session ID to putKey in the SessionStore so that
> the second API can validate that a provided session ID matches its putKey.
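
The three steps of the first API can be sketched as follows. The dictionary `store` and the counter `session_ids` are hypothetical stand-ins for the SessionStore and the existing sessionId incrementor, and `post_session_id` is an invented name. Only the key layout and the payload fields come from the description above.

```python
import itertools
import uuid

store = {}                        # hypothetical stand-in for the SessionStore
session_ids = itertools.count()   # hypothetical stand-in for the incrementor


def post_session_id(session_type):
    # Step 1: increment the sessionId incrementor.
    session_id = next(session_ids)
    # Step 2: store the putKey under /putkey/{session type}/{session id}.
    put_key = str(uuid.uuid4())
    store[f"/putkey/{session_type}/{session_id}"] = put_key
    # Step 3: return both values to the client.
    return {"sessionId": session_id, "putKey": put_key}


first = post_session_id("batches")
second = post_session_id("batches")
```

Losing the response to this call is harmless: the client simply asks again, and the unused ID is never turned into a session.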
> h3. API 2: PUT(“/\{session type}/\{session id}”) -> Session – Create a
> session with the given session ID
> The second API is a PUT to create a new session (batch or interactive) for
> the given session ID. This new API is very similar to the existing POST API
> to create a session and expects the request payload to be a
> CreateBatchRequest or CreateInteractiveRequest as appropriate.
> CreateBatchRequest and CreateInteractiveRequest will contain the optional
> putKey field.
> This API would:
> # Validate that the provided session ID matches the putKey by reading the
> “/putkey/\{session type}/\{session id}” value from the SessionStore.
> ## If no putKey is provided, or the session ID does not match the putKey,
> then we fail the request. This ensures that the provided session ID was
> generated by the first API, and that a client isn’t using a session ID that
> it does not have permission to use.
> # Follow the usual code path to create a session, except pass down the
> session ID and the putKey.
> ** For this feature, we would change that code path in BatchSession and
> InteractiveSession. Before saving the session metadata record to
> SessionStore, we check that some record with this ID does not already exist
> in the SessionStore. If it does, then we just return that session and do not
> create a new session.
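
A rough sketch of the second API's validation and create-if-absent behaviour is shown below. It assumes the SessionStore can be modelled as a plain dictionary. The function name `put_session`, the record key layout `/{session type}/{session id}`, and the choice of `PermissionError` are all assumptions made for illustration.

```python
import uuid


def put_session(store, session_type, session_id, payload):
    # Step 1: the putKey in the payload must match the one issued for this ID.
    expected = store.get(f"/putkey/{session_type}/{session_id}")
    put_key = payload.get("putKey")
    if put_key is None or put_key != expected:
        raise PermissionError("missing or mismatched putKey")

    # Step 2: before saving, check whether a record with this ID exists.
    record_key = f"/{session_type}/{session_id}"  # hypothetical key layout
    existing = store.get(record_key)
    if existing is not None:
        return existing  # repeat PUT: return the session, create nothing

    session = {"id": session_id, "putKey": put_key, "request": payload}
    store[record_key] = session
    return session


store = {}
key = str(uuid.uuid4())
store["/putkey/batches/7"] = key  # as if issued earlier by the first API

request = {"putKey": key, "file": "job.jar"}
first = put_session(store, "batches", 7, request)
second = put_session(store, "batches", 7, request)  # same session back
```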
> h3. Example
> With these new APIs, a client can get a valid session ID before submitting
> their batch or interactive session to Livy.
> The sequence would be:
> # Call POST(“/\{session type}/id”) to get a new valid session ID and putKey.
> # Call PUT(“/\{session type}/\{session id}”) to start a new session with that
> valid session ID.
> # If for some reason the client does not receive a response, it can use the
> ID to query Livy for the status. Alternatively, it can re-submit the PUT
> request.
> When a request is re-submitted:
> ## If the session had not started, it will start.
> ## If the session had started already, its session object will be returned
> to the client.
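
From the client's side, the retry step in this sequence could be sketched as below. `FlakyServer` is a hypothetical test double whose first response is deliberately lost, and `create_with_retry` and its `attempts` parameter are invented for illustration. A real client would issue HTTP requests to Livy instead.

```python
class FlakyServer:
    """Hypothetical server whose first PUT response is lost in transit."""

    def __init__(self):
        self.sessions = {}
        self.puts_received = 0

    def put(self, session_id, payload):
        self.puts_received += 1
        # Idempotent create: reuse the record if this ID is already known.
        session = self.sessions.setdefault(
            session_id, {"id": session_id, "request": payload}
        )
        if self.puts_received == 1:
            raise TimeoutError("response lost in transit")
        return session


def create_with_retry(server, session_id, payload, attempts=3):
    for _ in range(attempts):
        try:
            return server.put(session_id, payload)
        except TimeoutError:
            continue  # safe to retry because the PUT is idempotent
    raise TimeoutError(f"no response after {attempts} attempts")


server = FlakyServer()
session = create_with_retry(server, 12, {"file": "job.jar"})
```

The server sees two PUT requests, but only one session exists afterwards, which is the property the existing POST API cannot offer.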
--
This message was sent by Atlassian Jira
(v8.20.10#820010)