[ https://issues.apache.org/jira/browse/LENS-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624655#comment-14624655 ]

Bala Nathan commented on LENS-658:
----------------------------------

[~amareshwari] : Thanks for the suggestions. I agree that rebuilding a session 
from the datastore will be based on the latest session state, and hence a trail 
of the actual session will need to be maintained in the datastore itself. 
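A trail like that could be an append-only list of session events that any node replays to reconstruct the latest state. A minimal sketch, assuming hypothetical SessionEvent/SessionState types (these are not actual Lens classes):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a session is rebuilt by replaying an ordered trail of
// events (database switches, jar additions) stored in the datastore, rather
// than from a single latest-state snapshot.
public class SessionTrail {

    // One entry in the persisted trail. Types and fields are illustrative.
    public static final class SessionEvent {
        final String type;   // e.g. "SET_DATABASE", "ADD_JAR"
        final String value;
        public SessionEvent(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // The session state reconstructed from the trail.
    public static final class SessionState {
        String currentDatabase = "default";
        final Set<String> jars = new HashSet<>();
    }

    // Replay the trail in order to reconstruct the latest session state.
    public static SessionState replay(List<SessionEvent> trail) {
        SessionState state = new SessionState();
        for (SessionEvent e : trail) {
            switch (e.type) {
                case "SET_DATABASE": state.currentDatabase = e.value; break;
                case "ADD_JAR":      state.jars.add(e.value);         break;
                default: /* ignore unknown event types */             break;
            }
        }
        return state;
    }
}
```

Replay order matters here: a later SET_DATABASE must win over an earlier one, which is exactly why a trail (rather than an unordered set of attributes) is needed.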

As for the session cache, I'm considering whether we must maintain a session 
state cache on the servers at all. There are two obvious issues with not 
maintaining a session cache locally:

1) We will need to hit the datastore before every call to validate that the 
session is valid and that its state is the latest. This is still acceptable if 
we use a datastore that scales well for reads. 

2) The other caveat with not maintaining a local session cache and hitting the 
datastore on every call is that the datastore becomes a single point of 
failure: if it is down for some reason, query operations may stall. 
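To make issue (1) concrete, the per-call validation could look like this minimal sketch, assuming a hypothetical SessionStore interface backed by the datastore (not an actual Lens API):

```java
// Hypothetical sketch of option (1): without a local cache, every request
// validates the session against the datastore before proceeding.
public class PerCallValidation {

    // Stand-in for a remote read against the shared datastore.
    public interface SessionStore {
        boolean isValid(String sessionHandle);
    }

    private final SessionStore store;

    public PerCallValidation(SessionStore store) {
        this.store = store;
    }

    // Every query call pays one datastore read up front.
    public String submitQuery(String sessionHandle, String query) {
        if (!store.isValid(sessionHandle)) {
            throw new IllegalStateException(
                "Invalid or stale session: " + sessionHandle);
        }
        return "handle-for:" + query; // placeholder for real submission
    }
}
```

The extra read per request is the cost being weighed against read scalability of the datastore.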

If we do keep a session cache and maintain its lifecycle asynchronously, 
rebuilding a session with the latest state will be a challenge (it can go out 
of sync between calls). I'm inclined to go ahead without a local session cache, 
as the cost of maintaining one outweighs the benefits here. We can probably 
keep the session state in memory after it is fetched from the datastore and 
use it as a fallback if the datastore is not reachable. There may be an 
inconsistency, but at least we don't have a hard dependency on the 
availability of the datastore, or the challenges of keeping an async thread. 

Regarding query status, I agree with the suggestions; they are in line with 
what [~sriksun] had also suggested. 

> Distributed Lens Server Deployment
> ----------------------------------
>
>                 Key: LENS-658
>                 URL: https://issues.apache.org/jira/browse/LENS-658
>             Project: Apache Lens
>          Issue Type: New Feature
>          Components: server
>            Reporter: Bala Nathan
>
> Currently, Lens can be deployed and can function only on a single node. This 
> JIRA tracks the approach to make Lens work in a clustered environment. The 
> following key aspects of clustered deployment are discussed below:
> 1) Session Management
> Creating and Persisting a Session:
> A Lens session needs to be created before any queries can be submitted to the 
> Lens server. A Lens session is associated with a unique session handle, the 
> current database in use, and any jars that have been added, which must be 
> passed when making queries or doing metadata operations in the same session. 
> The session service is started as part of the Lens server. Today the session 
> object is persisted to a local filesystem periodically (set via the 
> lens.server.persist.location conf) for the purposes of recovery, i.e. the 
> Lens Server will read from this location when it is restarted and recovery is 
> enabled. In a multi-node deployment, this destination needs to be available 
> to all nodes. Hence the session information can be persisted in a datastore 
> instead of a local filesystem, so that any node in the cluster can 
> re-construct the session and execute queries. 
> Restoring a Session:
> Let's consider the following scenario:
> 1) Let's assume there are three nodes in the Lens cluster (L1, L2, L3). These 
> nodes are behind a load balancer (LB) and the load balancing policy is 
> round robin.
> 2) When a user sends a create-session request to the LB, the LB will send 
> the request to L1. L1 will create a session, persist it in the datastore and 
> return the session handle to the caller.
> 3) The user sends a submit-query request to the LB, and the LB will send the 
> request to L2. L2 will check that the session is valid before it can submit 
> the query. However, since L2 does not have the session info, it cannot 
> process the query as-is. Hence L2 must rebuild the session information on 
> itself from the datastore and then submit the query. Based on the load 
> balancing policy, every node in the cluster will eventually build up all 
> session information.
> 2) Query Lifecycle Management
> Once a query is issued to the lens server, it moves through various states:
>                                                                       
> Queued -> Launched -> Running -> Failed / Successful -> Completed 
> A client can check the status of a query based on a query handle. Since this 
> request can be routed to any server in the cluster, the query status also 
> needs to be persisted in a datastore. 
> Updating the status of a query in the case of node failure: When the status 
> of a query is requested, any node in the Lens cluster can get the request. 
> If the node which initially issued the query is down for some reason, another 
> node needs to take over updating the status of the query. This can be 
> achieved by maintaining a heartbeat or health check for each node in the 
> cluster. For example:
> 1) L1 submitted the query and updated the datastore
> 2) The L1 server went down
> 3) A request for the query's status comes to L2. L2 checks if L1 is up via a 
> health-check URL. If the response is 200 OK, it routes the request to L1; 
> else it updates the datastore (changes the owner from L1 to L2) and takes 
> over
> In the case of a Lens cluster restart, there may be a high load on the Hive 
> server if every server in the cluster attempts to check the status of all 
> Hive query handles. Hence, this would require some sort of stickiness to be 
> maintained in the query handle such that upon a restart, each server 
> attempts to restore only those queries which originated from itself. 
> 3) Result Set Management 
> Lens supports two consumption modes for a query: in-memory and persistent. A 
> user can ask for a query to persist results in an HDFS location and can 
> retrieve them from a different node. However, this does not apply to 
> in-memory result sets (particularly for JDBC drivers). In the case of an 
> in-memory result set, a request for retrieving the result set must go to the 
> same node which executed the query. Hence, query handle stickiness is 
> unavoidable. 
> Let's take an example to illustrate this:
> 1) Again, let's assume there are three nodes in the Lens cluster (L1, L2, 
> L3). These nodes are behind a load balancer (LB) and the load balancing 
> policy is round robin.
> 2) The user created a session and submitted a query with an in-memory result 
> set to the LB. The LB routed the request to L1 in the cluster. L1 submitted 
> the query, persisted the state in the datastore and returned the query 
> handle to the user.
> 3) The user sent a request to the LB asking for the status of the query. The 
> LB routed it to L2. L2 checked the datastore and reported that the query is 
> complete.
> 4) The user asked the LB for the result set of the query. The LB sent the 
> request to L3. However, L3 does not have the result set, and hence it cannot 
> serve the request. 
> However, if we persist the node information along with the query handle in 
> the datastore (step 2), then when the user asks for the result set, L3 can 
> redirect the request to L1, the node which executed the query. 
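The health-check takeover described under query lifecycle management above could be sketched roughly as follows; all names are hypothetical, and a local map stands in for the shared datastore:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the takeover flow: when a status request arrives for
// a query owned by another node, probe that node's health check; if it is up,
// route the request to it, otherwise claim ownership in the datastore.
public class QueryTakeover {

    // Stand-in for a health-check probe, e.g. GET /health returning 200 OK.
    public interface HealthChecker {
        boolean isUp(String nodeId);
    }

    private final String selfId;
    private final HealthChecker health;
    // queryHandle -> owning node; stands in for the shared datastore.
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    public QueryTakeover(String selfId, HealthChecker health,
                         Map<String, String> seedOwners) {
        this.selfId = selfId;
        this.health = health;
        this.owners.putAll(seedOwners);
    }

    // Returns the node that should serve this status request.
    public String resolveOwner(String queryHandle) {
        String owner = owners.get(queryHandle);
        if (owner == null || owner.equals(selfId)) {
            return selfId;
        }
        if (health.isUp(owner)) {
            return owner; // route the request to the healthy owner
        }
        owners.put(queryHandle, selfId); // take over: L1 -> L2 in the example
        return selfId;
    }
}
```

In a real deployment the ownership update would need an atomic compare-and-set in the datastore so that two nodes cannot both take over the same query.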



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
