[
https://issues.apache.org/jira/browse/LENS-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rajat Khandelwal updated LENS-1169:
-----------------------------------
Attachment: LENS-1169.02.patch
> Stopping Query Service is incorrect
> -----------------------------------
>
> Key: LENS-1169
> URL: https://issues.apache.org/jira/browse/LENS-1169
> Project: Apache Lens
> Issue Type: Bug
> Reporter: Rajat Khandelwal
> Assignee: Rajat Khandelwal
> Attachments: LENS-1169.01.patch, LENS-1169.02.patch
>
>
> Stopping the Lens server stops all of its services. For the query service, the
> current flow is:
> * Prepare stopping:
> ** Interrupt all threads (query submitter, purger, status poller, etc.)
> * Persist state
> * Stop:
> ** Join all threads (as mentioned above)
> Each of the threads is basically running in a large loop like the following:
> {noformat}
> while (!stopped && !this.isInterrupted()) {
>   try {
>     // one iteration of work; may block in wait()/sleep()
>   } catch (InterruptedException e) {
>     return;
>   }
> }
> {noformat}
> Now, interrupting a thread raises InterruptedException in that thread only
> when it is waiting or sleeping; otherwise only its interrupt flag is set.
> So, the thread can exit in two ways:
> * By receiving the InterruptedException
> * If the exception isn't raised, it completes the current iteration of the
> loop and then exits.
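
The second exit path can be reproduced in isolation. The following is a minimal sketch, not Lens code: the class and method names are hypothetical, and the worker's "work" is a busy spin so that it never blocks. It suggests that an interrupt delivered to a non-blocked thread only sets the flag, and the in-flight iteration still runs to completion.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; not part of Apache Lens.
public class InterruptDemo {

    /** Returns true if the worker finished its in-flight iteration after being interrupted. */
    static boolean iterationCompletesAfterInterrupt() {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch interruptSent = new CountDownLatch(1);
        AtomicBoolean iterationCompleted = new AtomicBoolean(false);

        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                started.countDown();
                // Non-blocking work: spin until the interrupt has been sent.
                // No wait()/sleep() here, so no InterruptedException is raised.
                while (interruptSent.getCount() > 0) {
                    // busy spin
                }
                // Side effect that still happens after the interrupt,
                // analogous to QuerySubmitter marking a query LAUNCHED.
                iterationCompleted.set(true);
            }
        });

        try {
            worker.start();
            started.await();           // worker is now mid-iteration
            worker.interrupt();        // only sets the interrupt flag
            interruptSent.countDown(); // let the iteration finish
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return iterationCompleted.get();
    }

    public static void main(String[] args) {
        System.out.println(iterationCompletesAfterInterrupt());
    }
}
```

The worker exits only when the loop condition is re-evaluated, which is after the side effect has already taken place.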
> So there can be a scenario like the following (I faced such a scenario while
> working on LENS-904):
> * Stop is called from outside
> * Prepare stopping. Let's say QuerySubmitter didn't receive the interrupt and
> will exit after completing its current iteration.
> * Persist:
> ** Persist part 1: Persisting driver states, e.g. HiveDriver keeps a map of
> query handles to Hive operation handles.
> * QuerySubmitter submits the query to Hive, changes the query's state to
> LAUNCHED, and exits.
> * Persist:
> ** Persist part 2: Persisting queries. This persists the query mentioned in
> the above point as LAUNCHED.
> Now, on start, the states are read back: the query's state will be LAUNCHED,
> but HiveDriver won't have the operation handle corresponding to this query.
> This will cause the query to fail on the next status update.
> Proposed Solution:
> Interrupt and join the threads in the prepareStopping phase, before
> persisting.
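
A rough sketch of the proposed ordering follows. It is an illustration under assumptions, not the patch itself: `StopOrderingSketch`, `startWorkers`, `prepareStopping`, and `persist` are hypothetical stand-ins, and the event list exists only to make the ordering observable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names do not correspond to actual Lens classes.
public class StopOrderingSketch {
    final List<String> events = Collections.synchronizedList(new ArrayList<>());
    final List<Thread> workers = new ArrayList<>();

    void startWorkers(int n) {
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.sleep(10); // stands in for one iteration of work
                    }
                } catch (InterruptedException e) {
                    // interrupted while sleeping: fall through and exit
                }
                events.add("worker-exit"); // last state change made by the worker
            });
            workers.add(t);
            t.start();
        }
    }

    // Interrupt AND join here, so no worker can change state once this returns.
    void prepareStopping() throws InterruptedException {
        for (Thread t : workers) {
            t.interrupt();
        }
        for (Thread t : workers) {
            t.join();
        }
    }

    void persist() {
        events.add("persist"); // stands in for persisting driver and query state
    }

    static List<String> runDemo() {
        StopOrderingSketch service = new StopOrderingSketch();
        try {
            service.startWorkers(2);
            service.prepareStopping();
            service.persist();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(service.events);
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because `join()` returns only after a worker has terminated, every worker-side state change is recorded before `persist` runs, so the persisted driver state and query state cannot diverge in the way described above.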