> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java
> > Line 73 (original), 73 (patched)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097990#file2097990line75>
> >
> >     if we expect multiple sessions to access this, should we make this 
> > `volatile`?
> 
> denys kuzmenko wrote:
>     it's being accesed only inside of the critical section (within the lock 
> boundaries)
> 
> Sahil Takiar wrote:
>     does java guarantee that non-volatile variables accessed inside a 
> critical section are not cached locally by a CPU?
> 
> denys kuzmenko wrote:
>     In short - yes.
>     
>     JSR 133 (Java Memory Model)
>     
>     Synchronization ensures that memory writes by a thread before or during a 
> synchronized block are made visible in a predictable manner to other threads 
> which synchronize on the same monitor. After we exit a synchronized block, we 
> release the monitor, which has the effect of flushing the cache to main 
> memory, so that writes made by this thread can be visible to other threads. 
> Before we can enter a synchronized block, we acquire the monitor, which has 
> the effect of invalidating the local processor cache so that variables will 
> be reloaded from main memory. We will then be able to see all of the writes 
> made visible by the previous release.

makes sense


> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
> > Line 125 (original), 120 (patched)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097991#file2097991line125>
> >
> >     we might need to re-think how we are synchronizing this method a bit. I 
> > think we want to support the use case where we call `close()` while 
> > `open()` is being run. The offers a way for the user to cancel a session 
> > while it is being opened, which can be useful if opening a session takes a 
> > long time, which can happen on a busy cluster where there aren't enough 
> > resources to open a session.
> >     
> >     fixing that might be out of the scope of this JIRA, so I would 
> > recommend using a separate lock to guard against multiple users calling 
> > open on the same session.
> 
> Sahil Takiar wrote:
>     Tracking the aformentioned fix in HIVE-20519, unless you want to fix it 
> in this patch.
> 
> denys kuzmenko wrote:
>     i think it should be addressed in another JIRA, right now we need to have 
> working at least basic use-case
> 
> Sahil Takiar wrote:
>     okay, still recommend using a separate lock
> 
> denys kuzmenko wrote:
>     open() and close() both manipulate with the shared variable (isOpen), so 
> they have to be synchronized on a same monitor (at least in current approach).
>     I am not sure whether SparkContext supports instant interruption 
> (Thread.interrupt or sc.stop()). However when closing session that is in 
> progress, you have to take care of SparkContext.

yes, but we need to support calling `close()` while `open()` is being run, as I 
described in my first comment. the (2) bullet point in the RB description 
states that you want to guard against multiple callers invoking the `open()` 
method, so logically you should just have a single lock to handle this 
behavior. i suggest you use a separate lock for this scenario because the 
`closeLock` is meant to handle synchronization of the `close()` method, it was 
not meant to handle synchronization of the `open()` method by multiple callers. 
IMO by re-using the lock we make the code harder to understand because we are 
using the `closeLock` for functionality that should be outside its scope.

I don't feel strongly about this though given we will need to re-factor this 
code later anyway to support calling `close()` while `open()` is running, so 
I'll leave it up to you.


> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
> > Line 352 (original)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097991#file2097991line361>
> >
> >     why remove this?
> 
> denys kuzmenko wrote:
>     it's not required. close() method is covered with the lock, and 
> activeJobs is a concurrent collection
> 
> Sahil Takiar wrote:
>     what happens if a job is submitted after `hasTimedOut` returns true?
> 
> denys kuzmenko wrote:
>     I see. However existing lock won't help as it doesn't prevent other 
> threads from adding new queries. 
>     
>     public void onQuerySubmission(String queryId) {
>         activeJobs.add(queryId);
>     }
>     
>     we might need to cover this with separate lock (onQueryCompletion, 
> onQuerySubmission, triggerTimeout)
>     What do you think?
> 
> denys kuzmenko wrote:
>     what happens if a job is submitted after hasTimedOut returns true? 
> current Session will be closed and a new one will be opened.
> 
> denys kuzmenko wrote:
>     there might be an issue when 2nd session checkes state and get isOpen and 
> before it's reaching the submit phase, 1st one calls the close.
>     I think we need synchronization for active sessions manipulation.
> 
> denys kuzmenko wrote:
>     fixed. reverted back to the original locking. Above tricky case will be 
> handled by preventing new queries to execute open() before close() is 
> complete.
> 
> denys kuzmenko wrote:
>     @before triggerTimeout() is complete

A little confused by all the comments here. Looks like the change has been 
reverted. Do we agree that the current code is sufficient, or are their other 
edge cases you think need to be covered?


- Sahil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69022/#review209617
-----------------------------------------------------------


On Oct. 16, 2018, 4:49 p.m., denys kuzmenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69022/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2018, 4:49 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Adam Szita.
> 
> 
> Bugs: HIVE-20737
>     https://issues.apache.org/jira/browse/HIVE-20737
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> 1. Local SparkContext is shared between user sessions and should be closed 
> only when there is no active. 
> 2. Possible race condition in SparkSession.open() in case when user queries 
> run in parallel within the same session.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 
> 72ff53e3bd 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
>  bb50129518 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestLocalHiveSparkClient.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69022/diff/3/
> 
> 
> Testing
> -------
> 
> Added TestLocalHiveSparkClient test
> 
> 
> File Attachments
> ----------------
> 
> HIVE-20737.7.patch
>   
> https://reviews.apache.org/media/uploaded/files/2018/10/15/9cf8a2b3-9ec1-4316-81d0-3cd124b1a9fd__HIVE-20737.7.patch
> 
> 
> Thanks,
> 
> denys kuzmenko
> 
>

Reply via email to