----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48886/#review140309 -----------------------------------------------------------
This is getting rather complicated. Need to think through a simpler approach to clean up this data. queryComplete does the following - ObjectCacheFactory.removeLlapQueryCache(savedQueryId); - QueryInfo object - Directories on local disk One possibility (which is similar but can remove the requirement of the Delayed cleanup - assuming this is mainly because of the directory deletion ?) I think it's possible to have the local-dirs created on a per fragment for external requests - since Shuffle is not involved (which is what requires all fragments for a query to be under the application dir). This cleanup could be delinked from queryComplete. Cleaning up the in-memory structures could be handled immediately after the socket is closed - obtain lock, try cleanup (block submissions or block on new submissions). llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java (line 154) <https://reviews.apache.org/r/48886/#comment205688> Nothing ever put into this structure. llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java (line 158) <https://reviews.apache.org/r/48886/#comment205689> There's a window between moving the state to ACTIVE and taking the lock in queryCleanup where there's a race between queryComplete and registerFragment obtaining the lock - and will lead to the exception. llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java (line 218) <https://reviews.apache.org/r/48886/#comment205686> There's a good chance that this call does nothing - since fragmentComplete would already have been invoked by this point. The Closeable generated earlier holds a reference to QueryFragmentInfo which has otherwise been removed. Think the Closeable generated will get cleaned up when the Socket closes. Don't think we'll end up accumulating QueryFragmentInfo objects, correct? llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java (line 289) <https://reviews.apache.org/r/48886/#comment205687> Would prefer avoiding IO within a lock. Now this lock is for a single query only, and will be held at a time when new fragments are unlikely to show up - it is still possible for fragments to show up though. These fragments would go into a tight loop in this case - waiting to get this lock since it's been invalidated by this point. llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java (line 352) <https://reviews.apache.org/r/48886/#comment205696> Why the null check ? This can go into a tight loop. llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java (line 565) <https://reviews.apache.org/r/48886/#comment205690> This belongs inside QueryInfo rather than at the QueryTracker. Unrelated: Need to move all of these to an interface so that there's good sepeartion between methods intended for use by other parts of the system and internal methods. Will create a jira and assign to myself. ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java (line 211) <https://reviews.apache.org/r/48886/#comment205685> When is the Writer created ? - once the fragment starts executing, or when a read request is received from the client. What I understand from reading the code here is that it's created when a read request is received. If that's the case - there's no guarantees that we'll actually end up cleaning the structures that were created for this query. ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java (line 289) <https://reviews.apache.org/r/48886/#comment205691> Can this be added into the FragmentCompletionHandler interface ? ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java (line 290) <https://reviews.apache.org/r/48886/#comment205692> Does it make sense to return a FragmentCompletionHandler rather than a Closeable ? ql/src/test/org/apache/hadoop/hive/llap/TestLlapOutputFormat.java (line 52) <https://reviews.apache.org/r/48886/#comment205693> Needs tests. ObjectCacheFactory.removeLlapQueryCache(savedQueryId); - Siddharth Seth On June 17, 2016, 10:31 p.m., Jason Dere wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48886/ > ----------------------------------------------------------- > > (Updated June 17, 2016, 10:31 p.m.) > > > Review request for hive and Siddharth Seth. > > > Bugs: HIVE-14052 > https://issues.apache.org/jira/browse/HIVE-14052 > > > Repository: hive-git > > > Description > ------- > > Add a hook to call run QueryTracker.queryComplete if there are no more > fragments for this query. > This cleanup runs on delay and can be cancelled if another fragment request > comes in with the same query ID. > > > Diffs > ----- > > > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java > ded84c1 > > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java > c7e9d32 > > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java > a965872 > ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java > 825488f > ql/src/test/org/apache/hadoop/hive/llap/TestLlapOutputFormat.java 2288cd4 > > Diff: https://reviews.apache.org/r/48886/diff/ > > > Testing > ------- > > > Thanks, > > Jason Dere > >