ramitg254 commented on code in PR #6376:
URL: https://github.com/apache/hive/pull/6376#discussion_r2975909593


##########
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java:
##########
@@ -549,6 +578,39 @@ public boolean setEntryValid(CacheEntry cacheEntry, FetchWork fetchWork) {
         return false;
       }
 
+      if (isSafeCacheWriteEnabled) {

Review Comment:
   @abstractdog I have written up my understanding of the problem below;
   please validate:
   
   Problem statement: when the results cache is enabled, cache entry
   validation takes place post-execution. By then the query results have
   already been written to the cache directory at runtime, so the directory
   can exceed the max cache size, which should not happen.
   
   Reproduction: I have added unit tests in `TestCachedResults.java` that
   replicate this scenario using the same set of queries.
   `testUnsafeCacheWrite` passes when the cache directory size grows beyond
   the max cache size allowed at runtime, which is the current behaviour.
   `testSafeCacheWrite` covers the fix: for the same queries, the cache
   directory never exceeds the max cache size allowed at any point during
   runtime.
   
   Solution: we are introducing a safe cache write conf. When it is enabled,
   the query files for the fetch work are not written directly to the cache
   directory; instead the query proceeds as a normal query execution. Normal
   query execution already stores these files somewhere at runtime (for
   example, the local scratch dir in these unit tests), so we do not maintain
   any extra temporary storage buffer. If the query fails, it fails like a
   normal query. If it succeeds, we run the validation checks on those files
   and, if the entry is valid, copy them to the cache dir in post-execution.
   If the copy fails partway through, I clean up the partially copied files
   as well.
   I made this configurable because copying the files from the normal query
   execution location to the cache directory adds overhead.
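
To make the flow above concrete, here is a minimal standalone sketch of the "validate first, then copy, clean up on failure" idea using plain `java.nio.file`. It is not the code in this PR and does not use Hive's `FileSystem` or `QueryResultsCache` APIs; the class name `SafeCacheWriteSketch`, the methods `dirSize` and `publishToCache`, and the `entryName` and `maxCacheSize` parameters are all hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of Hive.
final class SafeCacheWriteSketch {

    /** Total size in bytes of all regular files under dir (recursive). */
    static long dirSize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /**
     * Copies result files that already exist in a scratch location into a new
     * entry directory under cacheDir, but only if the cache stays within
     * maxCacheSize. Returns false (and writes nothing) when the entry does not fit.
     */
    static boolean publishToCache(List<Path> resultFiles, Path cacheDir,
                                  String entryName, long maxCacheSize) throws IOException {
        long incoming = 0;
        for (Path f : resultFiles) {
            incoming += Files.size(f);
        }
        // Validation happens before anything touches the cache directory.
        if (dirSize(cacheDir) + incoming > maxCacheSize) {
            return false;
        }

        Path entryDir = Files.createDirectory(cacheDir.resolve(entryName));
        try {
            for (Path f : resultFiles) {
                Files.copy(f, entryDir.resolve(f.getFileName()));
            }
            return true;
        } catch (IOException e) {
            // The copy failed partway: remove everything copied so far.
            deleteRecursively(entryDir);
            throw e;
        }
    }

    /** Deletes dir and its contents, children before parents. */
    private static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder())
                 .map(Path::toFile)
                 .forEach(File::delete);
        }
    }
}
```

In this sketch the scratch files play the role of the normal query output, so the only extra cost of the safe path is the copy itself, which is the overhead that motivates keeping the behaviour configurable.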



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]