ramitg254 commented on code in PR #6376:
URL: https://github.com/apache/hive/pull/6376#discussion_r2975909593
##########
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java:
##########
@@ -549,6 +578,39 @@ public boolean setEntryValid(CacheEntry cacheEntry,
FetchWork fetchWork) {
return false;
}
+ if (isSafeCacheWriteEnabled) {
Review Comment:
@abstractdog I have written down my understanding of the problem below;
please validate:
Problem statement: when the results cache is enabled, cache entry validation
takes place only post-execution. By that point the query results have already
been written to the cache directory at runtime, so the directory can exceed
the max cache size, which should not happen.
Reproducing the problem: I have added unit tests in `TestCachedResults.java`
that run the same set of queries in both modes. `testUnsafeCacheWrite` passes
when the cache directory size grows beyond the allowed max cache size at
runtime, which is the current behaviour. `testSafeCacheWrite` covers the fix:
for the same set of queries, the cache directory never exceeds the allowed max
cache size at any point during runtime.
Solution: we are introducing a safe cache write conf. When it is enabled, the
query files for the fetch work are not written directly to the cache
directory. Instead, the query proceeds as a normal query execution. Normal
query execution already stores these files somewhere at runtime (for example,
in the local scratch dir in these unit tests), so we do not maintain any extra
temporary storage buffer. If the query fails, it fails like any normal query.
If it succeeds, we run the validation checks against the files produced by the
normal execution, and only if the entry is valid do we copy those files to the
cache dir in post-execution. If the copy fails partway through, I also clean
up the files that were already copied.
I made this configurable because copying the files from the normal query
execution location to the cache directory adds overhead.
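To make the flow above concrete, here is a minimal standalone sketch of the "validate first, then copy, clean up on failure" idea. It is not the code in this PR: the class `SafeCacheWriteSketch`, the method `promote`, and the per-entry directory layout are hypothetical names chosen for illustration, and it uses plain `java.nio.file` on a flat local directory rather than the Hadoop `FileSystem` API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch only: none of these names come from QueryResultsCache.
class SafeCacheWriteSketch {
    private final Path cacheDir;
    private final long maxCacheSize;
    private long currentCacheSize;

    SafeCacheWriteSketch(Path cacheDir, long maxCacheSize) {
        this.cacheDir = cacheDir;
        this.maxCacheSize = maxCacheSize;
    }

    long currentCacheSize() {
        return currentCacheSize;
    }

    // Called post-execution: the results already sit in the scratch dir,
    // exactly as they would for a normal (non-cached) query.
    synchronized boolean promote(Path scratchResults, String entryName) throws IOException {
        // Assumption: results are regular files directly under scratchResults.
        List<Path> sources;
        try (Stream<Path> listing = Files.list(scratchResults)) {
            sources = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long resultSize = 0;
        for (Path src : sources) {
            resultSize += Files.size(src);
        }
        // Validate BEFORE touching the cache dir, so it never exceeds the limit.
        if (currentCacheSize + resultSize > maxCacheSize) {
            return false;
        }
        Path entryDir = cacheDir.resolve(entryName);
        Files.createDirectory(entryDir);
        List<Path> written = new ArrayList<>();
        try {
            for (Path src : sources) {
                Path dst = entryDir.resolve(src.getFileName());
                written.add(dst); // track first, so a partial copy is cleaned up too
                Files.copy(src, dst);
            }
        } catch (IOException e) {
            // Copy failed partway: remove whatever reached the cache dir.
            for (Path p : written) {
                Files.deleteIfExists(p);
            }
            Files.deleteIfExists(entryDir);
            return false;
        }
        currentCacheSize += resultSize;
        return true;
    }
}
```

The extra copy in `promote` is the overhead mentioned above, which is why the behaviour sits behind a conf.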
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]