Hi Community,

Consider a scenario where one update operation is performed on a segment, generating one index file and one data file. When a second update operation runs, it loads into the cache the index files from the earlier update along with the segment's actual merge index file. However, because update and delete operations trigger horizontal compaction, the next query on the table loads into the cache both the new index file generated by horizontal compaction and the files that were compacted into it. This happens because, even though the compacted files are no longer valid, their status in the segment file is still SUCCESS.
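To make the failure mode concrete, here is a toy Python model of a segment file and the query-time cache load. It is only a sketch, not CarbonData code: the classes, the file names and the COMPACTED status are hypothetical. It assumes, as described above, that the query path caches every segment-file entry whose status is SUCCESS, and it contrasts that with a loader that skips entries marked as compacted.

```python
from dataclasses import dataclass, field

# Toy model of the problem. All names here (SegmentEntry, SegmentFile, the
# file names, the COMPACTED status) are hypothetical, not CarbonData APIs.
SUCCESS = "SUCCESS"
COMPACTED = "COMPACTED"  # hypothetical status; the segment file has no such marker today


@dataclass
class SegmentEntry:
    file_name: str
    status: str = SUCCESS


@dataclass
class SegmentFile:
    entries: list = field(default_factory=list)

    def add(self, file_name):
        self.entries.append(SegmentEntry(file_name))


def horizontal_compaction(segment, inputs, output, mark_inputs):
    """Register the compacted index file; optionally mark its inputs COMPACTED."""
    if mark_inputs:
        for entry in segment.entries:
            if entry.file_name in inputs:
                entry.status = COMPACTED
    segment.add(output)


def query(segment, cache):
    """Mimic the query path: load every SUCCESS entry of the segment into the cache."""
    for entry in segment.entries:
        if entry.status == SUCCESS:
            cache.add(entry.file_name)
    return cache


def build_segment(mark_inputs):
    segment = SegmentFile()
    segment.add("segment_mergeindex")  # the segment's actual merge index
    segment.add("update1_index")       # written by the first update
    segment.add("update2_index")       # written by the second update
    horizontal_compaction(segment, {"update1_index", "update2_index"},
                          "compacted_index", mark_inputs)
    return segment


# Current behaviour: the compacted inputs keep SUCCESS, so they are cached too.
stale_cache = query(build_segment(mark_inputs=False), set())
print(sorted(stale_cache))
# → ['compacted_index', 'segment_mergeindex', 'update1_index', 'update2_index']

# With a hypothetical COMPACTED status, the query path skips them.
clean_cache = query(build_segment(mark_inputs=True), set())
print(sorted(clean_cache))
# → ['compacted_index', 'segment_mergeindex']
```

The only difference between the two runs is whether the segment file records that the inputs were compacted; the cache loader itself is unchanged, which is why the status has to live in the segment file for the query path to see it.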
Every query on the table therefore loads these invalid (horizontally compacted) files into the cache. This is wrong: they stay in the cache until the cache is dropped, and they are loaded again by the next query. Today this is avoided only by running clean files on the table, which deletes the horizontally compacted files inside the segment and updates the segment file so that it lists only the valid ones. However, if a query ran before clean files, the deleted files are still present in the cache, which may lead to query failures.

There are two possible solutions:
1. Maintain the status of horizontally compacted files in the segment file, so that queries skip these files, and clear the cache for that segment after the update operation.
2. Delete the horizontally compacted files right after horizontal compaction, and clear the cache for that segment.

With a proper solution, we could also remove the timestamp-based handling of IUD files that clean files currently performs. It would be better to refactor this properly.

Any inputs, improvements, or suggestions are most welcome.

Regards,
Akash R Nilugal