jordepic opened a new pull request, #8125:
URL: https://github.com/apache/paimon/pull/8125

   ### Purpose
   
   When 'lookup.remote-file.enabled' is set, the writer persists per-data-file 
lookup SSTs to the object store during lookup compaction. Until now, only the 
write/compaction path wired a RemoteFileDownloader onto LookupLevels, so the 
LocalTableQuery read path (lookup joins, query service) always rebuilt the 
lookup SST by re-scanning the data file from the object store on an SSTable 
cache miss.
   
   Wire a RemoteLookupFileManager onto each bucket's LookupLevels on the read 
path so a cache miss downloads the already-persisted SST instead of rebuilding 
it. This is scoped to the one case where reusing those SSTs is correct, which 
requires all of the following:
   
     1) lookup.remote-file.enabled is true (otherwise the SSTs do not exist at all)
     2) deletion vectors are off, so the writer persisted "value"-processor SSTs
        (full serialized value) rather than "position-based" SSTs, which this
        value-based read path cannot interpret
     3) the query reads the full value, not a projection, since the remote SST
        encodes the full value row. We could read the full row and then return
        only the projected fields to the user, but that is left out for now.
   
   When any condition does not hold, no downloader is registered and the read 
path falls back to building the sst locally, exactly as before.
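
As a rough illustration of the gating described above, the sketch below reduces the three conditions to a single predicate and registers a downloader only when it holds. The class `RemoteSstGate`, the `SstDownloader` interface, and both method names are hypothetical stand-ins invented for this example; they are not Paimon's actual types or the code in this PR.

```java
import java.util.Optional;

/** Illustrative only: a hypothetical gate, not Paimon's real API. */
public final class RemoteSstGate {

    /** Hypothetical stand-in for whatever downloads a persisted lookup SST. */
    public interface SstDownloader {
        boolean tryDownload(String dataFileName);
    }

    /**
     * Reuse is assumed safe only when all three conditions from the
     * description hold: remote files enabled, deletion vectors off,
     * and no projection on the read.
     */
    public static boolean canReuseRemoteSst(
            boolean remoteFileEnabled, boolean deletionVectorsEnabled, boolean projected) {
        return remoteFileEnabled && !deletionVectorsEnabled && !projected;
    }

    /** Returns the downloader to register, or empty to keep the local-rebuild path. */
    public static Optional<SstDownloader> downloaderFor(
            boolean remoteFileEnabled,
            boolean deletionVectorsEnabled,
            boolean projected,
            SstDownloader candidate) {
        return canReuseRemoteSst(remoteFileEnabled, deletionVectorsEnabled, projected)
                ? Optional.of(candidate)
                : Optional.empty();
    }

    public static void main(String[] args) {
        SstDownloader noop = name -> true;
        System.out.println(downloaderFor(true, false, false, noop).isPresent());
        System.out.println(downloaderFor(true, true, false, noop).isPresent());
    }
}
```

Returning an empty `Optional` rather than a no-op downloader mirrors the description's wording that "no downloader is registered" when a condition fails, so the existing rebuild path stays untouched.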
   
   ### Tests
   
   A test is added to PrimaryKeySimpleTableTest, which is where the other 
primary-key tests live.
   
   In this test, we deliberately remove data files whose remote SSTables have 
already been persisted, to prove that a lookup join can be served from those 
remote SSTables alone (with the data files gone, the SSTables could not be 
rebuilt locally).
   
   Then, we remove the remote SSTables, add back our data files, and show that 
falling back to creating the SSTables ourselves still functions as expected.
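
The two phases of that test can be mimicked with a toy in-memory model. This is a sketch under simplifying assumptions: plain maps stand in for the object store and the local SST cache, and `LookupFallbackSketch` and its fields are hypothetical names, unrelated to the real test or to Paimon's classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of "download the remote SST, else rebuild from the data file". */
public final class LookupFallbackSketch {

    // In-memory stand-ins, keyed by data file name (all hypothetical).
    public final Map<String, Map<String, String>> dataFiles = new HashMap<>();
    public final Map<String, Map<String, String>> remoteSsts = new HashMap<>();
    public final Map<String, Map<String, String>> localCache = new HashMap<>();

    public Optional<String> lookup(String file, String key) {
        Map<String, String> sst = localCache.get(file);
        if (sst == null) {
            // Cache miss: prefer the already-persisted remote SST.
            sst = remoteSsts.get(file);
            if (sst == null) {
                // Fallback: rebuild the SST by scanning the data file.
                Map<String, String> data = dataFiles.get(file);
                if (data == null) {
                    throw new IllegalStateException("no remote SST and no data file for " + file);
                }
                sst = new HashMap<>(data);
            }
            localCache.put(file, sst);
        }
        return Optional.ofNullable(sst.get(key));
    }

    public static void main(String[] args) {
        LookupFallbackSketch s = new LookupFallbackSketch();

        // Phase 1: only the remote SST exists; the data file has been removed.
        s.remoteSsts.put("data-0", Map.of("pk1", "row1"));
        System.out.println(s.lookup("data-0", "pk1"));

        // Phase 2: remote SST removed, data file restored, local cache cleared.
        s.localCache.clear();
        s.remoteSsts.clear();
        s.dataFiles.put("data-0", Map.of("pk1", "row1"));
        System.out.println(s.lookup("data-0", "pk1"));
    }
}
```

Clearing the local cache between phases is an assumption of this toy model, made so that the second lookup actually exercises the rebuild path.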
   
   
   

