jordepic opened a new pull request, #8125:
URL: https://github.com/apache/paimon/pull/8125
### Purpose
When 'lookup.remote-file.enabled' is set, the writer persists a lookup SST for each data file to the object store during lookup compaction. Until now, only the write/compaction path wired a RemoteFileDownloader onto LookupLevels. As a result, the LocalTableQuery read path (lookup joins, query service) always rebuilt the lookup SST on a local SST cache miss by re-scanning the data file from the object store.
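The cache-miss resolution this PR targets can be sketched as follows. The names `LookupFileCache`, `RemoteSstDownloader`, `getOrCreate`, and `tryDownload` are hypothetical stand-ins for LookupLevels and RemoteFileDownloader and do not reproduce their real signatures. With no downloader registered, every miss rebuilds the SST from the data file, which is the read-path behavior described above. Registering a downloader lets a miss try the persisted remote SST first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical: models the idea only, not Paimon's RemoteFileDownloader API.
interface RemoteSstDownloader {
    /** Returns the persisted SST bytes for a data file, or empty if none was persisted. */
    Optional<byte[]> tryDownload(String dataFile);
}

// Hypothetical per-bucket cache of local lookup SSTs, keyed by data file name.
class LookupFileCache {
    private final Map<String, byte[]> localSsts = new HashMap<>();
    // Rebuilds an SST by scanning the data file (the expensive path).
    private final Function<String, byte[]> localBuilder;
    // null means "no downloader registered": every miss rebuilds locally.
    private RemoteSstDownloader downloader;
    int rebuilds = 0;
    int downloads = 0;

    LookupFileCache(Function<String, byte[]> localBuilder) {
        this.localBuilder = localBuilder;
    }

    void setDownloader(RemoteSstDownloader downloader) {
        this.downloader = downloader;
    }

    /** Returns the cached SST, loading it on a miss. */
    byte[] getOrCreate(String dataFile) {
        return localSsts.computeIfAbsent(dataFile, this::load);
    }

    private byte[] load(String dataFile) {
        if (downloader != null) {
            // Prefer the already-persisted remote SST when one exists.
            Optional<byte[]> remote = downloader.tryDownload(dataFile);
            if (remote.isPresent()) {
                downloads++;
                return remote.get();
            }
        }
        // Fallback: build the SST locally from the data file.
        rebuilds++;
        return localBuilder.apply(dataFile);
    }
}
```

In this sketch the downloader returns `Optional.empty()` for files without a persisted SST. A partially populated remote store therefore still degrades to a local rebuild for each such file.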
Wire a RemoteLookupFileManager onto each bucket's LookupLevels on the read path so that a cache miss downloads the already-persisted SST instead of rebuilding it. This is scoped to the only case where reusing those SSTs is correct:
1) lookup.remote-file.enabled is true (otherwise the SSTs do not exist at all).
2) Deletion vectors are off, so the writer persisted "value"-processor SSTs (the full serialized value) rather than "position-based" SSTs, which this value-based read path cannot interpret.
3) The query reads the full value rather than a projection, since the remote SST encodes the full value row. We could read the full row and return only the requested fields, but that is left out for now.
When any condition does not hold, no downloader is registered and the read
path falls back to building the sst locally, exactly as before.
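The three conditions can be expressed as a single predicate that decides whether a downloader is registered. This is a hedged sketch, not the PR's code. `RemoteLookupGate`, `shouldRegisterDownloader`, and the encoding of "no projection" as `null` or an identity index array are assumptions made for illustration.

```java
// Hypothetical: models the three listed conditions; not Paimon's CoreOptions or LocalTableQuery API.
final class RemoteLookupGate {
    private RemoteLookupGate() {}

    static boolean shouldRegisterDownloader(
            boolean remoteFileEnabled,
            boolean deletionVectorsEnabled,
            int[] projection,
            int valueFieldCount) {
        // 1) The writer only persists lookup SSTs when lookup.remote-file.enabled is true.
        if (!remoteFileEnabled) {
            return false;
        }
        // 2) With deletion vectors on, the persisted SSTs are position-based, not value-based.
        if (deletionVectorsEnabled) {
            return false;
        }
        // 3) The remote SST encodes the full value row, so only a full-row read can reuse it.
        return isFullRow(projection, valueFieldCount);
    }

    // Assumption: null means "no projection"; otherwise the projection must be exactly 0..n-1.
    static boolean isFullRow(int[] projection, int valueFieldCount) {
        if (projection == null) {
            return true;
        }
        if (projection.length != valueFieldCount) {
            return false;
        }
        for (int i = 0; i < projection.length; i++) {
            if (projection[i] != i) {
                return false;
            }
        }
        return true;
    }
}
```

When the predicate returns false, no downloader is registered and the read path keeps building SSTs locally.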
### Tests
Added a test to PrimaryKeySimpleTableTest, where the other primary-key table tests live.
In this test, we deliberately delete some data files whose remote SSTables have already been persisted. This proves that a lookup join can be served from those remote SSTables alone, since with the data files gone the SSTables could not be rebuilt locally.
We then delete the remote SSTables, restore the data files, and show that the fallback of building the SSTables locally still works as expected.
--
This is an automated message from the Apache Git Service.