pmmag opened a new issue, #18166: URL: https://github.com/apache/druid/issues/18166
### Description

The docs make it seem as if the tsColumn is used as a filter in the lookup query, so that only the changed rows are updated in the lookup. But this is not the case. Instead, the tsColumn is only used to check whether the lookup needs updating; if it does, the entire table is pulled and the entire lookup is rewritten.

See: https://github.com/apache/druid/issues/10735 (closed as stale in 2023; I don't have permissions to reopen it).

### Motivation

As a user, I am not sure whether I should rely on the documented or the actual behavior (and I would expect them to match!).

For example, with the current undocumented implementation, I could implement a poor man's solution for syncing deleted rows: a trigger that updates the tsColumn of some dummy row whenever a row is deleted. (The current implementation does not notice deletes unless some row is updated to bump the tsColumn value.) But if the implementation were later fixed to match the documentation, this solution would break: if the tsColumn were in fact used as a filter and only the returned rows were updated in the lookup, deletes would never be synced.

Also, as argued in the original issue, the documentation implies that the tsColumn can be used to limit the number of rows returned by each query, rather than limiting the number of full-table pulls, which is what actually happens. While both of these may help reduce load on the DB, they do so in quite different ways, which may matter to some users. I'd even argue that both strategies could ideally be combined if the intention is to reduce load.

Btw, neither approach handles deletes very well anyway (except for the somewhat hacky workaround I gave above). It would be nice if there were an API endpoint for simply forcing a refresh. I guess this can be done now by re-posting the current configuration without changes, but the fact that this works (or should work) is also not obvious from the docs.
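The two readings of tsColumn discussed in this issue can be contrasted with a toy model. This is a minimal sketch, not Druid's code: the function names, the dict-based "table" (key -> (value, ts)), and the `_dummy` row are all hypothetical. The "change detector" function encodes the behavior reported here (check whether the max tsColumn value advanced, then re-read the whole table); the "filter" function encodes what the docs appear to describe (fetch only newer rows and merge them in).

```python
"""Toy model contrasting two readings of a JDBC lookup's tsColumn.

Hypothetical names throughout; this is NOT Druid's implementation.
"""
from typing import Dict, Optional, Tuple

Table = Dict[str, Tuple[str, int]]  # key -> (value, ts)
Lookup = Dict[str, str]             # key -> value


def poll_as_change_detector(table: Table, lookup: Lookup,
                            last_ts: Optional[int]) -> Tuple[Lookup, Optional[int]]:
    """Reported behavior: tsColumn only decides WHETHER to reload.

    If max(ts) has advanced, the whole table is re-read and the lookup is
    replaced wholesale, so deleted rows disappear as well.
    """
    max_ts = max((ts for _, ts in table.values()), default=None)
    if last_ts is not None and (max_ts is None or max_ts <= last_ts):
        return lookup, last_ts  # nothing newer: keep the current lookup
    return {k: v for k, (v, _) in table.items()}, max_ts


def poll_as_filter(table: Table, lookup: Lookup,
                   last_ts: Optional[int]) -> Tuple[Lookup, Optional[int]]:
    """Behavior the docs seem to describe: tsColumn filters the query.

    Only rows with ts > last_ts are fetched and merged into the existing
    lookup; rows deleted from the table are never removed.
    """
    changed = {k: (v, ts) for k, (v, ts) in table.items()
               if last_ts is None or ts > last_ts}
    merged = dict(lookup)
    merged.update({k: v for k, (v, _) in changed.items()})
    new_ts = max((ts for _, ts in changed.values()), default=last_ts)
    return merged, new_ts


# Initial load under both readings.
table: Table = {"a": ("apple", 1), "b": ("banana", 1), "_dummy": ("", 1)}
det, det_ts = poll_as_change_detector(table, {}, None)
flt, flt_ts = poll_as_filter(table, {}, None)

# Delete "b" and, as the hypothetical trigger would, bump the dummy row's ts.
del table["b"]
table["_dummy"] = ("", 2)
det, det_ts = poll_as_change_detector(table, det, det_ts)
flt, flt_ts = poll_as_filter(table, flt, flt_ts)
print(sorted(det))  # ['_dummy', 'a']       (full reload drops "b")
print(sorted(flt))  # ['_dummy', 'a', 'b']  (incremental merge keeps "b")

# Without the workaround: delete a row but bump nothing.
t2: Table = {"a": ("apple", 1), "b": ("banana", 1)}
d2, ts2 = poll_as_change_detector(t2, {}, None)
del t2["b"]
d2, ts2 = poll_as_change_detector(t2, d2, ts2)
print(sorted(d2))   # ['a', 'b']  (max ts unchanged, delete not noticed)
```

The model shows why the dummy-row trigger only works under the reported behavior: bumping one row's timestamp forces a full reload, which also drops deleted keys, whereas under a filtering reading the same bump would fetch only the dummy row and the deleted key would linger.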
