paul-rogers commented on PR #13793: URL: https://github.com/apache/druid/pull/13793#issuecomment-1435100849
This discussion has expanded to cover two topics: how we handle aggregations in general in MSQ, and the original, specific topic of this PR. Issue #13816 covers the broader topic.

The issue here seems to be a semantic one. MSQ requires that every expression refer to an input column. However, `LATEST(foo)` has two references: an explicit one to an _input column_ (`foo`) and an implicit one to the _output column_ `__time`. MSQ will have to special-case this code. Someone has to determine where the reference needs to be modified: at native query generation time in the planner, or as part of the controller task?

The workaround is for the planner to simply forbid the one-argument form of these functions in MSQ, forcing the user to provide an input column to use as the basis. However, if we do that, then, as noted above, that input column _is not_ available at compaction time, so we would solve only the "first pass" ingestion (MSQ) but fail the "second pass" (compaction).

It would be great for someone to do the analysis, then post a description of the problem and propose a solution that works for both ingestion and compaction.
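To make the proposed planner-side workaround concrete, here is a minimal sketch of what "forbid the one-argument form in MSQ" could look like as a validation rule. It is not Druid code: `MsqLatestValidator`, `AggregateCall`, and `validateForMsq` are hypothetical names, and the sketch models "no explicit basis" as a null `timeBasis` field rather than assuming any particular SQL argument layout.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch only: none of these types exist in Druid.
public class MsqLatestValidator {
  // Hypothetical view of an aggregate call as a planner might see it.
  // timeBasis == null means the caller relied on the implicit __time column.
  record AggregateCall(String function, String column, String timeBasis) {}

  // Functions whose basis defaults to __time when none is given.
  private static final Set<String> TIME_ORDERED = Set.of("EARLIEST", "LATEST");

  // Returns an error message if the call must be rejected under MSQ, or null if allowed.
  static String validateForMsq(AggregateCall call) {
    String fn = call.function().toUpperCase(Locale.ROOT);
    if (TIME_ORDERED.contains(fn) && call.timeBasis() == null) {
      return fn + "(" + call.column() + ") implicitly references the output column __time;"
          + " provide an explicit input column as the basis";
    }
    return null; // An explicit input column is referenced, so MSQ can resolve it.
  }

  public static void main(String[] args) {
    // Implicit __time basis: rejected.
    System.out.println(validateForMsq(new AggregateCall("LATEST", "foo", null)));
    // Explicit basis column: allowed (prints "null").
    System.out.println(validateForMsq(new AggregateCall("LATEST", "foo", "ts")));
  }
}
```

As the comment points out, a check like this would only cover the first (MSQ ingestion) pass: the explicit basis column it forces the user to name is not available when compaction re-aggregates the data.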
