imply-cheddar opened a new pull request, #13773: URL: https://github.com/apache/druid/pull/13773
This commit borrows some test definitions from Drill's test suite and uses them to flesh out validation of window function capabilities. To run these tests, we also add the ability to run a Scan operation against segments, which in turn required an implementation of RowsAndColumns for frames.

Initially, while adding these tests, I also started fixing the problems that arose. One of them was being able to scan data from segments for use in queries. This is necessary because the Drill tests generally do not group on anything first and instead essentially resolve to scan operators. After resolving that issue, I ran into another set of bugs associated with Calcite query planning, where Calcite did not give me a logical plan that mapped correctly to the semantics of the query. As I dove into that, I realized that it was a bigger ball of yarn and that this commit was already starting to sprawl in scope. So I changed strategy: this commit introduces the test framework and the fixes made so far and focuses on the code changes required to get everything in place, and a subsequent commit will introduce the full set of 2000 files for the tests.

Table of Contents (or, what to expect when reviewing this):

1. Changes in `parquet-extension`: these add a main (`ParquetToJson`) that can be used to convert parquet files to newline-delimited JSON. This is just a utility for developers to use if we ever need to add a new dataset that is defined by parquet; it is not a main intended for a general audience.
2. There is a change to the `FrameColumnReader` so that it can read out a RowsAndColumns column.
   This hopefully also provides a relatively straightforward path for using Frame columns in cases where direct reads from locations make more sense than the `ColumnSelector`/`DimensionSelector` routes that have been employed previously.
3. There's a `DecoratableRowsAndColumn` semantic interface added that takes on "decorations" of a RAC (RowsAndColumns) and tries to execute them lazily. This is leveraged to make reading the segment work. Note that the capabilities for reading segments have only the minimum implemented to make these tests run; they are not implemented and wired up to actually execute in a distributed environment.
4. Only a few test cases are checked in with this PR because, when I initially tried to include all of them, the PR was 2600 files. I didn't want the 2600 files to take attention away from the code that deserves review, so I decided to check in only a few of the files here. Once this is merged, I will open a new PR with all of the other files, which will essentially just create new files in the `resources` directory without any actual code changes. That should make it easy to merge in the 2600 extra test files.

This PR has:

- [x] been self-reviewed.
- [x] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [x] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.

--
This is an automated message from the Apache Git Service.
