imply-cheddar opened a new pull request, #13773:
URL: https://github.com/apache/druid/pull/13773

   This commit borrows some test definitions from Drill's test suite and tries 
to use them to flesh out the full validation of window function capabilities.
   
   In order to be able to run these tests, we also add the ability to run a 
Scan operation against segments, which also meant an implementation of 
RowsAndColumns for frames.
   
   Initially, while adding these tests, I also started trying to
   fix the problems that arose.  One of them was being able to scan
   data from segments for use in queries.  This is necessary for these
   tests because the Drill tests generally do not group on anything
   first and, instead, essentially just resolve to scan operators.
   
   After resolving that issue, I ran into another set of bugs specifically
   associated with Calcite query planning, where Calcite did not give me
   a logical plan that mapped correctly to the semantics of the query.
   As I dove into that, I realized that it was a bigger ball of yarn and
   that this commit was already starting to sprawl in scope, so I changed
   strategy.  This commit introduces the test framework and the fixes
   that have been made so far, and focuses on the code changes required
   to get everything in place.  The full set of 2000 files for the tests
   will be introduced in a subsequent commit.
   
   Table of Contents (or, what to expect when reviewing this):
   
   1. Changes in `parquet-extension`: these are some code changes to add a main 
(`ParquetToJson`) that can be used to convert parquet files to 
newline-delimited JSON.  This is just a utility for developers to use if we 
ever need to add a new dataset that is defined by parquet; it is not a main 
intended for a general audience.
   2. There is a change to the `FrameColumnReader` so that it can read 
out a RowsAndColumns column.  This hopefully also provides a relatively 
straightforward path for using Frame columns in cases where direct reads from 
locations make more sense than the `ColumnSelector`/`DimensionSelector` routes 
that have been previously employed.
   3. There's a `DecoratableRowsAndColumn` semantic interface added that takes 
on "decorations" of a RAC and tries to execute them lazily.  This is leveraged 
to make reading the segment work.  Note that the capabilities for reading 
segments have only the minimum implemented to make these tests run; they are 
not yet wired up to actually execute in a distributed environment.
   4. There are only a few test cases checked in inside of this PR because, 
when I initially tried to include all of them, the PR was 2600 files.  I didn't 
want the 2600 files to take away from the code that is deserving of review, so 
I decided to only check in a few of the files in this PR.  Once this is merged, 
I will do a new PR with all of the other files, which will essentially just be 
creating new files in the `resources` directory without any actual code 
changes.  That should make it easy to merge in the 2600 extra files for tests.
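
To illustrate the "lazy decoration" idea from item 3, here is a minimal sketch. It is not the `DecoratableRowsAndColumn` interface from this PR: every name below (`Rows`, `LazilyDecoratedRows`, `CountingRows`, `addFilter`, `setLimit`) is hypothetical, and rows are modeled as plain `long[]` arrays purely for illustration. The only point it tries to show is that decorations are recorded up front and the underlying data is read once, when the result is actually requested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class LazyDecorationSketch {
    /** Hypothetical stand-in for a batch of rows; NOT Druid's RowsAndColumns. */
    public interface Rows {
        List<long[]> materialize();
    }

    /** Hypothetical base that counts how often the underlying data is read. */
    public static final class CountingRows implements Rows {
        public int reads = 0;

        @Override
        public List<long[]> materialize() {
            reads++;
            List<long[]> rows = new ArrayList<>();
            for (long i = 0; i < 5; i++) {
                rows.add(new long[]{i, i * 10});
            }
            return rows;
        }
    }

    /** Records decorations (filters, a limit) and applies them only on materialize(). */
    public static final class LazilyDecoratedRows implements Rows {
        private final Rows base;
        private final List<Predicate<long[]>> filters = new ArrayList<>();
        private int limit = Integer.MAX_VALUE;

        public LazilyDecoratedRows(Rows base) {
            this.base = base;
        }

        public LazilyDecoratedRows addFilter(Predicate<long[]> filter) {
            filters.add(filter); // nothing is read yet
            return this;
        }

        public LazilyDecoratedRows setLimit(int limit) {
            this.limit = limit; // nothing is read yet
            return this;
        }

        @Override
        public List<long[]> materialize() {
            List<long[]> out = new ArrayList<>();
            // Single pass over the base: apply all recorded filters, stop at the limit.
            for (long[] row : base.materialize()) {
                if (out.size() >= limit) {
                    break;
                }
                boolean keep = true;
                for (Predicate<long[]> filter : filters) {
                    if (!filter.test(row)) {
                        keep = false;
                        break;
                    }
                }
                if (keep) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        CountingRows base = new CountingRows();
        Rows decorated = new LazilyDecoratedRows(base)
            .addFilter(row -> row[0] % 2 == 0)
            .setLimit(2);
        System.out.println("reads before materialize: " + base.reads);
        System.out.println("rows returned: " + decorated.materialize().size());
        System.out.println("reads after materialize: " + base.reads);
    }
}
```

Deferring the work this way lets an implementation combine several decorations into one pass over the data, which is presumably the motivation for executing them lazily; the real interface may differ in both shape and semantics.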
   
   This PR has:
   
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [x] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

