GitHub user priteshm opened a pull request: https://github.com/apache/drill/pull/994
Merge from latest

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/priteshm/drill master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/994.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #994

----

commit c75dc4904d3ecb734f9369db6a9b4011956fb07c
Author: Paul Rogers <prog...@maprtech.com>
Date: 2017-03-10T23:56:18Z

DRILL-5344: External sort priority queue copier fails with an empty batch

Unit tests showed that the "priority queue copier" does not handle an empty batch. This has not been an issue because code elsewhere in the sort specifically works around this issue. This fix resolves the issue at the source to avoid the need for future work-arounds.

closes #778

commit ee15632df3a869b3cc1063f882356d7eaab5b9f7
Author: Paul Rogers <prog...@maprtech.com>
Date: 2017-03-14T23:18:24Z

DRILL-5323: Test tools for row sets

Provide test tools to create, populate and compare row sets

To simplify tests, we need a TestRowSet concept that wraps a VectorContainer and provides easy ways to:

- Define a schema for the row set.
- Create a set of vectors that implement the schema.
- Populate the row set with test data via code.
- Add an SV2 to the row set.
- Pass the row set to operator components (such as generated code blocks).
- Examine the contents of a row set.
- Compare the results of the operation with an expected result set.
- Dispose of the underlying direct memory when work is done.

This code builds on that in DRILL-5324 to provide a complete row set API. See DRILL-5318 for the spec.

Note: this code can be reviewed as-is, but cannot be committed until after DRILL-5324 is committed: this code has compile-time dependencies on that code. This PR will be rebased once DRILL-5324 is pulled into master.
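As a rough illustration of the row set idea listed above (define a schema, populate rows in code, compare against an expected result), here is a minimal sketch in plain Java. `MiniRowSet` and its methods are hypothetical names invented for this example. It does not use Drill's VectorContainer, value vectors, SV2 or direct memory, and it is not the API this commit adds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the row set concept; NOT Drill's actual API. */
final class MiniRowSet {
    private final List<String> columns;                   // schema: column names only
    private final List<Object[]> rows = new ArrayList<>(); // row data held on the heap

    MiniRowSet(String... columns) {
        this.columns = Arrays.asList(columns.clone());
    }

    /** Adds one row; the number of values must match the schema width. */
    MiniRowSet add(Object... values) {
        if (values.length != columns.size()) {
            throw new IllegalArgumentException(
                "expected " + columns.size() + " values, got " + values.length);
        }
        rows.add(values.clone());
        return this;
    }

    int rowCount() {
        return rows.size();
    }

    /** True when both sets have the same columns and the same rows in the same order. */
    boolean matches(MiniRowSet other) {
        if (!columns.equals(other.columns) || rows.size() != other.rows.size()) {
            return false;
        }
        for (int i = 0; i < rows.size(); i++) {
            if (!Arrays.equals(rows.get(i), other.rows.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MiniRowSet expected = new MiniRowSet("id", "name").add(1, "fred").add(2, "wilma");
        MiniRowSet actual = new MiniRowSet("id", "name").add(1, "fred").add(2, "wilma");
        System.out.println("row sets match: " + actual.matches(expected));
    }
}
```

In a test, the "expected" set would be built by hand and the "actual" set would come from the component under test; a real implementation would also have to release vector memory, which this heap-only sketch has no need to do.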
Handles maps and intervals

The row set schema is refined to provide two forms of schema. A physical schema shows the nested structure of the data with maps expanding into their contents. Updates the row set schema builder to easily build a schema with maps.

An access schema shows the row "flattened" to include just scalar (non-map) columns, with all columns at a single level, with dotted names identifying nested fields. This form makes for very simple access.

Then, provides tools for reading and writing batches with maps by presenting the flattened view to the row reader and writer.

HyperVectors have a very complex structure for maps. The hyper row set implementation takes a first crack at mapping that structure into the standardized row set format.

Also provides a handy way to set an INTERVAL column from an int. There is no good mapping from an int to an interval, so an arbitrary convention is used. This convention is not generally useful, but is very handy for quickly generating test data.

As before, this is a partial PR. The code here still depends on DRILL-5324 to provide the column accessors needed by the row reader and writer.

All this code is getting rather complex, so this commit includes a unit test of the schema and row set code.

Revisions to support arrays

Arrays require a somewhat different API. Refactored to allow arrays to appear as a field type. While refactoring, moved interfaces to more logical locations. Added more comments. Rejiggered the row set schema to provide both a physical and flattened (access) schema, both driven from the original batch schema. Pushed some accessor and writer classes into the accessor layer. Added tests for arrays. Also added more comments where needed.

Moved tests to DRILL-5318

The test classes previously here depend on the new "operator fixture". To provide a non-cyclic checkin order, moved the tests to the PR with the fixtures so that this PR is clear of dependencies.
The tests were reviewed in the context of DRILL-5318.

Also pulls in batch sizer support for map fields which are required by the tests.

closes #785

commit 51ce7843dfd5d9979dcd205df4797869c12dc3a2
Author: Paul Rogers <prog...@maprtech.com>
Date: 2017-03-14T23:18:24Z

DRILL-5318: Sub-operator test fixture

This commit depends on:

* DRILL-5323

This PR cannot be accepted (or built) until the above are pulled and this PR is rebased on top of them. The PR is issued now so that reviews can be done in parallel.

Provides the following:

* A new OperatorFixture to set up all the objects needed to test at the sub-operator level. This relies on the refactoring to create the required interfaces.
* Pulls the config builder code out of the cluster fixture builder so that configs can be built for sub-operator tests.
* Modifies the QueryBuilder test tool to run a query and get back one of the new row set objects to allow direct inspection of data returned from a query.
* Modifies the cluster fixture to create a JDBC connection to the test cluster. (Use requires putting the Drill JDBC project on the test class path since exec does not depend on JDBC.)

Created a common subclass for the cluster and operator fixtures to abstract out the allocator and config. Also provides temp directory support to the operator fixture.

Merged with DRILL-5415 (Improve Fixture Builder to configure client properties)

Moved row set tests here from DRILL-5323 so that DRILL-5323 is self contained. (The tests depend on the fixtures defined here.)

Added comments where needed.

Puts code back as it was prior to a code review comment. The code is redundant, but necessarily so due to code which is specific to several primitive types.
closes #788

commit 54e9d3bf60bc65846b6ec809226ef82698b90a7a
Author: Paul Rogers <prog...@maprtech.com>
Date: 2017-03-26T02:51:43Z

DRILL-5385: Vector serializer fails to read saved SV2

Unit testing revealed that the VectorAccessorSerializable class claims to serialize SV2s, but, in fact, does not. Actually, it writes them, but does not read them, resulting in corrupted data on read.

Fortunately, no code appears to serialize SV2s at present. Still, it is a bug and needs to be fixed.

First task is to add serialization code for the SV2.

That revealed that the recently-added code to save DrillBufs using a shared buffer had a bug: it relied on the writer index to know how much data is in the buffer. Turns out SV2 buffers don't set this index. So, new versions of the write function take a write length.

Then, closer inspection of the read code revealed duplicated code. So, DrillBuf allocation moved into a version of the read function that now does reading and DrillBuf allocation.

Turns out that value vectors, but not SV2s, can be built from a DrillBuf. Added a matching constructor to the SV2 class.

Finally, cleaned up the code a bit to make it easier to follow. Also allowed test code to access the handy timer already present in the code.

closes #800

commit dfd0abd6c19caf7d8024c189179c9662626b2f22
Author: Paul Rogers <prog...@maprtech.com>
Date: 2017-04-09T03:52:04Z

DRILL-5423: Refactor ScanBatch to allow unit testing record readers

Refactors ScanBatch to allow unit testing of record reader implementations, especially the "writer" classes. See JIRA for details.

closes #811

commit c1c2a89bfd0ab5cdc20abaa532a9476c4fa2e252
Author: Paul Rogers <prog...@maprtech.com>
Date: 2017-04-11T21:42:57Z

DRILL-5428: submit_plan fails after Drill 1.8 script revisions

When the other scripts were updated, submit_plan was not corrected.
After Drill 1.8, drill-config.sh consumes all command line arguments, finds the --config and --site options, removes them, and places the rest in the new args array. This PR updates submit_plan to use the new args array.

The fix was tested on a test cluster: we verified that a physical plan was submitted and ran.

closes #816

commit df15c75f8d2e8bf4b0e1dc7396068bdb9b266c49
Author: Arina Ielchiieva <arina.yelchiy...@gmail.com>
Date: 2017-04-26T13:27:19Z

DRILL-5391: CTAS: make folder and file permission configurable

close #820

commit 496a964d07e900f408529e8c252f9e5fbb4ae0e9
Author: liyun Liu <llys...@hotmail.com>
Date: 2017-05-04T04:46:58Z

DRILL-4039: Query fails when non-ascii characters are used in string literals

closes #825

----

---
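The DRILL-5428 description above says that drill-config.sh removes the config and site options from the command line and leaves the remaining arguments in an args array. The sketch below mimics that kind of split in Java purely for illustration. The real logic lives in a bash script; `ArgSplitter`, its field names, and the assumption that each of the two options is followed by exactly one value are hypothetical and not taken from the Drill scripts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of separating two known options from the other arguments. */
final class ArgSplitter {
    String configDir;                             // value after --config, or null if absent
    String siteDir;                               // value after --site, or null if absent
    final List<String> args = new ArrayList<>();  // all other arguments, original order kept

    static ArgSplitter split(String... commandLine) {
        ArgSplitter result = new ArgSplitter();
        for (int i = 0; i < commandLine.length; i++) {
            String arg = commandLine[i];
            boolean hasValue = i + 1 < commandLine.length;
            if (arg.equals("--config") && hasValue) {
                result.configDir = commandLine[++i];   // consume the option and its value
            } else if (arg.equals("--site") && hasValue) {
                result.siteDir = commandLine[++i];     // consume the option and its value
            } else {
                result.args.add(arg);                  // pass everything else through
            }
        }
        return result;
    }

    public static void main(String[] argv) {
        ArgSplitter r = split(argv);
        System.out.println("config=" + r.configDir + " site=" + r.siteDir + " args=" + r.args);
    }
}
```

A caller that runs after such a split has to read the leftover list rather than the original command line, which is the kind of change the commit describes for submit_plan.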