[GitHub] drill issue #1085: DRILL-6049: Misc. hygiene and code cleanup changes

paul-rogers Mon, 08 Jan 2018 15:27:06 -0800

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1085
  
    ### Error Handling
    
    The column writer work introduces a new scan operator mechanism, part of 
which standardizes error handling. As part of that work, this PR adds a couple 
of new error types to UserError to better categorize errors.
    
    ### Star Column
    
    Organized our many references to the "star column" to make it more readily 
available to the new scan framework.
    
    ### Date/Time Utilities
    
    The various date/time vectors have very handy methods to convert the vector 
representation into Joda objects. This PR pulls that code out into a utility 
class that can be used in other places. This change uncovered a flaw in the JDBC 
driver: the existing date/time utilities were not included in the JDBC jar, 
which caused failures. This PR fixes that.
    
    ### Deprecate OptionSet
    
    Over a previous set of PRs we've been assembling a sub-operator test 
framework. This PR continues that work by deprecating the prior OptionSet 
class. (OptionSet was meant to work around limitations in the original 
OptionManager, but those issues have since been resolved by all the great work 
that Jyothsna and Tim have done. Since OptionSet now no longer serves a useful 
purpose, this PR removes it.)
    
    ### Deprecate StatsWriter
    
    Similarly, the StatsWriter class added a stats mechanism that can be used 
in sub-operator tests. Further experience has shown we can make the full 
OperatorStats work in this context, so StatsWriter is deprecated in this PR.
    
    ### Fix for Close of Sort Operator
    
    Reshuffled a bit of code in the sort operator so that the operator context 
is closed in the correct order in both production and unit tests.
    
    ### Swap Containers
    
    Added code to swap buffers between two containers. Builds on code added 
earlier to exchange buffers between two vectors. (Used in scan operator, to be 
added later.)
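
As a rough illustration of the idea (not the code in this PR), swapping buffers between two containers can be built on a per-vector exchange. The classes `SketchVector` and `SketchContainer` below are hypothetical stand-ins, not Drill's ValueVector or VectorContainer, and the sketch assumes both containers hold the same number of columns in the same order:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a value vector: it owns exactly one buffer.
final class SketchVector {
  private byte[] buffer;

  SketchVector(byte[] buffer) { this.buffer = buffer; }

  byte[] buffer() { return buffer; }

  // Exchange buffers with another vector. Only references move; no data is copied.
  void exchange(SketchVector other) {
    byte[] temp = buffer;
    buffer = other.buffer;
    other.buffer = temp;
  }
}

// Hypothetical stand-in for a container: an ordered list of vectors.
final class SketchContainer {
  private final List<SketchVector> vectors = new ArrayList<>();

  SketchContainer add(SketchVector vector) {
    vectors.add(vector);
    return this;
  }

  SketchVector get(int index) { return vectors.get(index); }

  int size() { return vectors.size(); }

  // Swap buffers column by column, reusing the per-vector exchange.
  // Assumption: matching column counts; real code would also check types.
  void exchange(SketchContainer other) {
    if (size() != other.size()) {
      throw new IllegalArgumentException("Containers must have the same column count");
    }
    for (int i = 0; i < size(); i++) {
      get(i).exchange(other.get(i));
    }
  }
}
```

After `left.exchange(right)`, each vector in `left` holds the buffer that the matching vector in `right` held before, and vice versa.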
    
    ### CSV Column Header Error Handling
    
    Changed CSV to use UserException to report errors instead of an ad-hoc 
HeaderError exception.
    
    ### Test Tools
    
    Added a number of services to the ClusterFixture family of test classes. The 
key new feature is the ability to read the results of a multi-batch query as a 
sequence of RowSet objects so we can use the row set tools to verify results.
    
    ### Union Vector
    
    The union vector is a rather odd duck: it works, but is not widely 
supported in Drill. It stores its "type" members in two ways: in an internal 
map and in member variables. The member variable format makes it hard to treat 
the types generically. This PR keeps the serialized format (storing vectors in 
an internal map) but changes the member variable format to use an array of type 
vectors rather than individual variables. The result is that generic access 
(for the new Union column writer) is much easier.
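
To make the difference concrete, here is a minimal sketch of the "array of type vectors" layout. All names (`SketchMinorType`, `SketchMemberVector`, `SketchUnion`) are hypothetical and much simpler than the real UnionVector; the point is only that an array indexed by type ordinal allows one generic accessor where individual member variables would each need their own:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, much-reduced set of member types.
enum SketchMinorType { INT, VARCHAR, FLOAT8 }

// Hypothetical stand-in for one typed member vector of the union.
final class SketchMemberVector {
  final SketchMinorType type;

  SketchMemberVector(SketchMinorType type) { this.type = type; }
}

final class SketchUnion {
  // "Serialized" form: members live in an internal map, keyed by type name.
  private final Map<String, SketchMemberVector> internalMap = new LinkedHashMap<>();

  // In-memory form: one slot per type, indexed by the enum ordinal,
  // in place of one member variable per type.
  private final SketchMemberVector[] cachedMembers =
      new SketchMemberVector[SketchMinorType.values().length];

  // Generic accessor: creates the member on first use, for any type.
  SketchMemberVector member(SketchMinorType type) {
    SketchMemberVector vector = cachedMembers[type.ordinal()];
    if (vector == null) {
      vector = new SketchMemberVector(type);
      internalMap.put(type.name(), vector);
      cachedMembers[type.ordinal()] = vector;
    }
    return vector;
  }

  boolean hasMember(SketchMinorType type) {
    return cachedMembers[type.ordinal()] != null;
  }

  int memberCount() { return internalMap.size(); }
}
```

A generic writer can then loop over `SketchMinorType.values()` and call `member(type)` without a switch statement per type.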
    
    ### Materialized Field in Vectors
    
    Vectors provide a MaterializedField to hold their metadata. The original 
idea seems to have been that a MaterializedField is immutable. This worked well 
for simple scalar types, but the code evolved to change the MaterializedField 
for maps (to add map members), for unions (to add union members) and so on.
    
    For container types (Map, Repeated Map, Union, List, Repeated List), the 
"parent" MaterializedField holds onto a copy of the child field. But, some 
parents, when they receive new members, throw away their old MaterializedField 
and create a new one. The result is that, if we have the structure map(map(a, 
b)), when we add b to the inner map, the parent ends up pointing to an older 
version of the child schema, not the updated one.
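
The stale-schema problem can be reproduced in a few lines. This is a simplified model with hypothetical classes (`SketchField`, `SketchMapVector`), not Drill's MaterializedField or MapVector; it assumes an immutable field whose owner replaces it whenever a member is added:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable field: adding a child produces a new object.
final class SketchField {
  final String name;
  final List<SketchField> children;

  SketchField(String name) {
    this(name, Collections.<SketchField>emptyList());
  }

  SketchField(String name, List<SketchField> children) {
    this.name = name;
    this.children = Collections.unmodifiableList(new ArrayList<>(children));
  }

  // Returns a copy of this field with one more child; the original is unchanged.
  SketchField withChild(SketchField child) {
    List<SketchField> copy = new ArrayList<>(children);
    copy.add(child);
    return new SketchField(name, copy);
  }
}

// Hypothetical map vector that discards its old field when a member is added.
final class SketchMapVector {
  SketchField field;

  SketchMapVector(SketchField field) { this.field = field; }

  void addChild(SketchField child) {
    // The old field object is thrown away here. Any parent that captured it
    // earlier still points to the old version.
    field = field.withChild(child);
  }
}
```

With an inner map built with member `a`, and an outer map whose field captured the inner field at that moment, calling `inner.addChild(new SketchField("b"))` leaves `inner.field` with two children while the copy reachable through `outer.field` still has one.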
    
    This PR fixes at least some of those issues in the Union vector.
    
    ### JUnit Upgrade
    
    Upgrades JUnit from 4.11 to 4.12 to make use of a handy annotation that 
disables timeouts when debugging.
    
    ### Generic Cleanup
    
    Added comments, fixed indentation and other minor changes.
