paul-rogers opened a new pull request #1633: DRILL-7024: Refactor ColumnWriter 
to simplify type-conversion shim
URL: https://github.com/apache/drill/pull/1633
 
 
   DRILL-7006 added a type conversion "shim" within the row set framework. 
Basically, we insert a "shim" column writer that takes data in one form 
(String, say), and does reader-specific conversions to a target format (INT, 
say).
   
   The code works fine, but the shim class ends up needing to override a bunch 
of methods which it then passes along to the base writer. This PR refactors the 
code so that the conversion shim is simpler.
   
   The idea is simple: break each column writer into two parts: an "events" 
class that handles coordination of vector indexes, skipping nulls, vector 
overflow, harvesting and restarting batches, and so on. These operations always 
occur on the base writer: the one that manages actual value vectors.
   
   The other part is the column writer itself: the part that offers the various 
"set" methods for setting values. This is the interface that type conversion 
shims need to implement.
   
   So, how do we tie the two parts together? As it turns out, the code already 
had most of the solution: most of the "events" methods were already defined via 
a `WriterEvents` interface. Prior to this PR, every column writer (including 
shims) implemented that interface. So, we just need to have concrete writers 
implement the interface, but not conversion shims.
   
   Also, the existing code has an `ObjectWriter`: the class that tells us if a 
writer is a scalar, a map, a union, etc. This then becomes the logical place to 
connect the two aspects of a writer. When asking for a scalar writer, we return 
the writer aspect (which may be a shim.) This PR adds a new method to get the 
events aspect, which returns the concrete writer.
   
   Most of the work here is simply to adjust client code to be aware of this 
new protocol. Also, some classes were also renamed to make the new structure 
clearer.
   
   To make testing easier, all row-set related tests were marked with a new 
`RowSetTest` category. This allows running all related tests from the command 
line with a single command:
   
   ```
   mvn test -Dgroups=org.apache.drill.categories.RowSetTests
   ```
   
   In my branch, I get a test failure when I run all tests:
   
   ```
   TestPStoreProviders.verifyZkStore:67 ยป NoSuchElement
   ```
   
   However, since that test does not use the code touched here, I assume that 
the failure is unrelated to this change. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to