paul-rogers commented on issue #1367: DRILL-6585: PartitionSender clones 
vectors, but shares field metdata
URL: https://github.com/apache/drill/pull/1367#issuecomment-404387515
 
 
   @vrozov, as usual, you zeroed in on two issues.
   
   First, it turns out that we have both a `clone()` and `cloneEmpty()`. I 
should have used `cloneEmpty()`. This is a problem with not being able to run 
the pre-commit tests, I can't catch such bone-headed errors...
   
   Second, the `clone()` method does not even make sense. It will clear out 
child fields for the `MaterializedField`, but *not* the subtypes for the major 
type. This is very likely a bug. Why have we never caught it? Turns out that 
`clone()` is not called.
   
   The same bug exists for all of the `withPathAndType()` variations: we 
preserve subtypes. But, the semantics of the (mostly broken) union vectors 
appear to want a new vector to be given a materialized field without subtypes; 
subtypes are added later as the types are needed. Is that behavior itself a bug 
or a feature? We can't be sure because union vectors are not supported my much 
of Drill.
   
   This makes this a hard case. Since the behavior is murky, I'd suggest 
leaving well enough alone, and just fixing the one problem we know about: 
passing a fully-populated materialized field to a new vector.
   
   Might be worthwhile filing a JIRA to track this issue. (Though, we never 
seem to fix such tickets...)
   
   Running tests with the other PR (with the above fix) should tell us if the 
fix works, since that PR found the issue.
   
   To your fundamental question, please see the discussion in the original PR 
and in the description for details. The story is, in essence:
   
   * There is a fully-built vector coming into the partitioner.
   * When that incoming vector was built, the code for a map added children to 
the `MaterializedField`, and for unions, added a subtype to the `MajorType`.
   * For whatever reason, the partition wants to make a new copy of that vector 
using the same structure.
   * If we reuse the same vector, then the new code will modify it as we build 
up the maps or unions, causing concurrent modifications to the source vector's 
field while building the destination.
   * If we do a simple copy, we'll carry over child fields and subtypes, which 
will likely cause problems as we build up the new vector's structure.
   
   The original code would only work for non-nullable, fixed-width types (with 
no internal structure) I have no idea why it started to fail with my result set 
loader changes. Probably because, to get things to work, I had to fix bugs in 
the vector code in which we were not properly maintaining the material field 
structure, which caused problems with the new code tried to copy a vector 
structure to create the "overflow vector."

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to