[ 
https://issues.apache.org/jira/browse/DRILL-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541129#comment-16541129
 ] 

ASF GitHub Bot commented on DRILL-6585:
---------------------------------------

paul-rogers commented on issue #1367: DRILL-6585: PartitionSender clones 
vectors, but shares field metdata
URL: https://github.com/apache/drill/pull/1367#issuecomment-404387515
 
 
   @vrozov, as usual, you zeroed in on two issues.
   
   First, it turns out that we have both a `clone()` and `cloneEmpty()`. I 
should have used `cloneEmpty()`. This is a problem with not being able to run 
the pre-commit tests, I can't catch such bone-headed errors...
   
   Second, the `clone()` method does not even make sense. It will clear out 
child fields for the `MaterializedField`, but *not* the subtypes for the major 
type. This is very likely a bug. Why have we never caught it? Turns out that 
`clone()` is not called.
   
   The same bug exists for all of the `withPathAndType()` variations: we 
preserve subtypes. But, the semantics of the (mostly broken) union vectors 
appear to want a new vector to be given a materialized field without subtypes; 
subtypes are added later as the types are needed. Is that behavior itself a bug 
or a feature? We can't be sure because union vectors are not supported my much 
of Drill.
   
   This makes this a hard case. Since the behavior is murky, I'd suggest 
leaving well enough alone, and just fixing the one problem we know about: 
passing a fully-populated materialized field to a new vector.
   
   Might be worthwhile filing a JIRA to track this issue. (Though, we never 
seem to fix such tickets...)
   
   Running tests with the other PR (with the above fix) should tell us if the 
fix works, since that PR found the issue.
   
   To your fundamental question, please see the discussion in the original PR 
and in the description for details. The story is, in essence:
   
   * There is a fully-built vector coming into the partitioner.
   * When that incoming vector was built, the code for a map added children to 
the `MaterializedField`, and for unions, added a subtype to the `MajorType`.
   * For whatever reason, the partition wants to make a new copy of that vector 
using the same structure.
   * If we reuse the same vector, then the new code will modify it as we build 
up the maps or unions, causing concurrent modifications to the source vector's 
field while building the destination.
   * If we do a simple copy, we'll carry over child fields and subtypes, which 
will likely cause problems as we build up the new vector's structure.
   
   The original code would only work for non-nullable, fixed-width types (with 
no internal structure) I have no idea why it started to fail with my result set 
loader changes. Probably because, to get things to work, I had to fix bugs in 
the vector code in which we were not properly maintaining the material field 
structure, which caused problems with the new code tried to copy a vector 
structure to create the "overflow vector."

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> PartitionSender clones vectors, but shares field metdata
> --------------------------------------------------------
>
>                 Key: DRILL-6585
>                 URL: https://issues.apache.org/jira/browse/DRILL-6585
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> See the discussion forĀ [PR #1244 for 
> DRILL-6373|https://github.com/apache/drill/pull/1244].
> The PartitionSender clones vectors. But, it does so by reusing the 
> {{MaterializedField}} from the original vector. Though the original authors 
> of {{MaterializedField}} apparently meant it to be immutable, later changes 
> for maps and unions ended up changing it to add members.
> When cloning a map, we get the original map materialized field, then start 
> doctoring it up as we add the cloned map members. This screws up the original 
> map vector's metadata.
> The solution is to clone an empty version of the materialized field when 
> creating a new vector.
> But, since much code creates vectors by giving a perfectly valid, unique 
> materialized field, we want to add a new method for use by the ill-behaved 
> uses, such as PartitionSender, that ask to create a new vector without 
> cloning the materialized field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to