[
https://issues.apache.org/jira/browse/DRILL-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541129#comment-16541129
]
ASF GitHub Bot commented on DRILL-6585:
---------------------------------------
paul-rogers commented on issue #1367: DRILL-6585: PartitionSender clones
vectors, but shares field metdata
URL: https://github.com/apache/drill/pull/1367#issuecomment-404387515
@vrozov, as usual, you zeroed in on two issues.
First, it turns out that we have both a `clone()` and `cloneEmpty()`. I
should have used `cloneEmpty()`. This is a problem with not being able to run
the pre-commit tests, I can't catch such bone-headed errors...
Second, the `clone()` method does not even make sense. It will clear out
child fields for the `MaterializedField`, but *not* the subtypes for the major
type. This is very likely a bug. Why have we never caught it? Turns out that
`clone()` is not called.
The same bug exists for all of the `withPathAndType()` variations: we
preserve subtypes. But, the semantics of the (mostly broken) union vectors
appear to want a new vector to be given a materialized field without subtypes;
subtypes are added later as the types are needed. Is that behavior itself a bug
or a feature? We can't be sure because union vectors are not supported my much
of Drill.
This makes this a hard case. Since the behavior is murky, I'd suggest
leaving well enough alone, and just fixing the one problem we know about:
passing a fully-populated materialized field to a new vector.
Might be worthwhile filing a JIRA to track this issue. (Though, we never
seem to fix such tickets...)
Running tests with the other PR (with the above fix) should tell us if the
fix works, since that PR found the issue.
To your fundamental question, please see the discussion in the original PR
and in the description for details. The story is, in essence:
* There is a fully-built vector coming into the partitioner.
* When that incoming vector was built, the code for a map added children to
the `MaterializedField`, and for unions, added a subtype to the `MajorType`.
* For whatever reason, the partition wants to make a new copy of that vector
using the same structure.
* If we reuse the same vector, then the new code will modify it as we build
up the maps or unions, causing concurrent modifications to the source vector's
field while building the destination.
* If we do a simple copy, we'll carry over child fields and subtypes, which
will likely cause problems as we build up the new vector's structure.
The original code would only work for non-nullable, fixed-width types (with
no internal structure) I have no idea why it started to fail with my result set
loader changes. Probably because, to get things to work, I had to fix bugs in
the vector code in which we were not properly maintaining the material field
structure, which caused problems with the new code tried to copy a vector
structure to create the "overflow vector."
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> PartitionSender clones vectors, but shares field metdata
> --------------------------------------------------------
>
> Key: DRILL-6585
> URL: https://issues.apache.org/jira/browse/DRILL-6585
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
>
> See the discussion forĀ [PR #1244 for
> DRILL-6373|https://github.com/apache/drill/pull/1244].
> The PartitionSender clones vectors. But, it does so by reusing the
> {{MaterializedField}} from the original vector. Though the original authors
> of {{MaterializedField}} apparently meant it to be immutable, later changes
> for maps and unions ended up changing it to add members.
> When cloning a map, we get the original map materialized field, then start
> doctoring it up as we add the cloned map members. This screws up the original
> map vector's metadata.
> The solution is to clone an empty version of the materialized field when
> creating a new vector.
> But, since much code creates vectors by giving a perfectly valid, unique
> materialized field, we want to add a new method for use by the ill-behaved
> uses, such as PartitionSender, that ask to create a new vector without
> cloning the materialized field.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)