[
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507697#comment-16507697
]
ASF GitHub Bot commented on DRILL-6373:
---------------------------------------
paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396126712
Thanks much @vrozov for the analysis. I must say I'm a bit stumped. Value
vectors are clearly not designed for concurrent modification. That is not
simply a code bug; it is a fundamental design decision. I recall a statement,
somewhere in the code or documentation, that value vectors are meant to be
created once (by a single thread) and then to be immutable thereafter.
It should be perfectly fine for any number of readers, in separate threads,
to access the vector once it has entered its immutable phase. But nothing
about vectors allows concurrent access while they are still mutable.
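To make that lifecycle concrete, here is a minimal sketch of the "single writer, then immutable, then many readers" contract. It is an illustration only: `WriteOnceVector`, `freeze()` and `VectorLifecycleDemo` are hypothetical names invented for this example and are not part of Drill's `ValueVector` API, and Drill does not necessarily enforce the rule with runtime checks like these.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a value vector; this is NOT Drill's ValueVector API.
final class WriteOnceVector {
    private final Thread owner = Thread.currentThread(); // the single writer
    private int[] data = new int[16];
    private int count;
    // The volatile write in freeze() publishes data and count to later readers.
    private volatile boolean frozen;

    void append(int value) {
        checkWritable();
        if (count == data.length) {
            data = Arrays.copyOf(data, count * 2);
        }
        data[count++] = value;
    }

    // Ends the mutable phase; after this call the vector never changes.
    void freeze() {
        checkWritable();
        frozen = true;
    }

    int size() {
        checkReadable();
        return count;
    }

    int get(int index) {
        checkReadable();
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    private void checkWritable() {
        if (frozen) {
            throw new IllegalStateException("vector is immutable");
        }
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("only the creating thread may write");
        }
    }

    private void checkReadable() {
        if (!frozen) {
            throw new IllegalStateException("vector is still mutable");
        }
    }
}

final class VectorLifecycleDemo {
    // Any number of reader threads may share the vector once it is frozen.
    static long sumInParallel(WriteOnceVector vector, int readers) {
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int r = 0; r < readers; r++) {
                final int start = r;
                // Each reader sums a strided slice of the frozen vector.
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = start; i < vector.size(); i += readers) {
                        sum += vector.get(i);
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch a second thread that calls `append()` fails immediately instead of silently corrupting state, which is roughly the kind of violation the stack trace would indicate if two threads really are mutating the same batch.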
What is going on in this use case to cause concurrent modification? Is that
a "bug" or a "feature"? In the stack trace you provided, both threads are
creating a new vector, which should not cause a conflict. If, however, they are
modifying the same record batch, then we are violating a design assumption
that, like vectors, batches are immutable once created, and that each batch is
mutated by a single thread.
The one other possibility is that a bit of code has a bug that is modifying
the immutable schema when it should be modifying the mutable one (if working
with two vectors), but I'm not sure how that could happen since code that adds
fields is not aware of other vectors. Also, AFAIK, while I did change some code
to keep metadata in sync (the design of `MaterializedField` really works only
for simple vectors; it is a muddle for complex vectors such as maps), the
changes only apply to the mutable stage of a vector's lifecycle.
Thoughts?
> Refactor the Result Set Loader to prepare for Union, List support
> -----------------------------------------------------------------
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the
> {{ResultSetLoader}} and related classes to prepare for Union and List
> support. This fix follows the refactoring of the column accessors for the
> same purpose. Actual Union and List support is to follow in a separate PR.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)