paul-rogers commented on issue #1367: DRILL-6585: PartitionSender clones 
vectors, but shares field metadata
URL: https://github.com/apache/drill/pull/1367#issuecomment-404987283
 
 
   @sohami, thanks for your comments and questions. Unfortunately, I cannot 
debug the use case and so you may have a deeper understanding than I do. I'm 
working from experience gained some six months ago when working with the result 
set loader, and that knowledge is getting rusty.
   
   > In original PR there is a change for NullableValueVectors to add the 
values and bits vector materialized field as child field of parent vector 
field. ... From your comment it looks like because the internal values 
ValueVector mode needs to be required so you are creating another Materialized 
Field with that mode for internal values vector and adding it as child of 
parent vector field.
   
   The reason for that change is that the result set loader code that clones a 
vector needs to know the actual type. That code walks the vector tree, using 
the `MaterializedField` to get the type. If a `values` vector (which has no 
`bits` vector) reports its type as `Nullable`, then the clone will create a 
`bits` vector, which causes havoc.
   
   I'm thinking that I should change the cloning code. Rather than believing 
the `MaterializedField`, I can use the vector class type itself. That will be 
more clunky and slow, but it will eliminate the need to change the existing 
vector code.
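
   To make the two cloning strategies concrete, here is a minimal sketch. It does not use Drill's real classes: `Field`, `RequiredVector`, `NullableVector`, `cloneByField` and `cloneByClass` are hypothetical stand-ins, and the toy model simply assumes that the inner `values` vector ends up sharing the parent's nullable field metadata.

```java
// Toy model only: these are hypothetical stand-ins, not Drill's vector API.
final class CloneSketch {
  enum Mode { REQUIRED, OPTIONAL }

  // Simplified analogue of MaterializedField: a name plus a data mode.
  static final class Field {
    final String name;
    final Mode mode;
    Field(String name, Mode mode) { this.name = name; this.mode = mode; }
  }

  interface Vector { Field field(); }

  // A vector with no bits vector.
  static final class RequiredVector implements Vector {
    private final Field field;
    RequiredVector(Field field) { this.field = field; }
    public Field field() { return field; }
  }

  // A nullable vector is a bits vector plus a values vector.
  static final class NullableVector implements Vector {
    private final Field field;
    final RequiredVector bits;
    final RequiredVector values;
    NullableVector(String name) {
      this.field = new Field(name, Mode.OPTIONAL);
      this.bits = new RequiredVector(new Field("bits", Mode.REQUIRED));
      // Assumed problem case: the values vector shares the parent's
      // field, so it reports OPTIONAL although it has no bits vector.
      this.values = new RequiredVector(field);
    }
    public Field field() { return field; }
  }

  // Strategy 1: trust the field metadata when choosing the clone type.
  static Vector cloneByField(Vector v) {
    return v.field().mode == Mode.OPTIONAL
        ? new NullableVector(v.field().name)
        : new RequiredVector(v.field());
  }

  // Strategy 2: ignore the metadata and inspect the vector class.
  static Vector cloneByClass(Vector v) {
    return (v instanceof NullableVector)
        ? new NullableVector(v.field().name)
        : new RequiredVector(v.field());
  }

  public static void main(String[] args) {
    NullableVector parent = new NullableVector("a");
    // Metadata-driven clone wrongly grows a bits vector.
    System.out.println(cloneByField(parent.values).getClass().getSimpleName());
    // Class-driven clone keeps the values vector as it is.
    System.out.println(cloneByClass(parent.values).getClass().getSimpleName());
  }
}
```

   In this sketch the first line printed is `NullableVector` and the second is `RequiredVector`. The class-based check needs one `instanceof` branch per vector kind, which is the clunkiness mentioned above.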
   
   Given how long this discussion has gone on, that I can't do the required 
tests, and that we can't discuss this in person, I'm thinking that the 
alternative approach may be more expedient.
   
   I suppose a larger question is whether the final bits of the result set 
loader are even still useful. Much work has been done on batch sizing since 
this work started. Is it still worthwhile finishing up this code so we can 
control the batch size for readers? Parquet has its own solution. Is it worth 
worrying about the others?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
