paul-rogers commented on pull request #2364: URL: https://github.com/apache/drill/pull/2364#issuecomment-1030499963
@vdiravka, good sleuthing! You did indeed find the hole in the system. Map (and repeated map) vectors are special: they are just holders for the actual data vectors. If they are reused, we get all the previous map members, which may or may not be a problem. It would be a problem if reader 1 has a.b as an INT while reader 2 wants a.b to be a VARCHAR.

One question is whether the HashAgg maintains a pointer to the map itself, or only to the physical columns within it. If it has no pointers to the map itself, we can special-case maps: they are considered the same schema if their contents are the same, whether or not the map vector itself is the same.

Another choice would be to store the map vector in the cache, but strip all the physical columns out of it when the caller asks for it again. The caller then reassembles the physical columns, also from the cache, and hopefully creates the same map structure as the previous reader.

Based on what you learned of the HashAgg, which of these might work?
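To make the two options concrete, here is a minimal sketch. `Vec`, `VectorCache`, and `sameSchema` are hypothetical stand-ins written for illustration, not Drill's actual vector or cache classes, and the sketch assumes that "same schema" for a physical column means "same vector object".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class MapVectorReuseSketch {

  /** Hypothetical stand-in for a vector: a leaf has a data type, a map only holds children. */
  static class Vec {
    final String name;
    final String type; // e.g. "INT", "VARCHAR", or "MAP"
    final Map<String, Vec> children = new LinkedHashMap<>();

    Vec(String name, String type) {
      this.name = name;
      this.type = type;
    }

    boolean isMap() {
      return "MAP".equals(type);
    }

    Vec add(Vec child) {
      children.put(child.name, child);
      return this;
    }
  }

  /**
   * Option 1: special-case maps when comparing schemas. Physical columns must be
   * the identical object; maps match when their members match, even if the map
   * objects themselves differ.
   */
  static boolean sameSchema(Vec a, Vec b) {
    if (!a.isMap() || !b.isMap()) {
      return a == b; // leaf columns (or a map/leaf mismatch): identity only
    }
    if (!a.name.equals(b.name)) {
      return false;
    }
    // Same member names in the same order.
    if (!new ArrayList<>(a.children.keySet()).equals(new ArrayList<>(b.children.keySet()))) {
      return false;
    }
    for (Map.Entry<String, Vec> e : a.children.entrySet()) {
      if (!sameSchema(e.getValue(), b.children.get(e.getKey()))) {
        return false;
      }
    }
    return true;
  }

  /**
   * Option 2: keep the map in the cache, but hand it back empty so the caller
   * reassembles its members (also from the cache).
   */
  static class VectorCache {
    private final Map<String, Vec> cache = new HashMap<>();

    Vec get(String path, String type) {
      Vec v = cache.get(path);
      if (v == null || !v.type.equals(type)) {
        // First request, or the type changed (INT vs. VARCHAR): new vector.
        v = new Vec(path.substring(path.lastIndexOf('.') + 1), type);
        cache.put(path, v);
      } else if (v.isMap()) {
        v.children.clear(); // strip the previous reader's members
      }
      return v;
    }
  }

  public static void main(String[] args) {
    // Option 1: two distinct map objects holding the same physical column.
    Vec b = new Vec("b", "INT");
    System.out.println(sameSchema(new Vec("a", "MAP").add(b), new Vec("a", "MAP").add(b)));

    // Option 2: the second reader gets the same map object back, emptied.
    VectorCache cache = new VectorCache();
    Vec first = cache.get("a", "MAP").add(cache.get("a.b", "INT"));
    Vec second = cache.get("a", "MAP");
    System.out.println(first == second && second.children.isEmpty());
  }
}
```

Under option 2, a downstream operator that holds a pointer to the map keeps seeing the same object, and a type change on a.b simply produces a new child vector. Which option fits depends on what the HashAgg actually holds on to.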
