caiconghui commented on pull request #8680:
URL: https://github.com/apache/incubator-doris/pull/8680#issuecomment-1081889227
> I think this pr should be discussed sufficiently before we merge it in.
>
> Unlike the scalar type, the type info of dynamic types (such as Array, Map
and etc) can not be generated in compile time. For instance, in
`get_type_info(const TabletColumn* col)`, the depth of the col should be
calculated first and it is unknown until runtime. The optimization for
`ArrayType` here is powered by small object optimization technique (cache
objects with small depth) but it sacrifice the flexibility for objects with
bigger depth.
>
> I think there are some possible improvements (may hard to implement):
>
> 1. It is reasonable to restrict the max depth to protect our system from
suffering serious performance penalty. I was also planning to do this. However,
if we can configure the max depth in runtime, it may be better than the hard
coded number.
> 2. `ArrayTypeInfo` can be refactored to store both leaf type info and
depth, therefore, we can construct an `ArrayTypeInfo` with any depth in a quick
way. After that, we may cache the objects with small depth statically and use
some cache strategies for objects with large depth to speed up.
>
> All in all, the key problem is how to manage the resource of dynamic types
if there are generated dynamically. I think we should also consider whether the
refactor introduced by this pr is suitable for the type info of `MayType`
(which may be more hard to manage than `ArrayType`).
>
> For now, `ArrayType` feature is under development and we have not start
the performance stuffs for `ArrayType`. Thanks for the contribution. What's
more, how much the benefit after this modification for `ArrayType`? Is there
any statistics?
In fact, my PR is optimized for scalar type, because get scalar type is used
very frequently, so the operation of getting scalar type must be light enough.
Here I use templates and static variables to optimize get scalar type, and I I
think that for the design of composite types such as arrays and maps, my
changes are only expedient. I think composite types may need to be managed with
another set of logic in future, because scalar type is the most used type by
users, so performance is very important, PR #8500 has already done some
performance
optimization work for this operation
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]