PHILO-HE commented on issue #5248: URL: https://github.com/apache/incubator-gluten/issues/5248#issuecomment-2031165636
Hi @wForget, thanks for bringing up this issue! It looks like Velox has a config to control this behavior: https://github.com/facebookincubator/velox/blob/main/velox/functions/sparksql/Size.cpp#L35

I note that Gluten sets it from Spark's config to align with Spark's "Size" function. For the "ArraySize" function, however, we expect it to always be false. https://github.com/apache/incubator-gluten/blob/main/cpp/velox/compute/WholeStageResultIterator.cc#L482

For performance reasons, it may be better to change Velox's size function directly, e.g., by adding support for two args [here](https://github.com/facebookincubator/velox/blob/main/velox/functions/sparksql/Size.cpp#L27). The extra arg would be a `legacySizeOfNull` flag. If Velox finds it specified, it uses this flag and ignores the config setting. Then, on the Gluten side, `SizeExpressionTransformer` can check whether `legacySizeOfNull` is consistent with the Spark conf. If it is not, the transformer passes the flag to Velox along with the input.

Does this make sense?
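To make the proposal concrete, here is a minimal C++ sketch of the intended semantics. It is not Velox's or Gluten's actual API: `QueryConfig`, `sizeFn`, and `flagToPass` are hypothetical names. A null array is modeled as `std::nullopt`. It assumes Spark's behavior that, with `spark.sql.legacy.sizeOfNull` enabled, `size(NULL)` returns -1 and otherwise returns NULL, while `array_size(NULL)` always returns NULL. The explicit per-call flag, when present, overrides the session-level config.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model: a SQL NULL array is std::nullopt.
using Array = std::optional<std::vector<int64_t>>;

// Hypothetical stand-in for the session-level setting that Gluten
// derives from Spark's spark.sql.legacy.sizeOfNull.
struct QueryConfig {
  bool legacySizeOfNull = true;
};

// Proposed two-arg behavior (sketch): if the explicit flag is given,
// it wins; otherwise fall back to the session config.
std::optional<int32_t> sizeFn(
    const Array& input,
    const QueryConfig& config,
    std::optional<bool> legacySizeOfNullArg = std::nullopt) {
  const bool legacy = legacySizeOfNullArg.value_or(config.legacySizeOfNull);
  if (!input.has_value()) {
    // Legacy mode maps a NULL input to -1; otherwise the result is NULL.
    if (legacy) {
      return -1;
    }
    return std::nullopt;
  }
  return static_cast<int32_t>(input->size());
}

// Hypothetical Gluten-side decision: only pass the flag when the
// expression's own legacySizeOfNull differs from the session config,
// so the common case keeps the single-arg call.
std::optional<bool> flagToPass(bool exprLegacySizeOfNull,
                               bool confLegacySizeOfNull) {
  if (exprLegacySizeOfNull == confLegacySizeOfNull) {
    return std::nullopt;
  }
  return exprLegacySizeOfNull;
}
```

Under a legacy session config, a "Size" expression (flag `true`) needs no extra arg and yields -1 for a NULL array. An "ArraySize" expression always carries `false`, so the flag is passed explicitly and the result is NULL. Passing the flag only on mismatch keeps the extra argument off the hot path whenever the expression already agrees with the config.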
