Re: [I] [VL] `array_size(null)` results inconsistent with vanilla spark [incubator-gluten]

via GitHub Mon, 01 Apr 2024 23:22:35 -0700


PHILO-HE commented on issue #5248:
URL: https://github.com/apache/incubator-gluten/issues/5248#issuecomment-2031165636


   Hi @wForget, thanks for bringing up this issue!
   
   It looks like Velox has a config that controls this behavior:
   https://github.com/facebookincubator/velox/blob/main/velox/functions/sparksql/Size.cpp#L35
   
   I see that Gluten sets it from Spark's config to align with Spark's
   `Size` function. For the `ArraySize` function, however, we expect it to
   always be false:
   https://github.com/apache/incubator-gluten/blob/main/cpp/velox/compute/WholeStageResultIterator.cc#L482
   
   For performance reasons, it may be better to change Velox's size function
   directly, e.g., add support for two arguments
   [here](https://github.com/facebookincubator/velox/blob/main/velox/functions/sparksql/Size.cpp#L27).
   The extra argument is a `legacySizeOfNull` flag. If Velox finds that it is
   specified, it uses this flag and ignores the config setting. Then, on the
   Gluten side, `SizeExpressionTransformer` can check whether
   `legacySizeOfNull` is consistent with the Spark conf. If not, it passes the
   flag along with the input to Velox. Does this make sense?
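
   To make the proposal concrete, here is a rough, self-contained sketch of the intended semantics. It is not the actual Velox or Gluten code: `QueryConfig`, `sizeOf`, and `flagToPass` are hypothetical names used only for illustration, and a plain `std::vector` stands in for an array column value. It assumes the legacy behavior returns -1 for a null input and the non-legacy behavior returns null.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the session-level config value.
struct QueryConfig {
  bool legacySizeOfNull;
};

// Models size(input[, legacySizeOfNull]). When the optional second argument
// is supplied, it takes precedence over the config setting.
std::optional<int32_t> sizeOf(
    const std::optional<std::vector<int32_t>>& input,
    const QueryConfig& config,
    std::optional<bool> legacyFlagArg = std::nullopt) {
  const bool legacy = legacyFlagArg.value_or(config.legacySizeOfNull);
  if (!input.has_value()) {
    // Null input: -1 in legacy mode, null otherwise.
    return legacy ? std::optional<int32_t>(-1) : std::nullopt;
  }
  return static_cast<int32_t>(input->size());
}

// Models the Gluten-side check: pass the flag only when the expression's
// legacySizeOfNull differs from the Spark conf, so the common case keeps
// the one-argument call.
std::optional<bool> flagToPass(bool exprLegacySizeOfNull, bool confLegacySizeOfNull) {
  if (exprLegacySizeOfNull == confLegacySizeOfNull) {
    return std::nullopt;
  }
  return exprLegacySizeOfNull;
}
```

   Under these assumptions, `array_size(null)` with the legacy conf enabled would be planned as `sizeOf(null, config, flagToPass(false, true))`, which yields null instead of -1.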


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

