Re: [I] [VL] `array_size(null)` results inconsistent with vanilla spark [incubator-gluten]

via GitHub Tue, 02 Apr 2024 02:21:59 -0700


wForget commented on issue #5248:
URL: https://github.com/apache/incubator-gluten/issues/5248#issuecomment-2031496625


   > Hi @wForget, thanks for bringing up this issue!
   > 
   > It looks like Velox has a config to control this behavior:
   > https://github.com/facebookincubator/velox/blob/main/velox/functions/sparksql/Size.cpp#L35
   > 
   > I note that Gluten sets it according to Spark's config to align with Spark's "Size" function. For the "ArraySize" function, we expect it to always be false:
   > https://github.com/apache/incubator-gluten/blob/main/cpp/velox/compute/WholeStageResultIterator.cc#L482
   > 
   > For performance reasons, it may be better to change Velox's size function directly, e.g., add support for two args [here](https://github.com/facebookincubator/velox/blob/main/velox/functions/sparksql/Size.cpp#L27). The extra arg is the `legacySizeOfNull` flag. If Velox finds it specified, it will use this flag and ignore the config setting. Then, on the Gluten side, `SizeExpressionTransformer` can check whether `legacySizeOfNull` is consistent with the Spark conf. If not, it passes the flag along with the input to Velox. Does this make sense?
   
   @PHILO-HE Thanks for your guidance; this makes sense to me. I will try it.
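A minimal Python model of the proposal above may help. It is a sketch, not Velox or Gluten code: `velox_size` and `extra_size_args` are hypothetical names standing in for the proposed two-argument Velox `size` and for the check in `SizeExpressionTransformer`. It assumes the Spark semantics that `size(null)` returns -1 when legacy sizeOfNull is in effect (derived from `spark.sql.legacy.sizeOfNull`) and null otherwise, while `array_size(null)` always returns null.

```python
from typing import Optional, Sequence, Tuple


def velox_size(arr: Optional[Sequence], session_legacy_size_of_null: bool,
               legacy_size_of_null: Optional[bool] = None) -> Optional[int]:
    """Hypothetical model of the proposed Velox `size`: an explicit flag overrides the config."""
    legacy = session_legacy_size_of_null if legacy_size_of_null is None else legacy_size_of_null
    if arr is None:
        # Legacy mode maps a null input to -1; otherwise the result is null (None).
        return -1 if legacy else None
    return len(arr)


def extra_size_args(expr_legacy_size_of_null: bool,
                    session_legacy_size_of_null: bool) -> Tuple[bool, ...]:
    """Hypothetical transformer step: forward the flag only when it disagrees with the conf."""
    if expr_legacy_size_of_null == session_legacy_size_of_null:
        return ()
    return (expr_legacy_size_of_null,)


# array_size(null) in a session where legacy sizeOfNull is enabled:
# the expression wants legacySizeOfNull=false, which differs from the conf.
session_legacy = True
args = extra_size_args(expr_legacy_size_of_null=False,
                       session_legacy_size_of_null=session_legacy)
print(args)                                     # (False,)
print(velox_size(None, session_legacy, *args))  # None, matching vanilla array_size(null)
print(velox_size(None, session_legacy))         # -1, i.e. size(null) in legacy mode
```

Forwarding the flag only on a mismatch keeps the common `size` path unchanged, so the extra argument is paid for only by expressions such as `array_size` whose null handling differs from the session setting.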


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

