kinow commented on issue #35508:
URL: https://github.com/apache/arrow/issues/35508#issuecomment-1542061091

   Hi @westonpace 
   
   I work with @kat-grayson on the same project, but on another component that 
does not use t-digest directly (so I am a bit lost, still learning the ropes 
around streaming, t-digest, etc.).
   
   >You could choose to do this completely outside of Acero using the function 
registry directly. You would end up creating something that looks quite a bit 
like the aggregate node so I'd recommend starting by looking at that and 
getting familiar.
   
   A little more context here: our project already has a streaming component, 
and Apache Arrow is not used anywhere in the project (I think, but I could be 
wrong… it is a large project).
   
   It is still a good suggestion to look at the [aggregate node in 
Acero](https://arrow.apache.org/docs/cpp/streaming_execution.html#aggregate). 
From a brief look, I **think** the aggregate node [accepts 
functions](https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/cpp/src/arrow/acero/aggregate_node.cc)
 to apply to the streaming data, one of them being an [aggregate function with 
tdigest](https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/cpp/src/arrow/compute/kernels/aggregate_tdigest.cc#L81).
   
   @kat-grayson I think that aggregate function calls `NanAdd` in [the 
tdigest](https://github.com/apache/arrow/blob/ec29c6ffc3cb1af4db4903d9877b2f0b548a3ad9/cpp/src/arrow/util/tdigest.h#L73).
 So the C++ code seems to support adding values to an existing t-digest.
   
   But I think the Python version was made simpler, without exposing some 
functions from the C++ t-digest object.
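
To make "adding to an existing t-digest" a bit more concrete, here is a rough pure-Python sketch of the general idea: a summary object that keeps a bounded list of weighted centroids and accepts new values, or a whole other summary, after it was created. This is only an illustration and not Arrow's implementation. `ToyDigest`, `add`, `merge`, `quantile` and `max_centroids` are hypothetical names, and the compression step simply merges the two closest centroids instead of using the scale function of a real t-digest.

```python
from bisect import insort


class ToyDigest:
    """Toy mergeable centroid summary (hypothetical; NOT Arrow's TDigest)."""

    def __init__(self, max_centroids=50):
        self.max_centroids = max_centroids
        self.centroids = []  # sorted list of (mean, weight) tuples

    def add(self, value, weight=1.0):
        # Insert a new centroid, then compress if we hold too many.
        insort(self.centroids, (float(value), float(weight)))
        if len(self.centroids) > self.max_centroids:
            self._compress()

    def merge(self, other):
        # Fold another digest into this one, centroid by centroid.
        for mean, weight in other.centroids:
            self.add(mean, weight)

    def _compress(self):
        # Simplification: merge the adjacent pair whose means are closest.
        c = self.centroids
        i = min(range(len(c) - 1), key=lambda k: c[k + 1][0] - c[k][0])
        (m1, w1), (m2, w2) = c[i], c[i + 1]
        w = w1 + w2
        c[i:i + 2] = [((m1 * w1 + m2 * w2) / w, w)]

    def total_weight(self):
        return sum(weight for _, weight in self.centroids)

    def quantile(self, q):
        # Mean of the first centroid whose cumulative weight reaches q * total.
        if not self.centroids:
            raise ValueError("empty digest")
        target = q * self.total_weight()
        cumulative = 0.0
        for mean, weight in self.centroids:
            cumulative += weight
            if cumulative >= target:
                return mean
        return self.centroids[-1][0]


# Two chunks of a stream summarised separately, then combined.
first = ToyDigest(max_centroids=1000)
second = ToyDigest(max_centroids=1000)
for v in range(0, 50):
    first.add(v)
for v in range(50, 100):
    second.add(v)
first.merge(second)
median_estimate = first.quantile(0.5)
```

The point of the sketch is only that the state is small and can keep absorbing data, which is what a streaming component needs. Whether the Python bindings expose the equivalent operation on Arrow's own t-digest is a separate question.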


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
