westonpace commented on issue #35508:
URL: https://github.com/apache/arrow/issues/35508#issuecomment-1540932823
> Is it possible to alter the scale function used in the implementation? I'm not sure which scale function you have implemented but it would be nice to have some control over this!

I don't know enough about tdigest to answer this. Here are the current options:
```
Help on function tdigest in module pyarrow.compute:

tdigest(array, /, q=0.5, *, delta=100, buffer_size=500, skip_nulls=True, min_count=0, options=None, memory_pool=None)
    Approximate quantiles of a numeric array with T-Digest algorithm.

    By default, 0.5 quantile (median) is returned.
    Nulls and NaNs are ignored.
    An array of nulls is returned if there is no valid data point.

    Parameters
    ----------
    array : Array-like
        Argument to compute function.
    q : double or sequence of double, default 0.5
        Quantiles to approximate. All values must be in [0, 1].
    delta : int, default 100
        Compression parameter for the T-digest algorithm.
    buffer_size : int, default 500
        Buffer size for the T-digest algorithm.
    skip_nulls : bool, default True
        Whether to skip (ignore) nulls in the input.
        If False, any null in the input forces the output to null.
    min_count : int, default 0
        Minimum number of non-null values in the input. If the number
        of non-null values is below `min_count`, the output is null.
    options : pyarrow.compute.TDigestOptions, optional
        Alternative way of passing options.
    memory_pool : pyarrow.MemoryPool, optional
        If not passed, will allocate memory from the default memory pool.
```
If that isn't enough, you could probably open a separate issue to request a configurable scale function.
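For context on what a "configurable scale" would control, here is a small, self-contained sketch of two scale functions from the t-digest paper by Dunning and Ertl: the linear `k0` and the arcsine `k1`. This comment does not say which scale function Arrow uses, and nothing below is Arrow's API. The names `k0`, `k1`, and `max_span` are hypothetical, and treating `delta` here like Arrow's `delta` option is an assumption. The idea is that a centroid may span quantiles `[q_lo, q_hi]` only while `k(q_hi) - k(q_lo) <= 1`. Under that rule, a steeper `k` near the tails forces smaller centroids there.

```python
import math
from typing import Callable

# Hypothetical type for a scale function: maps a quantile q in [0, 1] to k.
ScaleFn = Callable[[float], float]


def k0(delta: float) -> ScaleFn:
    """Linear scale (paper's k0): the same centroid size at every quantile."""
    return lambda q: delta / 2 * q


def k1(delta: float) -> ScaleFn:
    """Arcsine scale (paper's k1): smaller centroids near q=0 and q=1."""
    return lambda q: delta / (2 * math.pi) * math.asin(2 * q - 1)


def max_span(k: ScaleFn, q_lo: float, steps: int = 10_000) -> float:
    """Approximate the widest quantile range a single centroid starting at
    q_lo may cover under the rule k(q_hi) - k(q_lo) <= 1.

    Uses a simple linear scan, which is illustrative rather than efficient.
    """
    q_hi = q_lo
    for i in range(1, steps + 1):
        candidate = q_lo + (1 - q_lo) * i / steps
        if k(candidate) - k(q_lo) > 1:
            break
        q_hi = candidate
    return q_hi - q_lo


# Compare centroid spans at the lower tail (q=0) and at the median (q=0.5).
for name, scale in (("k0", k0(100)), ("k1", k1(100))):
    print(name, max_span(scale, 0.0), max_span(scale, 0.5))
```

With `delta=100`, `k0` allows a span of about 0.02 everywhere. `k1` allows less than 0.001 at the tail and a wider span than `k0` around the median. That trade-off between tail accuracy and centroid count is what choosing a scale function would let you tune.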