ninsmiracle opened a new issue, #1853:
URL: https://github.com/apache/incubator-pegasus/issues/1853
## Feature Request
**Is your feature request related to a problem? Please describe:**
When we operate and maintain Pegasus, many users tell us that they need to
know which specific hash_key and sort_key were read, and which client read
them.
This feature undoubtedly incurs a significant computational cost. Because the
relevant logic must be added to the main flow of the read operation, we have to
handle it with caution. We should be able to dynamically configure the sampling
rate, for example 1/10000, which means that on average only one out of every
10000 reads will be recorded in the detail log.
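
To make the sampling idea concrete, here is a minimal sketch of a per-read sampling decision. It is only an illustration under my own assumptions: `ReadSampler`, `set_denominator` and `should_sample` are hypothetical names, not existing Pegasus code, and a denominator of 0 is assumed to mean "sampling off".

```cpp
#include <atomic>
#include <cstdint>
#include <random>

// Hypothetical sampler, not part of Pegasus. Records roughly 1 out of every
// `denominator` reads; a denominator of 0 is assumed to disable sampling.
class ReadSampler {
public:
    explicit ReadSampler(uint32_t denominator) : denominator_(denominator) {}

    // Can be called at runtime to change the rate without a restart.
    void set_denominator(uint32_t denominator) {
        denominator_.store(denominator, std::memory_order_relaxed);
    }

    // Returns true if the current read should be written to the detail log.
    bool should_sample() {
        const uint32_t d = denominator_.load(std::memory_order_relaxed);
        if (d == 0) return false;  // sampling disabled
        if (d == 1) return true;   // record every read
        // One generator per thread, so the read path takes no lock.
        thread_local std::mt19937 rng{std::random_device{}()};
        std::uniform_int_distribution<uint32_t> dist(0, d - 1);
        return dist(rng) == 0;
    }

private:
    std::atomic<uint32_t> denominator_;
};
```

With `ReadSampler sampler(10000);`, the read path would call `sampler.should_sample()` once per get or multi_get and only build the log record when it returns true, so the common case costs one atomic load and one random draw.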
This feature will help users better understand which of their data has been
read and which data is redundant. They may no longer need to write unnecessary
data, which reduces online write traffic and the storage capacity used in
Pegasus.
In addition, we can also configure a threshold. When the size of a key or
value is greater than this threshold, the key-value pair will be recorded. This
will help us notify users to improve their data in order to achieve better read
performance.
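
A minimal sketch of the threshold check might look like the following. The function name `exceeds_size_threshold` is hypothetical, and treating the key size as the combined length of hash_key and sort_key is my assumption; the real definition could differ.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Pegasus. Returns true when the key or the
// value is larger than `threshold_bytes`, meaning the pair should be recorded.
// Assumption: "key size" is the combined length of hash_key and sort_key.
inline bool exceeds_size_threshold(const std::string &hash_key,
                                   const std::string &sort_key,
                                   const std::string &value,
                                   std::size_t threshold_bytes) {
    const std::size_t key_size = hash_key.size() + sort_key.size();
    return key_size > threshold_bytes || value.size() > threshold_bytes;
}
```

This check only compares sizes that are already known on the read path, so it could run on every read independently of the sampling rate.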
**Describe the feature you'd like:**
In my opinion, there are 5 parameters that we could configure:
- Log path
The path of the detail log file can be configured by the user. These logs are
kept separate from the main log path currently used by Pegasus.
- Sampling function switch
When we don't need to use this feature, it should be possible to dynamically
turn it off.
- Sampling Rate
Set a sampling rate. Each get or multi_get operation is recorded with this
probability, instead of being logged every time.
- Filter size
A built-in Bloom filter is included, mainly to reduce the size of the
generated detail logs. The size of the filter can be configured to prevent
excessive memory usage.
- Sampling status check time
The interval at which to periodically check whether the sampling status has
changed.
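
As a rough sketch of how these five parameters could be grouped, and how the Bloom filter might cut down log volume, consider the code below. Everything here is hypothetical: the struct, field names, default values and the `KeyFilter` class are illustrative, not existing Pegasus options or code. I also assume the filter is used to skip keys that have probably been logged already.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical settings mirroring the five parameters above. Field names and
// default values are placeholders for illustration only.
struct DetailLogOptions {
    std::string log_path = "detail_log/";      // separate from the main log path
    bool sampling_enabled = false;             // sampling function switch
    uint32_t sampling_denominator = 10000;     // record about 1 in 10000 reads
    std::size_t filter_bits = 1 << 20;         // Bloom filter size, bounds memory
    uint32_t status_check_interval_sec = 60;   // how often to re-read the status
};

// Minimal Bloom filter over keys (a sketch, not a tuned implementation).
class KeyFilter {
public:
    explicit KeyFilter(std::size_t bits) : bits_(bits == 0 ? 1 : bits, false) {}

    // Returns true if the key was possibly seen before (false positives are
    // possible). Otherwise marks the key as seen and returns false.
    bool test_and_set(const std::string &key) {
        // Double hashing: derive several bit positions from two hash values.
        const std::size_t h1 = std::hash<std::string>{}(key);
        const std::size_t h2 = std::hash<std::string>{}(key + '\x01') | 1;
        bool seen = true;
        for (std::size_t i = 0; i < kNumHashes; ++i) {
            const std::size_t pos = (h1 + i * h2) % bits_.size();
            if (!bits_[pos]) {
                seen = false;
                bits_[pos] = true;
            }
        }
        return seen;
    }

private:
    static constexpr std::size_t kNumHashes = 3;
    std::vector<bool> bits_;
};
```

A sampled read would then be logged only when `test_and_set(hash_key + sort_key)` returns false. A false positive only causes a key to be skipped, which seems acceptable for a sampled log.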