Superskyyy commented on PR #10884: URL: https://github.com/apache/skywalking/pull/10884#issuecomment-1579334587
Hi! I will offer two algorithm choices as references for a future implementation:

1. My own tree-based algorithm
2. An LLM-based response using [Langchain](https://github.com/hwchase17/langchain)

I will do the tree version first, as it's easier and quicker. Expect a working algorithm in a week (I will provide a simple web interface to test it).

The input looks like this; the output will be structured according to the proto.

```
cachedHttpUris: ConcurrentHashMap
|
|-- Service Name (String)
|   |-- URI (String) : Occurrence Count (AtomicInteger)
|   |-- URI (String) : Occurrence Count (AtomicInteger)
|
|-- Service Name (String)
|   |-- URI (String) : Occurrence Count (AtomicInteger)
|

Output: Something like this
{
  "FormattedPattern1": [URI, URI, URI...],
  # FormattedPattern is not a regex, but a logical endpoint.
  # Algorithm-generated regex is unreliable and will introduce ambiguity,
  # so I don't recommend updating rules with the algorithm. Any new URI that is
  # not handled by OpenAPI or user regex should always be sent to the algorithm.
  "FormattedPattern2": [URI, URI, URI...],
  "FormattedPattern3": [URI, URI, URI...],
}
```

Considerations of the algorithm:

It's going to be stateful, based on incremental trees. That means that across different batches (at most 3k per 30 minutes) the algorithm will remember previously grouped URIs (the assumption is that they will always end up being a finite number), so results are consistent throughout the service lifecycle.

The LLM-based response is stateless by nature, so previous URI results will be kept in a local cache and passed back with each GPT question in later batches to simulate the stateful behavior.

[I have a comment on the regex pattern group being added back to the ruleset; please see the PR code comment.]
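To make the "stateful, incremental tree" idea concrete, here is a minimal Python sketch. It is not the algorithm proposed in this comment, only an illustration under stated assumptions: the names `UriGrouper`, `max_children`, and the `{var}` placeholder are hypothetical, and the collapse rule (a path position becomes variable once it has seen more than `max_children` distinct segments) is a simple stand-in heuristic. Occurrence counts are accepted in the input but ignored.

```python
from collections import defaultdict

VAR = "{var}"  # hypothetical placeholder for a variable path segment


class Node:
    def __init__(self):
        self.children = {}  # path segment -> Node
        self.uris = []      # raw URIs that terminate at this node


class UriGrouper:
    """Keeps one segment tree per service; the trees persist across batches."""

    def __init__(self, max_children=3):
        # Assumed threshold: a node may hold this many distinct literal
        # children; one more collapses them all into a single {var} child.
        self.max_children = max_children
        self.roots = defaultdict(Node)

    def feed(self, cached_http_uris):
        """cached_http_uris mirrors the input shape: {service: {uri: count}}."""
        for service, uris in cached_http_uris.items():
            for uri in uris:
                self._insert(self.roots[service], uri)

    def _insert(self, root, uri):
        node = root
        for seg in uri.strip("/").split("/"):
            if VAR in node.children:
                seg = VAR  # this position was already judged variable
            elif (seg not in node.children
                  and len(node.children) >= self.max_children):
                self._collapse(node)
                seg = VAR
            node = node.children.setdefault(seg, Node())
        if uri not in node.uris:
            node.uris.append(uri)

    def _collapse(self, node):
        # Merge every child subtree into one {var} child.
        merged = Node()
        for child in node.children.values():
            self._merge(merged, child)
        node.children = {VAR: merged}

    def _merge(self, dst, src):
        dst.uris.extend(u for u in src.uris if u not in dst.uris)
        for seg, child in src.children.items():
            self._merge(dst.children.setdefault(seg, Node()), child)
        # A {var} child must not coexist with literal siblings.
        if VAR in dst.children and len(dst.children) > 1:
            self._collapse(dst)

    def groups(self, service):
        """Return {formatted_pattern: [uri, ...]} for one service."""
        out = {}

        def walk(node, path):
            if node.uris:
                out["/" + "/".join(path)] = list(node.uris)
            for seg, child in node.children.items():
                walk(child, path + [seg])

        if service in self.roots:
            walk(self.roots[service], [])
        return out
```

Feeding `/users/1`, `/users/2`, `/users/3` in one batch and `/users/4` in a later one moves all four under `/users/{var}`, because the tree built from the first batch is still there when the second arrives. A real implementation would need a better rule than a fixed threshold, since a position with many distinct static segments (for example several resource names under `/api`) would be collapsed as well.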
