Summary

This pull request has been open for 10 days, and I started working on it 2 weeks ago. Thanks to everyone who helped get this PR and its tests working.
Here is the list of changes made in this pull request. The top 3 are the principal changes; the others follow from them.

1. New agent and mesh report protocol.
2. New agent header protocol.
3. Service register, instance register, and network address register have been removed permanently.
4. Service traffic, instance traffic, and network alias metrics have been added to replace the service, instance, and network address inventories.
5. The register process has been removed.
6. The metrics stream process supports an insert-only mode, especially for traffic entities.
7. The metrics stream process supports a no-downsampling mode for traffic entities and network alias.
8. All register mechanisms and caches in the Java agent have been removed.
9. The MONTH step has been removed from the GraphQL query.
10. The UI has been updated to remove the MONTH step query; the maximum query range is now 60 days.
11. TTL has been simplified to two settings, metrics and record, both expressed in days. There is no specific TTL for ElasticSearch storage.
12. The buffer mechanism of the trace receiver and mesh receiver has been removed, since there is no register anymore.
13. New service ID, instance ID, and endpoint ID rules, including ID rules for service relation, instance relation, and endpoint relation.
14. The Java agent supports a keep-tracing mode, meaning the agent generates tracing context even when the backend is disconnected or unavailable.
15. The plugin test tool has been updated to support the new protocol.
16. The expected data files of the plugin tests have been updated.
17. E2E tests have been updated.
18. [TBD] The InfluxDB storage implementation is not available. @dmsolr <https://github.com/dmsolr> needs to fix it later, so that this PR blocks changes to master for as short a time as possible.

If anyone plans to review the code in blocking mode (meaning the PR should not be merged until the review is done), please let me know. Otherwise, I will try to merge this tomorrow to unblock new changes on the master branch.

Han Liu [email protected] <[email protected]> I just want to wait for your alarm test result, since the e2e tests don't cover this.
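Item 13 mentions new ID rules without spelling them out in this message. As a rough sketch of what register-free IDs can look like, the Java below derives every ID deterministically from names, so any agent or backend node computes the same ID without a register round trip. The encoding (Base64 of the name, a `.1`/`.0` suffix for normal nodes, `_` and `-` as separators) and the class and method names are assumptions made for illustration; this summary does not confirm them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: derives IDs from names instead of asking a register for them.
public class IdSketch {
    private static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Assumed rule: Base64(serviceName) + "." + (1 for a normal node, 0 otherwise).
    public static String serviceId(String serviceName, boolean isNormal) {
        return encode(serviceName) + "." + (isNormal ? 1 : 0);
    }

    // Assumed rule: serviceId + "_" + Base64(instanceName).
    public static String instanceId(String serviceId, String instanceName) {
        return serviceId + "_" + encode(instanceName);
    }

    // Assumed rule: serviceId + "_" + Base64(endpointName).
    public static String endpointId(String serviceId, String endpointName) {
        return serviceId + "_" + encode(endpointName);
    }

    // Assumed rule for relations: source ID and destination ID joined by "-".
    public static String relationId(String sourceId, String destId) {
        return sourceId + "-" + destId;
    }

    public static void main(String[] args) {
        String service = serviceId("order-service", true);
        System.out.println(service);
        System.out.println(instanceId(service, "pod-1"));
        System.out.println(endpointId(service, "/orders"));
    }
}
```

The standard Base64 alphabet contains none of `.`, `_`, or `-`, so under these assumptions the separators can never collide with an encoded name.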
Sheng Wu 吴晟
Twitter, wusheng1108

kezhenxu94@apache <[email protected]> wrote on Thu, Apr 9, 2020 at 7:32 PM:

> The E2E should be fixed according to the new code now, good luck.
>
> GitHub @kezhenxu94
> Apache SkyWalking, Apache Dubbo
>
> > On Apr 6, 2020, at 20:45, Sheng Wu <[email protected]> wrote:
> >
> > Zhenxu Ke
> > My PR is ready locally, but e2e still seems to be failing. Please help
> > locate the issues.
> >
> > Haochao Zhuang
> > I noticed you have upgraded the test tool for the v3 protocol. Please move on
> > to making the plugin tests pass in the v8-core branch.
> >
> > Sheng Wu 吴晟
> > Twitter, wusheng1108
> >
> > Sheng Wu <[email protected]> wrote on Tue, Mar 31, 2020 at 11:29 AM:
> >
> >> Hi Dev Team
> >>
> >> After the experience of removing endpoint_inventory, I found that this
> >> strategy is successful. In particular, we got rid of register entirely
> >> there, so I want to do more.
> >>
> >> *SkyWalking 8.0.0*
> >> First, I did not expect that we would have to move to 8.0.0 so quickly,
> >> but after discussing with +高洪涛@skywalking <[email protected]> and
> >> thinking about this for several days, I think we have to.
> >>
> >> The key changes are as follows:
> >> 1. Remove service, service instance, and network address register. The old register protocols will be removed entirely.
> >> 2. The agent doesn't need to register anymore. Service name and service instance name are generated by the agent itself, but the extra information, such as IP, hostname, and language, should be reported to the backend separately.
> >> 3. Service Traffic should be added, just like the endpoint traffic, but keeping the time bucket, as we need accurate service names in the given duration.
> >> 4. Service Instance Traffic should be added too, with the extra information, such as language and hostname.
> >> 5. Trace context propagation should be changed to accept strings for service instance name, endpoint name, and network address. This could simplify the agent logic, but it also requires changes in all language agents and the plugin test tool.
> >> 6. The trace report protocol needs to change too, in order to adopt strings.
> >> 7. e2e tests have to ignore PHP and LUA at first, and the 6.x compatibility test has to be removed (6.x is no longer supported).
> >>
> >> The benefits we will get are:
> >> 1. No more worrying about inventories that have been deleted randomly by end users. (We have received a lot of issue reports about this.)
> >> 2. Upgrading could be easier: erase the whole storage and boot the new version. (Users don't feel comfortable about upgrades.)
> >> 3. No hot-reboot case on the agent side.
> >> 4. No cache of network address register information in the agent.
> >> 5. No service and service instance cache in the OAP.
> >> 6. No register lock in the OAP.
> >> 7. No file buffer mechanism in the OAP either, since no register happens.
> >>
> >> I think this fully breaking upgrade is very meaningful and will be a
> >> good change. Even though we break many things, the changes are easy to follow.
> >> [email protected] <[email protected]> I think that, following this, we
> >> need to change the collaboration header to `sw8` :) as no 7.1.0 release
> >> will happen.
> >>
> >> Sheng Wu 吴晟
> >> Twitter, wusheng1108
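To illustrate item 5 of the proposal (propagating strings rather than registered integer IDs), here is a minimal Java sketch of a propagation header value whose fields are plain names. The field set and order, the `-` separator, the Base64 encoding, and all class and method names are assumptions made for this example; the thread itself only fixes the header name `sw8`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical carrier for a string-based propagation header; not the agent's real class.
public class Sw8HeaderSketch {
    public final String traceId;
    public final String parentService;
    public final String parentInstance;
    public final String parentEndpoint;
    public final String targetAddress;

    public Sw8HeaderSketch(String traceId, String parentService, String parentInstance,
                           String parentEndpoint, String targetAddress) {
        this.traceId = traceId;
        this.parentService = parentService;
        this.parentInstance = parentInstance;
        this.parentEndpoint = parentEndpoint;
        this.targetAddress = targetAddress;
    }

    private static String enc(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String dec(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    // Every field is Base64-encoded, so names containing "-" cannot break the framing.
    public String serialize() {
        return String.join("-", enc(traceId), enc(parentService), enc(parentInstance),
                enc(parentEndpoint), enc(targetAddress));
    }

    public static Sw8HeaderSketch parse(String value) {
        // Limit -1 keeps trailing empty fields (an empty name encodes to "").
        String[] parts = value.split("-", -1);
        if (parts.length != 5) {
            throw new IllegalArgumentException("expected 5 fields, got " + parts.length);
        }
        return new Sw8HeaderSketch(dec(parts[0]), dec(parts[1]), dec(parts[2]),
                dec(parts[3]), dec(parts[4]));
    }

    public static void main(String[] args) {
        Sw8HeaderSketch out = new Sw8HeaderSketch(
                "trace-1", "order-service", "pod-1", "/orders", "10.0.0.5:8080");
        String headerValue = out.serialize();
        Sw8HeaderSketch in = parse(headerValue);
        System.out.println("sw8: " + headerValue);
        System.out.println(in.parentService + " -> " + in.targetAddress);
    }
}
```

Because the receiver gets the names directly, it needs neither a lookup against a register nor a local cache of registered IDs, which is the simplification the proposal is after.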
