+1, maintain it in the rel/0.12 branch.
CloudWise luke.miao
------------------ Original Message ------------------
From: "dev" <[email protected]>;
Sent: Thursday, November 25, 2021, 9:12 AM
To: "[email protected]" <[email protected]>;
Subject: Re: About making an iterable cluster version in enterprise
+1, and I suggest maintaining it in the rel/0.12 branch since it is stable now.
Thanks!
Chao Wang
BONC ltd
[email protected]
On 11/25/2021 09:03, Houliang Qi <[email protected]> wrote:
Hi,
We have also run into similar situations in our test environment, and we very
much support creating a stable, usable version based on the current one.
Thanks,
---------------------------------------
Houliang Qi
BONC, Ltd
On 11/24/2021 21:29, Xiangdong Huang <[email protected]> wrote:
Hi,
I think it is fine to keep maintaining the current cluster module,
either on the rel/0.12 or the master branch.
Look forward to progress.
Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University
?????? <[email protected]> ??2021??11??24?????? ????5:38??????
Hi, all
After running the cluster version with an online stream for about two weeks, we
experienced two failures in which the cluster became unresponsive and could not
be recovered by restarting. We also did not find an effective way to recover
data from the cluster. So we'd like to make a testable cluster version in our
enterprise environment, which should have the following properties:
1. Write operations won't be blocked frequently.
2. Query bugs are tolerable, since they can be fixed and iterated on quickly.
3. Most issues can be resolved by restarting nodes or the cluster.
4. There is a solution for unrecoverable issues after losing a small part of the data.
5. A cluster restart can complete in a reasonable amount of time.
6. The system has a monitoring mechanism.
We're planning to improve the following aspects:
1. Metadata uses too much memory
In our scenario, the measurement scale is large (around 1 billion measurements),
but the data-point ingestion rate is small (100K points per second). We found
that a cluster node cannot afford the metadata storage because of the memory
limitation (each node has 256 GB of memory). Given the small request rate, the
CPU load is only about 1%~2%. For this scenario, we intend to introduce a
third-party storage component such as RocksDB to help manage the schema
metadata; a rough sketch follows. Of course, this would be optional and
configurable.
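Below is a minimal sketch of the idea, not the actual implementation: it
assumes the RocksDB Java binding (org.rocksdb) is on the classpath, and the
class and method names (RocksDBSchemaStore, storeTimeseries, getTimeseries)
are made up for illustration. The point is only that the path-to-schema
mapping can live on disk and be looked up on demand instead of being held
entirely in heap memory.

// Sketch only: keep the mapping from a timeseries path to its serialized
// schema in RocksDB so that the full schema no longer has to live in heap.
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

import java.nio.charset.StandardCharsets;

public class RocksDBSchemaStore implements AutoCloseable {  // hypothetical name

  static {
    RocksDB.loadLibrary();
  }

  private final RocksDB db;

  public RocksDBSchemaStore(String dir) throws RocksDBException {
    Options options = new Options().setCreateIfMissing(true);
    this.db = RocksDB.open(options, dir);
  }

  // Persist the serialized schema (data type, encoding, compression, ...) of one timeseries.
  public void storeTimeseries(String fullPath, byte[] serializedSchema) throws RocksDBException {
    db.put(fullPath.getBytes(StandardCharsets.UTF_8), serializedSchema);
  }

  // Look up a schema on demand instead of holding all of them in memory.
  public byte[] getTimeseries(String fullPath) throws RocksDBException {
    return db.get(fullPath.getBytes(StandardCharsets.UTF_8));
  }

  @Override
  public void close() {
    db.close();
  }
}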
2. Raft implementation
For this one, we plan to take two steps. First, we'd like to abstract the
interfaces of Raft and try to make Raft an independent component; this should
also be a work item when implementing the new architecture. Second, we'd like
to introduce a third-party Raft library such as Ratis and, ideally, make it
configurable. A rough interface sketch follows.
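The following is only a rough sketch of what the abstracted interface could
look like; the names (RaftConsensus, LogApplier) are hypothetical and not the
current cluster module's API. The idea is that the rest of the cluster code
depends only on this interface, so the built-in implementation or a
third-party library such as Apache Ratis can be plugged in behind it.

import java.util.concurrent.CompletableFuture;

// Hypothetical consensus abstraction the cluster code would program against.
public interface RaftConsensus {

  // Replicate one log entry through the Raft group; completes when committed.
  CompletableFuture<Void> write(byte[] logEntry);

  // Serve a linearizable read (e.g., via read index or leader lease).
  CompletableFuture<byte[]> read(byte[] request);

  // Callback used to apply committed entries to the local state machine
  // (i.e., the storage engine).
  void registerApplier(LogApplier applier);

  boolean isLeader();

  interface LogApplier {
    void apply(byte[] committedEntry);
  }
}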
3. Engineering components
The cluster is missing some components, such as a monitoring system (this
should be in progress in the community; we'd like to help if needed), a tool
that migrates single-node data into the cluster, and tools to help with
failure recovery. We need these tools to make the system observable and
recoverable. A small monitoring sketch follows.
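As an illustration only (the class and counter names below are made up, and a
real component would push metrics to a monitoring system rather than just log
them), even something as small as the following would help operators notice
blocked writes or piled-up Raft logs:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical node-level health reporter: other modules bump the counters,
// and the scheduler prints them periodically.
public class NodeHealthReporter {

  public final AtomicLong pendingRaftLogs = new AtomicLong();
  public final AtomicLong rejectedWrites = new AtomicLong();

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start() {
    // Report once a minute; a real component would export these counters
    // to an external monitoring system instead of logging them.
    scheduler.scheduleAtFixedRate(
        () -> System.out.printf("pendingRaftLogs=%d rejectedWrites=%d%n",
            pendingRaftLogs.get(), rejectedWrites.get()),
        1, 1, TimeUnit.MINUTES);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}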
4. Test
As the new test architecture is being introduced into the community, we will
try to complement the test cases under the new architecture.
Most of the solutions above have not been investigated deeply, so any ideas are welcome.
What's the benefit of this work?
We intend to run this version in production so that we can collect feedback
and bugs from real users and iterate on them, and it can eventually become the
baseline of a stable cluster version.
Why not do this in the new architecture?
We don't do this under the new architecture because the new architecture has
just started being planned and we can't wait any longer. Besides, nearly all
of this work doesn't conflict with the new architecture and could be reused
there.
Please feel free to reply to this email if you have any concerns or ideas.
----------------------------------------------------------
Thanks!
Jianyun Cheng