hujs wrote:
Hello, I have a few questions.
1. Suppose I insert data into table 'a', and each tserver in the cluster
hosts at least one tablet of table 'a'. I use letters such as j and k as
split points. If I have four tservers A, B, C, D, where A, B and C can reach
an ingest rate of 90k but D can only reach 50k, will tserver D affect the
cluster's ingest performance?

I don't think I understand the question. For a table, tablet ranges are disjoint. If you split the table on letters (e.g. 'a', 'f', 'j'), the Key-Values whose key starts with 'a' would reside in only one tablet and thus on only one tabletserver.

2. If my rowid is auto-incrementing, such as 1, 2, 3, 4, ..., N, how do I
choose split points? Can I use the remainder of an integer as a split point,
such as n % 3 = 0, n % 3 = 1, n % 3 = 2, so that rowid = 3 is written to the
n % 3 = 0 tablet and rowid = 5 is written to the n % 3 = 2 tablet? What can I
do?

Remember that Accumulo deals only in bytes and has no idea that, in your case, the bytes are actually stringified numbers, so rows sort lexicographically rather than numerically. For example, to create 10 tablets, it's easy: use the nine split points [1, 2, 3, 4, 5, 6, 7, 8, 9]. A split point is the inclusive end row of the tablet before it, so this creates ten tablets: (-inf, 1], (1, 2], (2, 3], ... (9, +inf).

To create 20 tablets, you can use the 19 split points [05, 1, 15, 2, 25, 3, 35, 4, 45, 5, 55, 6, 65, 7, 75, 8, 85, 9, 95]. This would create 20 tablets: (-inf, 05], (05, 1], (1, 15], ... (95, +inf).
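
As a rough illustration of the two lists above, here is a small Python sketch that generates such split points and shows the byte-wise ordering at work. The function name split_points and its digits parameter are made up for this example; nothing here talks to Accumulo, it only builds the strings.

```python
def split_points(num_tablets, digits=2):
    """Return num_tablets - 1 split points for rows that are decimal strings.

    Hypothetical helper: the boundaries are evenly spaced fixed-width decimal
    prefixes with trailing zeros dropped ("10" becomes "1").
    """
    scale = 10 ** digits
    if not 2 <= num_tablets <= scale:
        raise ValueError("num_tablets must be between 2 and 10**digits")
    points = []
    for i in range(1, num_tablets):
        # i-th boundary as a fixed-width decimal string, e.g. 5 -> "05".
        boundary = str(i * scale // num_tablets).zfill(digits)
        # Dropping trailing zeros keeps the lexicographic order intact.
        points.append(boundary.rstrip("0"))
    return points


# Rows are compared as bytes, not as numbers, so "10" and "100" sort before "2".
rows = [str(n) for n in (1, 2, 10, 100)]
print(sorted(rows))      # ['1', '10', '100', '2']

print(split_points(10))  # ['1', '2', '3', '4', '5', '6', '7', '8', '9']
print(split_points(20))  # the 19 points listed above, "05" through "95"
```

Whether evenly spaced prefixes actually balance the tablets depends on how your row IDs are distributed, so treat the output as a starting point.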

You can extend this to create more split points if necessary for "numbers", and the same approach applies to alphabetical data as you described earlier. Another common trick is to temporarily reduce the split threshold for your table, ingest a corpus of data until you get the desired number of split points, then copy the current split points and re-apply them later (the split command in the shell can read the split points, one per line, from a file).
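
For the "one per line" file mentioned above, a minimal Python sketch could look like the following. The helper names and the file name splits.txt are assumptions for illustration; the script only writes and re-reads a plain text file and does not contact Accumulo.

```python
import os
import tempfile


def write_split_file(path, split_points):
    """Write de-duplicated split points to path, one per line, in byte order."""
    ordered = sorted(set(split_points), key=lambda s: s.encode("utf-8"))
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for point in ordered:
            handle.write(point + "\n")
    return ordered


def read_split_file(path):
    """Read split points back from path, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


if __name__ == "__main__":
    # Hypothetical location; use whatever path the shell can reach.
    target = os.path.join(tempfile.gettempdir(), "splits.txt")
    written = write_split_file(target, ["2", "1", "15", "05", "1"])
    print(target, written)
```

As far as I know, the shell can then load such a file with something like addsplits -t <table> -sf <file>, but check help addsplits on your version before relying on the exact options.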
