Re: Write or Ingest bottleneck

Josh Elser Tue, 06 Dec 2016 10:11:30 -0800


hujs wrote:

Hello, I have a few questions.
1. Suppose I insert data into table 'a', and each tserver in the cluster hosts
at least one tablet of table 'a'. I use letters such as j and k as the split
points. If I have four tservers, A, B, C, and D, where A, B, and C can each
reach an ingest rate of 90k but D only reaches 50k, will D hurt the ingest
performance of the whole cluster?

I don't think I understand this. For a table, tablet ranges are disjoint. If you split the table on letters (e.g. 'a', 'f', 'j'), the Key-Values that have a key starting with 'a' would only reside in one tablet and thus only on one tablet server.
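
To make the disjoint-range point concrete, here is a small Python sketch (not Accumulo code) that finds which tablet would hold a given row. It assumes sorted split points and Accumulo's convention that a split point is the inclusive end row of a tablet. The helper tablet_index and the split points 'f', 'j', 'p' are made up for illustration; plain str comparison matches byte order for ASCII rows.

```python
from bisect import bisect_left


def tablet_index(splits, row):
    """Return the index of the tablet that would hold `row`.

    Assumes `splits` is sorted and that tablet i covers (splits[i-1], splits[i]]:
    the previous end row is exclusive and the split point is the inclusive end row.
    """
    # bisect_left counts the split points strictly less than `row`, which is
    # the index of the first tablet whose end row is >= row.
    return bisect_left(splits, row)


splits = ["f", "j", "p"]  # hypothetical split points -> 4 tablets
for row in ["apple", "avocado", "azure", "kiwi", "zebra"]:
    print(row, "-> tablet", tablet_index(splits, row))
```

Here apple, avocado and azure all map to tablet 0, so with these splits a single tablet server receives all of that ingest.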

2. If my row ID is self-increasing, such as 1, 2, 3, 4, ..., N, how do I choose
split points? Can I use the remainder of an integer as a split point, such as
n % 3 = 0, n % 3 = 1, and n % 3 = 2, so that rowid = 3 is written to the
n % 3 = 0 tablet and rowid = 5 is written to the n % 3 = 2 tablet? What can I
do?

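Remember that Accumulo only deals with bytes and has no context that, in your case, the bytes are actually stringified numbers. Tablets hold contiguous, sorted ranges of rows, so a split point cannot express a rule like n % 3. For example, to create 10 tablets, it's easy: use the nine split points [1, 2, 3, 4, 5, 6, 7, 8, 9]. This creates ten tablets, (-inf, 1], (1, 2], ... (9, +inf), because each split point is the inclusive end row of a tablet. Since rows compare byte by byte, "10" and "100" sort after "1" and before "2", so they fall into (1, 2].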
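
A sketch of the same lookup using bytes, as Accumulo sees them, with the nine split points 1 through 9. The helper tablet_range is hypothetical; it assumes, as Accumulo does, that each split point is the inclusive end row of a tablet. The point it illustrates is that unpadded numbers sort lexicographically, so 10, 15 and 100 land in the same tablet as 2.

```python
from bisect import bisect_left

# Nine split points stored as bytes, because Accumulo compares rows as bytes.
splits = [str(d).encode() for d in range(1, 10)]  # b"1" ... b"9" -> 10 tablets


def tablet_range(splits, row):
    """Describe the (prev_end, end] range of the tablet that would hold `row`."""
    i = bisect_left(splits, row)
    prev_end = splits[i - 1].decode() if i > 0 else "-inf"
    if i == len(splits):
        # Past the last split point: the final, unbounded tablet.
        return f"({prev_end}, +inf)"
    return f"({prev_end}, {splits[i].decode()}]"


# Self-increasing IDs, stringified without padding.
for n in [1, 2, 10, 15, 100, 9, 99]:
    print(n, "->", tablet_range(splits, str(n).encode()))
```

This is why the split points have to follow the byte order of the row IDs rather than their numeric value.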

To create 20 tablets, you can use the following 19 split points: [05, 1, 15, 2, 25, 3, 35, 4, 45, 5, 55, 6, 65, 7, 75, 8, 85, 9, 95]. This creates 20 tablets, (-inf, 05], (05, 1], (1, 15], ... (95, +inf).
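
The split lists above follow a pattern that is easy to generate. The hypothetical numeric_splits helper below walks fixed-width prefixes in steps and drops trailing zeros: with step=10 it reproduces the nine split points 1 to 9, and with step=5 the 19 split points from 05 to 95. Raising digits extends the same pattern to more split points. It only produces the strings; it says nothing about how evenly your actual row IDs spread across them.

```python
def numeric_splits(step, digits=2):
    """Hypothetical helper: evenly spaced split points for stringified numbers.

    Walks the fixed-width prefixes step, 2*step, ... below 10**digits and strips
    trailing zeros, so "10" becomes "1" and "50" becomes "5". Stripping trailing
    zeros keeps the list in lexicographic (byte) order.
    """
    splits = []
    for v in range(step, 10 ** digits, step):
        splits.append(f"{v:0{digits}d}".rstrip("0"))
    return splits


ten = numeric_splits(10)    # nine split points -> 10 tablets
twenty = numeric_splits(5)  # 19 split points -> 20 tablets
print(ten)
print(twenty)
```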

You can extend this to create more split points if necessary for "numbers", but the same approach also applies to alphabetical data like you described earlier. Another common trick is to temporarily reduce the split threshold for your table, ingest a corpus of data until you get the desired number of split points, then copy the current split points and reapply them later (the addsplits command in the shell can read the split points, one per line, from a file).
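
For the split-file route, a minimal Python sketch that writes split points one per line and reads them back. The file name splits.txt and both helpers are hypothetical, and the sketch assumes printable split points without embedded newlines. In the Accumulo shell such a file is typically passed to addsplits with its -sf option (for example, addsplits -t mytable -sf splits.txt, where mytable is a placeholder); check help addsplits on your version.

```python
from pathlib import Path


def write_split_file(splits, path):
    """Write split points one per line, de-duplicated and sorted.

    Python sorts str by code point, which matches UTF-8 byte order.
    Assumes no split point contains a newline.
    """
    lines = sorted(set(splits))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def read_split_file(path):
    """Read a one-per-line split file back, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line]


count = write_split_file(["1", "05", "15", "2"], "splits.txt")  # hypothetical file name
print(count, read_split_file("splits.txt"))
```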
