[jira] [Created] (KAFKA-6432) Lookup indices may cause unnecessary page fault

Ying Zheng (JIRA) Sun, 07 Jan 2018 16:46:48 -0800

Ying Zheng created KAFKA-6432:
---------------------------------

             Summary: Lookup indices may cause unnecessary page fault
                 Key: KAFKA-6432
                 URL: https://issues.apache.org/jira/browse/KAFKA-6432
             Project: Kafka
          Issue Type: Improvement
          Components: core, log
            Reporter: Ying Zheng
         Attachments: Binary Search - Diagram 1.png, Binary Search - Diagram 
2.png


For each topic-partition, Kafka broker maintains two indices: one for message 
offset, one for message timestamp. By default, a new index entry is appended to 
each index for every 4KB messages. The lookup of the indices is a simple binary 
search. The indices are mmaped files, and cached by Linux page cache.

Both consumer fetch and follower fetch have to do an offset lookup, before 
accessing the actual message data. The simple binary search algorithm used for 
looking up the index is not cache friendly, and may cause page faults even on 
high QPS topic-partitions.

For example (diagram 1), when looking up an index entry in page 12, the binary 
search algorithm has to read page 0, 6, 9 and 11. After new messages are 
appended to the topic-partition, the index grows to 13 pages. Now, if the 
follower fetch request looking up the 1st index entry of page 13, the binary 
search algorithm will go to page 0, 7, 10 and 12. Among those pages, page 7 and 
10 have not been used for a long time, and may already be swapped to hard disk.

Actually, in a normal Kafka broker, all the follower fetch requests and most 
consumer fetch requests should only look up the last few entries of the index. 
We can make the index lookup more cache friendly, by searching in the last one 
or two pages of the index first. (Diagram 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (KAFKA-6432) Lookup indices may cause unnecessary page fault

Reply via email to