[ 
https://issues.apache.org/jira/browse/KAFKA-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315536#comment-16315536
 ] 

Ted Yu commented on KAFKA-6432:
-------------------------------

In the second diagram, the last two entries are shown as warm entries.
How do we know the number of warm entries for different hardware + OS 
combinations ?

Thanks

> Lookup indices may cause unnecessary page fault
> -----------------------------------------------
>
>                 Key: KAFKA-6432
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6432
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core, log
>            Reporter: Ying Zheng
>         Attachments: Binary Search - Diagram 1.png, Binary Search - Diagram 
> 2.png
>
>
> For each topic-partition, Kafka broker maintains two indices: one for message 
> offset, one for message timestamp. By default, a new index entry is appended 
> to each index for every 4KB messages. The lookup of the indices is a simple 
> binary search. The indices are mmaped files, and cached by Linux page cache.
> Both consumer fetch and follower fetch have to do an offset lookup, before 
> accessing the actual message data. The simple binary search algorithm used 
> for looking up the index is not cache friendly, and may cause page faults 
> even on high QPS topic-partitions.
> For example (diagram 1), when looking up an index entry in page 12, the 
> binary search algorithm has to read page 0, 6, 9 and 11. After new messages 
> are appended to the topic-partition, the index grows to 13 pages. Now, if the 
> follower fetch request looking up the 1st index entry of page 13, the binary 
> search algorithm will go to page 0, 7, 10 and 12. Among those pages, page 7 
> and 10 have not been used for a long time, and may already be swapped to hard 
> disk.
> Actually, in a normal Kafka broker, all the follower fetch requests and most 
> consumer fetch requests should only look up the last few entries of the 
> index. We can make the index lookup more cache friendly, by searching in the 
> last one or two pages of the index first. (Diagram 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to