[ 
https://issues.apache.org/jira/browse/KAFKA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902264#comment-17902264
 ] 

PoAn Yang commented on KAFKA-806:
---------------------------------

Hi [~chia7712], thanks for the information. I tried adding the following test to 
LogSegmentTest. When a single appended MemoryRecords contains many batches, 
`LogSegment#read` can take up to 100ms. I think we can improve this part. May I 
take the issue? Thanks.


{code:java}
    @Test
    public void testIndex() throws IOException {
        int recordsInBatch = 100;
        int batchInMemoryRecords = 100000;
        // Index interval of 1 byte, yet the single large append below
        // still produces at most one index entry.
        LogSegment segment = createSegment(0, 1, Time.SYSTEM);

        // ~100 bytes per record is a rough upper bound for these small records.
        ByteBuffer buffer = ByteBuffer.allocate(recordsInBatch * batchInMemoryRecords * 100);
        for (int j = 0; j < batchInMemoryRecords; j++) {
            MemoryRecordsBuilder builder = MemoryRecords.builder(buffer,
                Compression.NONE, TimestampType.CREATE_TIME, j * recordsInBatch);
            for (int k = 0; k < recordsInBatch; k++) {
                builder.append(-1L, "key1".getBytes(), "value1".getBytes());
            }
            builder.close();
        }
        buffer.flip();
        MemoryRecords records = MemoryRecords.readableRecords(buffer);

        // Pass the actual largest offset of the appended records.
        segment.append(recordsInBatch * batchInMemoryRecords - 1L, RecordBatch.NO_TIMESTAMP, -1L, records);

        // Read near the end of the segment; with a sparse index the lookup
        // has to scan through almost every batch.
        long startMs = System.currentTimeMillis();
        segment.read(9999999, 1);
        System.out.println("read cost: " + (System.currentTimeMillis() - startMs) + "ms");
    }
{code}
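
For context on where the time goes, here is a rough standalone sketch (not the actual LogSegment/OffsetIndex code; the class, names, and numbers below are only illustrative): because the single large append leaves at most one offset index entry, the read can only binary-search down to that entry and then has to walk almost every batch to reach the target offset.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model only (not Kafka's LogSegment/OffsetIndex): shows why a read must
 * scan many batches when a large append leaves the offset index sparse.
 */
public class SparseIndexScanSketch {
    static final int RECORDS_PER_BATCH = 100;

    private final List<Long> batchBaseOffsets = new ArrayList<>();
    // Sparse "offset index": base offset -> position of that batch in the list.
    private final TreeMap<Long, Integer> offsetIndex = new TreeMap<>();

    void append(long baseOffset, boolean addIndexEntry) {
        if (addIndexEntry)
            offsetIndex.put(baseOffset, batchBaseOffsets.size());
        batchBaseOffsets.add(baseOffset);
    }

    /** Number of batches scanned to locate targetOffset: index lookup + linear scan. */
    int batchesScanned(long targetOffset) {
        Map.Entry<Long, Integer> floor = offsetIndex.floorEntry(targetOffset);
        int start = floor == null ? 0 : floor.getValue();
        int scanned = 0;
        for (int i = start; i < batchBaseOffsets.size(); i++) {
            scanned++;
            long base = batchBaseOffsets.get(i);
            if (targetOffset >= base && targetOffset < base + RECORDS_PER_BATCH)
                break; // found the batch containing the target offset
        }
        return scanned;
    }

    public static void main(String[] args) {
        int batches = 100_000; // same scale as the test above
        SparseIndexScanSketch sketch = new SparseIndexScanSketch();
        for (int i = 0; i < batches; i++) {
            // Only one index entry for the entire append, as in the test above.
            sketch.append((long) i * RECORDS_PER_BATCH, i == 0);
        }
        // Reading near the end of the segment walks almost every batch.
        System.out.println("batches scanned: " + sketch.batchesScanned(9_999_999L));
    }
}
{code}

With one entry for the whole append the scan touches all 100,000 batches, which lines up with the read cost observed above.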

 

> Index may not always observe log.index.interval.bytes
> -----------------------------------------------------
>
>                 Key: KAFKA-806
>                 URL: https://issues.apache.org/jira/browse/KAFKA-806
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log
>            Reporter: Jun Rao
>            Assignee: Chun-Hao Tang
>            Priority: Major
>              Labels: newbie++
>
> Currently, each log.append() will add at most 1 index entry, even when the 
> appended data is larger than log.index.interval.bytes. One potential issue is 
> that if a follower restarts after being down for a long time, it may fetch 
> data much bigger than log.index.interval.bytes at a time. This means that 
> fewer index entries are created, which can increase the fetch time from the 
> consumers.
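
As a back-of-the-envelope illustration of the description above (the figures below are assumptions, not measurements): with the default log.index.interval.bytes of 4096, a single large append today yields at most one index entry, while observing the interval would yield roughly one entry per 4096 bytes appended.

{code:java}
/** Rough sketch only, not Kafka code; the 10 MB append size is just an example. */
public class IndexEntryCountSketch {
    public static void main(String[] args) {
        long indexIntervalBytes = 4096;          // default log.index.interval.bytes
        long appendedBytes = 10L * 1024 * 1024;  // e.g. one large catch-up append

        long entriesToday = 1;                                      // at most one per append()
        long entriesIfObserved = appendedBytes / indexIntervalBytes; // roughly one per interval

        System.out.println("index entries with current behavior:   " + entriesToday);
        System.out.println("index entries if interval is observed: " + entriesIfObserved);
    }
}
{code}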



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
