ivankelly commented on a change in pull request #1651: PIP-17: provide DataBlockHeader and BlockAwareSegmentInputStream implementation
URL: https://github.com/apache/incubator-pulsar/pull/1651#discussion_r184606745
##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/s3offload/impl/BlockAwareSegmentInputStream.java
##########
@@ -0,0 +1,205 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pulsar.broker.s3offload.impl;
+
+import static com.google.common.base.Preconditions.checkState;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.concurrent.CompletableFuture;
+import java.util.stream.IntStream;
+import org.apache.bookkeeper.client.api.LedgerEntries;
+import org.apache.bookkeeper.client.api.LedgerEntry;
+import org.apache.bookkeeper.client.api.ReadHandle;
+import org.apache.pulsar.broker.s3offload.DataBlockHeader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ *
+ * The BlockAwareSegmentInputStream for each cold storage data block.
+ * It contains a byte buffer, which contains all the content for this data block.
+ * DataBlockHeader + entries(each with format[[entry_size -- int][entry_id -- long][entry_data]])
+ *
+ */
+public class BlockAwareSegmentInputStream extends ByteArrayInputStream {

Review comment:

   ah. I think block_len and block_entry_count come from an earlier design where we were pulling everything into memory before pushing the block to S3. They should be omitted; we don't need them on read in any case. Block length is fixed except in the last block, and we store the count and length in the index anyway. On read, we can either use the index, or keep reading from the block until we find the padding pattern.

   The padding pattern, when read as 4 bytes, should yield a negative integer, to ensure it can never be mistaken for an entry_len. So make the padding pattern something like 0xF012DEAD (i.e. F as the most significant nibble).

   It would do no harm to have a bit of extra space in the block header in case we want to add something later. Maybe make it 64 bytes or so, padded with 0 after the magic word and first entry id.
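   For illustration only, a minimal sketch of the read-until-padding idea described above, assuming the entry layout [entry_size -- int][entry_id -- long][entry_data] and a padding word with the top nibble set (e.g. 0xF012DEAD). The class name, method, and PADDING_MAGIC constant below are hypothetical, not part of the PR:

    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;

    class PaddingAwareEntryReader {
        // Hypothetical padding word; any value with the high bit set reads back
        // as a negative int, so it can never collide with a valid entry_len.
        static final int PADDING_MAGIC = 0xF012DEAD;

        // Read entries of the form [entry_size -- int][entry_id -- long][entry_data]
        // until the padding pattern (or the end of the block) is reached.
        static void readEntries(InputStream block) throws IOException {
            DataInputStream in = new DataInputStream(block);
            while (true) {
                int entryLen;
                try {
                    entryLen = in.readInt();
                } catch (EOFException eof) {
                    break; // reached the end of the block without seeing padding
                }
                if (entryLen < 0) {
                    break; // hit the padding pattern (e.g. 0xF012DEAD): no more entries
                }
                long entryId = in.readLong();
                byte[] data = new byte[entryLen];
                in.readFully(data);
                // hand (entryId, data) off to the caller here
            }
        }
    }

   Since Java ints are signed, readInt() on a word like 0xF012DEAD returns a negative value, which is why a padding word with F as the most significant nibble can never be read as a legitimate entry length.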
