[ https://issues.apache.org/jira/browse/KUDU-38?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978096#comment-16978096 ]
Adar Dembo commented on KUDU-38:
--------------------------------
Thanks for the new suggestions. I'm thinking about them and will probably come
back with additional questions.
I wanted to follow up on your responses to my questions about the log index. My
understanding is that:
* During regular operations, the log index is used to quickly find ops to catch
up lagging peers.
* During bootstrap, any existing log index is ignored (because it isn't
necessarily consistent). Instead, the index is rebuilt as ops are replayed such
that when bootstrap is finished, we have an index representation that should be
faithful to the consistent index state pre-crash.
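To make the bootstrap behavior described above concrete, here is a minimal sketch in Python. It is not Kudu's actual code: the entry shape (segment sequence number, byte offset, op index) and the function name are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical shape of a replayed WAL entry:
# (segment sequence number, byte offset within the segment, op index).
Entry = Tuple[int, int, int]


def rebuild_index_during_replay(
    entries: Iterable[Entry],
    stale_index: Dict[int, Tuple[int, int]],
) -> Dict[int, Tuple[int, int]]:
    """Return a fresh op index -> (segment, offset) map built from replay.

    The pre-existing index is accepted only to show that it is ignored:
    it may not be consistent after a crash, so nothing is read from it.
    """
    fresh: Dict[int, Tuple[int, int]] = {}
    for segment_seqno, offset, op_index in entries:
        # Each replayed op records where it physically lives in the WAL.
        fresh[op_index] = (segment_seqno, offset)
    return fresh
```

Once replay finishes, the fresh map plays the role of the log index used during regular operations to locate ops for lagging peers.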
bq. ... we need to make sure that all prior indexes are also synced, because
it's possible that there is a lagging peer that will still need to catch up
from a very old record.
What I'm suggesting is that we use the partially synced log index *only during
bootstrap to establish the replay start point*. After we retrieve the logical
index from the TabletMetadata (derived from the earliest anchor when we last
flushed) and use the log index to convert it into a physical segment + offset,
we ditch the log index and rebuild it during bootstrap, just as we do today. I
think you could also take that one step further by not using the log index at
all: convert the logical index into a physical segment + offset at flush time,
and store that in the TabletMetadata.
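A rough sketch of that second variant, in Python rather than Kudu's C++ and with every name hypothetical (ReplayStart, the replay_start field, record_replay_start and bootstrap are illustrative assumptions, not existing Kudu APIs): the physical position is resolved at flush time, and bootstrap skips everything before it while rebuilding the index from the ops it does replay.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ReplayStart:
    """Hypothetical physical replay start point stored in the metadata."""
    segment_seqno: int
    offset: int


@dataclass
class TabletMetadata:
    """Stand-in for the real metadata; only the assumed new field is modeled."""
    replay_start: Optional[ReplayStart] = None


def record_replay_start(
    meta: TabletMetadata,
    log_index: Dict[int, Tuple[int, int]],
    earliest_anchored_op: int,
) -> None:
    """At flush time, convert the logical index to segment + offset and store it."""
    segment_seqno, offset = log_index[earliest_anchored_op]
    meta.replay_start = ReplayStart(segment_seqno, offset)


def bootstrap(
    meta: TabletMetadata,
    segments: Dict[int, List[Tuple[int, int]]],
) -> Tuple[List[int], Dict[int, Tuple[int, int]]]:
    """Replay ops at or after the stored start point and rebuild the index.

    `segments` maps a segment sequence number to its (offset, op index)
    entries in log order. With no stored start point, everything is replayed.
    """
    start = meta.replay_start
    replayed: List[int] = []
    rebuilt: Dict[int, Tuple[int, int]] = {}
    for seqno in sorted(segments):
        if start is not None and seqno < start.segment_seqno:
            continue  # whole segment precedes the replay start point
        for offset, op_index in segments[seqno]:
            if (start is not None and seqno == start.segment_seqno
                    and offset < start.offset):
                continue  # earlier entry within the starting segment
            replayed.append(op_index)
            rebuilt[op_index] = (seqno, offset)
    return replayed, rebuilt
```

Note that in this sketch the rebuilt index only covers ops from the replay start point onward, which is exactly where the concern about lagging peers needing older records comes in.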
As for backwards compatibility, I don't see how to do it with just a simple
TabletMetadata flag. AFAICT we don't zero out or truncate any existing index
chunks; we just reuse them if they exist. So a single WAL could end up
containing a mix of "old" index chunks and "new" (synced) ones. I guess we
could address that by blowing away all index chunks on bootstrap, though I'm
not sure whether that's a good idea.
> bootstrap should not replay logs that are known to be fully flushed
> -------------------------------------------------------------------
>
> Key: KUDU-38
> URL: https://issues.apache.org/jira/browse/KUDU-38
> Project: Kudu
> Issue Type: Sub-task
> Components: tablet
> Affects Versions: M3
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
> Labels: data-scalability, startup-time
>
> Currently the bootstrap process will process all of the log segments,
> including those that can be trivially determined to contain only durable
> edits. This makes startup unnecessarily slow.