[
https://issues.apache.org/jira/browse/KUDU-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adar Dembo updated KUDU-1549:
-----------------------------
Summary: LBM should start up faster (was: recovery speed of kudu-tserver
should be faster.)
I'm repurposing this JIRA for the general problem of "LBM startup is too damn
slow."
Some potential improvements:
# Identify and delete LBM containers that are full but have no live blocks.
This can happen at startup time, at last-live-block-deletion time, periodically
(perhaps via maintenance manager scheduling), or some combination of the above.
# Identify LBM containers that are full and have very few live blocks.
"Defragment" the container and make it available for writing again. Probably
best to do this periodically; it may get expensive to do it at startup or when
the container becomes full.
# Compact LBM container metadata by identifying and removing CREATE/DELETE
pairs of records. Probably best to restrict this to full containers. Not sure
when it's best to do it.
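To make the three ideas above concrete, here is a minimal sketch in Python (not Kudu's actual C++ code). It assumes a container's metadata is an append-only list of CREATE/DELETE records keyed by block ID. The names Record, Container, compact_metadata and classify are hypothetical. The 10% "very few live blocks" threshold, and measuring sparseness by block count rather than bytes, are assumptions made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Op(Enum):
    CREATE = "CREATE"
    DELETE = "DELETE"


@dataclass(frozen=True)
class Record:
    """One hypothetical metadata record: an operation on a block ID."""
    op: Op
    block_id: int


@dataclass
class Container:
    """Hypothetical in-memory view of one LBM container."""
    name: str
    full: bool
    records: List[Record]


def compact_metadata(records: List[Record]) -> List[Record]:
    """Remove CREATE/DELETE pairs; return only CREATE records of live blocks."""
    live = {}
    for rec in records:
        if rec.op is Op.CREATE:
            live[rec.block_id] = rec
        else:
            # A DELETE cancels the matching CREATE, if there is one.
            live.pop(rec.block_id, None)
    return list(live.values())


def classify(container: Container, sparse_threshold: float = 0.1) -> str:
    """Pick an action for a container (threshold is an assumed tunable)."""
    if not container.full:
        return "keep"  # Still writable; leave it alone.
    created = sum(1 for r in container.records if r.op is Op.CREATE)
    live = len(compact_metadata(container.records))
    if live == 0:
        return "delete"  # Full, but no live blocks remain.
    if created and live / created <= sparse_threshold:
        return "defragment"  # Full with very few live blocks.
    return "compact_metadata"  # Full; only the metadata can shrink.
```

For example, a full container whose records are CREATE 1, CREATE 2, DELETE 1 compacts to the single record CREATE 2, and a full container holding only CREATE 1, DELETE 1 is classified for deletion. Replaying fewer records per container is what would shorten startup in this model.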
> LBM should start up faster
> --------------------------
>
> Key: KUDU-1549
> URL: https://issues.apache.org/jira/browse/KUDU-1549
> Project: Kudu
> Issue Type: Improvement
> Components: tablet, tserver
> Environment: cpu: Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
> mem: 252 G
> disk: single ssd 1.5 T left.
> Reporter: zhangsong
> Labels: data-scalability
> Attachments: a14844513e5243a993b2b84bf0dcec4c.short.txt
>
>
> After a physical node crash, kudu-tserver recovers/starts more slowly than
> after a normal restart. The kudu-tserver log contains messages like
> "Found partial trailing metadata", and recovering this metadata seems to
> take more than 20 minutes.
> According to Adar, it should be this slow.
> The start log is attached.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)