[ https://issues.apache.org/jira/browse/KAFKA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian McCague updated KAFKA-8042:
----------------------------------
    Description: 
Note that this is from the perspective of one instance of an application; there are 8 instances in total, with a partition count of 8 for all topics and, of course, stores. Standby replicas = 1.

Within the process there are multiple instances of {{KafkaStreams}}, so the detail below is from one of these.

h2. Actual Behaviour

During state restore of an application, many segment stores are created (I am 
using MANIFEST files as a marker, since they preallocate 4MB each). As can be 
seen, this topology has 5 joins, which are the extent of its state.
{code:java}
bash-4.2# pwd
/data/fooapp/0_7
bash-4.2# for dir in $(find . -maxdepth 1 -type d); do echo "${dir}: $(find ${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done
.: 8058
./KSTREAM-JOINOTHER-0000000025-store: 851
./KSTREAM-JOINOTHER-0000000040-store: 819
./KSTREAM-JOINTHIS-0000000024-store: 851
./KSTREAM-JOINTHIS-0000000029-store: 836
./KSTREAM-JOINOTHER-0000000035-store: 819
./KSTREAM-JOINOTHER-0000000030-store: 819
./KSTREAM-JOINOTHER-0000000045-store: 745
./KSTREAM-JOINTHIS-0000000039-store: 819
./KSTREAM-JOINTHIS-0000000044-store: 685
./KSTREAM-JOINTHIS-0000000034-store: 819

There are many (~800 per store, as counted above) of these segment store directories:
./KSTREAM-JOINOTHER-0000000025-store.1551466290000
./KSTREAM-JOINOTHER-0000000025-store.1551559020000
./KSTREAM-JOINOTHER-0000000025-store.1551492690000
./KSTREAM-JOINOTHER-0000000025-store.1551548790000
./KSTREAM-JOINOTHER-0000000025-store.1551698610000
./KSTREAM-JOINOTHER-0000000025-store.1551530640000
./KSTREAM-JOINOTHER-0000000025-store.1551484440000
./KSTREAM-JOINOTHER-0000000025-store.1551556710000
./KSTREAM-JOINOTHER-0000000025-store.1551686730000
./KSTREAM-JOINOTHER-0000000025-store.1551595650000
./KSTREAM-JOINOTHER-0000000025-store.1551757350000
./KSTREAM-JOINOTHER-0000000025-store.1551685740000
./KSTREAM-JOINOTHER-0000000025-store.1551635250000
./KSTREAM-JOINOTHER-0000000025-store.1551652410000
./KSTREAM-JOINOTHER-0000000025-store.1551466620000
./KSTREAM-JOINOTHER-0000000025-store.1551781770000
./KSTREAM-JOINOTHER-0000000025-store.1551587400000
./KSTREAM-JOINOTHER-0000000025-store.1551681450000
./KSTREAM-JOINOTHER-0000000025-store.1551662310000
./KSTREAM-JOINOTHER-0000000025-store.1551721710000
./KSTREAM-JOINOTHER-0000000025-store.1551750750000
./KSTREAM-JOINOTHER-0000000025-store.1551630960000
./KSTREAM-JOINOTHER-0000000025-store.1551615120000
./KSTREAM-JOINOTHER-0000000025-store.1551792330000
./KSTREAM-JOINOTHER-0000000025-store.1551462660000
./KSTREAM-JOINOTHER-0000000025-store.1551536910000
./KSTREAM-JOINOTHER-0000000025-store.1551592350000
./KSTREAM-JOINOTHER-0000000025-store.1551527340000
./KSTREAM-JOINOTHER-0000000025-store.1551606870000
./KSTREAM-JOINOTHER-0000000025-store.1551744150000
./KSTREAM-JOINOTHER-0000000025-store.1551508200000
./KSTREAM-JOINOTHER-0000000025-store.1551486420000
... etc
{code}

Once rebalancing and state restoration are complete, the redundant segment 
files are deleted and the segment count drops to 508.

We have seen the number of these segment stores grow to as many as 15,000 over 
the baseline of 508, which can fill smaller volumes. *This means that a state 
volume that would normally see ~300MB of total disk usage can use in excess of 
30GB during rebalancing*, mostly in preallocated MANIFEST files.
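
For scale: at 4MB of preallocation each, the 8,058 MANIFEST files counted above 
come to roughly 31.5GB on their own, which is consistent with the 30GB+ observed.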

h2. Expected Behaviour

For this particular application we expect 508 segment folders in total to be 
active and present throughout rebalancing, give or take migrated tasks that 
are subject to {{state.cleanup.delay.ms}}.

h2. Preliminary investigation

* This does not appear to be the case in v1.1.0: with our application the 
number of state directories only grows to 670 (over the baseline 508).
* The MANIFEST files were not preallocated to 4MB in v1.1.0; they are in 
v2.1.x. This appears to be expected RocksDB behaviour, but it exacerbates the 
impact of the many segment stores.
* We suspect https://github.com/apache/kafka/pull/5253 to be the source of this 
change in behaviour.

A workaround is to use {{rocksdb.config.setter}} to set the preallocation size 
for MANIFEST files to a lower value such as 64KB; however, the number of 
segment stores appears to be unbounded, so disk volumes may still fill up for a 
heavier application.
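
A minimal sketch of such a config setter, assuming the {{RocksDBConfigSetter}} interface as it exists in 2.1.x (the class name is illustrative; 64KB is the value mentioned above):
{code:java}
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class ManifestPreallocationConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        // Lower the MANIFEST preallocation from the 4MB RocksDB default so that the
        // transient segment stores created during restore do not each pin ~4MB of
        // mostly empty preallocated space.
        options.setManifestPreallocationSize(64 * 1024L); // 64KB
    }
}
{code}
The class is registered via {{StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG}} ({{rocksdb.config.setter}}) in the application's streams configuration.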



> Kafka Streams creates many segment stores on state restore
> ----------------------------------------------------------
>
>                 Key: KAFKA-8042
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8042
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Adrian McCague
>            Priority: Major
>         Attachments: StateStoreSegments-StreamsConfig.txt
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
