CalvinSchulze opened a new issue #8214: Data Loss at replication test
URL: https://github.com/apache/incubator-druid/issues/8214

# Affected Version

- Druid 0.15 incubating on Ubuntu using JDK 8
- Zookeeper 3.4.11
- Kafka 2.12-2.2.0

### Description

I decided to test Druid's fault tolerance today and set up a little test: a logging software sends a small number of events to Kafka every second, and these are stream-ingested into Druid. All data is supposed to be replicated on 2 historical nodes. Everything runs on 1 machine.

How I tested:

- I produced roughly 10kB of data and ingested it
- I stopped the logging software
- I waited for the data to be handled (or at least, until the web GUI showed it)
- I shut down one of the historicals
- I restarted the logging software
- I produced another 10kB of data
  -> First weird behaviour: the data didn't get sent to the running historical and the tasks didn't terminate
- I started the second historical again
  -> It took a very long time and >100 failing tasks (duration = 0:00:00, no error, no log) for the second historical to queue the missing segments
  -> This test made the first historical crash
- I restarted the first historical
  -> Queue stays empty forever
  -> The missing 5 segments are not realtime anymore
  -> None of the historicals contains the missing 5 segments
  -> The data is apparently unavailable forever

This didn't go too well, I'd say. Is this intended behaviour, or did I just do something wrong?

Greetings,
Calvin
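
For anyone reproducing the test above, a rough way to see whether segments are still waiting to be loaded (replicas included) is to ask the Coordinator's `loadstatus?full` endpoint instead of relying on the web GUI. The following is only a sketch, not part of the original report: it assumes the Coordinator is reachable at a URL you supply (for example the default `http://localhost:8081`), and the helper names `fetch_load_status` and `pending_segments` are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_load_status(coordinator_url):
    """Fetch the per-tier load status from the Coordinator.

    Assumption: coordinator_url points at a running Coordinator,
    e.g. "http://localhost:8081". The "?full" variant reports, per
    datasource and tier, how many segments are still left to load,
    counting replicas.
    """
    url = coordinator_url.rstrip("/") + "/druid/coordinator/v1/loadstatus?full"
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def pending_segments(load_status):
    """Return {(datasource, tier): count} for entries that still have segments to load."""
    pending = {}
    for datasource, tiers in load_status.items():
        for tier, count in tiers.items():
            if count > 0:
                pending[(datasource, tier)] = count
    return pending
```

Calling `pending_segments(fetch_load_status("http://localhost:8081"))` after each step of the test would show whether the Coordinator still considers any segments under-replicated; an empty result while the data is unavailable would suggest the segments were never published in the first place.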
