[ 
https://issues.apache.org/jira/browse/KYLIN-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingxing Di updated KYLIN-4964:
-------------------------------
    Description: 
org.apache.kylin.stream.core.storage.columnar.ColumnarMemoryStorePersister#persist
 catches exceptions and only logs them. This behavior can cause critical 
problems. In our case there was no space left on the device, the `persist` 
method failed many times, and we lost several hours of data.

Here is our solution, which has already been tested:
 # Throw IllegalStorageException when the persist cannot be done
 # Then stop the consumer thread
 # Add `consumer_thread_alive` to ConsumerStats for monitoring
 # Also fix another issue that causes wrong results after restoring from a 
checkpoint
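
The first three steps above could be sketched roughly as follows. This is only 
an illustration, not the actual patch: `Persister`, `ConsumerStatsSketch` and 
`ConsumerLoopSketch` are hypothetical names, the `IllegalStorageException` 
defined here is a stand-in for the one named in the issue, and the real 
`persist` method has a different signature.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for the exception named in the issue; not Kylin's actual class.
class IllegalStorageException extends RuntimeException {
    IllegalStorageException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical persister interface, simplified for illustration.
interface Persister {
    void persist(int batch) throws IOException;
}

// Hypothetical stats holder exposing the flag a monitoring system would read.
class ConsumerStatsSketch {
    private final AtomicBoolean consumerThreadAlive = new AtomicBoolean(false);

    void setConsumerThreadAlive(boolean alive) { consumerThreadAlive.set(alive); }

    boolean isConsumerThreadAlive() { return consumerThreadAlive.get(); }
}

// Hypothetical consumer loop that stops on a storage failure.
class ConsumerLoopSketch implements Runnable {
    private final Persister persister;
    private final ConsumerStatsSketch stats;
    private final int batches;
    private int persisted = 0;

    ConsumerLoopSketch(Persister persister, ConsumerStatsSketch stats, int batches) {
        this.persister = persister;
        this.stats = stats;
        this.batches = batches;
    }

    // Step 1: wrap the I/O failure and rethrow instead of logging and continuing.
    private void persistOrFail(int batch) {
        try {
            persister.persist(batch);
        } catch (IOException e) {
            throw new IllegalStorageException("cannot persist batch " + batch, e);
        }
    }

    @Override
    public void run() {
        stats.setConsumerThreadAlive(true);
        try {
            for (int batch = 0; batch < batches; batch++) {
                persistOrFail(batch);
                persisted++;
            }
        } catch (IllegalStorageException e) {
            // Step 2: the error is unrecoverable, so leave the loop and stop consuming.
            System.err.println("Stopping consumer: " + e.getMessage());
        } finally {
            // Step 3: the flag goes false whenever the loop exits, so a monitor can alert.
            stats.setConsumerThreadAlive(false);
        }
    }

    int persistedCount() { return persisted; }

    public static void main(String[] args) throws InterruptedException {
        ConsumerStatsSketch stats = new ConsumerStatsSketch();
        Persister failing = batch -> {
            if (batch == 2) {
                throw new IOException("No space left on device");
            }
        };
        ConsumerLoopSketch loop = new ConsumerLoopSketch(failing, stats, 5);
        Thread consumer = new Thread(loop, "consumer-sketch");
        consumer.start();
        consumer.join();
        System.out.println("persisted=" + loop.persistedCount()
                + " alive=" + stats.isConsumerThreadAlive());
    }
}
```

With this shape, a full disk stops consumption at the first failed batch 
instead of silently dropping every later one, and the flag tells monitoring 
that the thread is no longer running.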

 

  was:
org.apache.kylin.stream.core.storage.columnar.ColumnarMemoryStorePersister#persist
 catches exceptions and only logs them. This behavior can cause critical 
problems. In our case there was no space left on the device, the `persist` 
method failed many times, and we lost several hours of data.

 


> Receiver consumer thread should be stopped when encountering an 
> unrecoverable error
> ------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4964
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4964
>             Project: Kylin
>          Issue Type: Bug
>          Components: Real-time Streaming
>    Affects Versions: v3.1.1
>            Reporter: Xingxing Di
>            Priority: Major
>
> org.apache.kylin.stream.core.storage.columnar.ColumnarMemoryStorePersister#persist
>  catches exceptions and only logs them. This behavior can cause critical 
> problems. In our case there was no space left on the device, the `persist` 
> method failed many times, and we lost several hours of data.
> Here is our solution, which has already been tested:
>  # Throw IllegalStorageException when the persist cannot be done
>  # Then stop the consumer thread
>  # Add `consumer_thread_alive` to ConsumerStats for monitoring
>  # Also fix another issue that causes wrong results after restoring from a 
> checkpoint
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
