[ 
https://issues.apache.org/jira/browse/SPARK-28149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Luis Pedrosa updated SPARK-28149:
--------------------------------------
    Description: 
By default, the JVM caches failed DNS resolutions; the default lifetime of 
this negative cache is 10 seconds.

The Alpine JDK used in the images for Kubernetes has a default timeout of 5 
seconds.

This means that in clusters with slow init time (network sidecar pods, slow 
network start-up) the executor will never run: the first attempt to connect 
to the driver fails, that failure is cached, and the retries then happen in 
a tight loop without actually attempting the resolution again.

The proposed implementation is to have entrypoint.sh (which is exclusive to 
k8s) alter the file that holds the DNS caching settings and disable negative 
caching when the environment variable "DISABLE_DNS_NEGATIVE_CACHING" is 
defined.
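
A minimal sketch of what such an entrypoint.sh step could look like, not the 
actual patch. It assumes the file in question is the JDK's java.security file 
and that the setting is the standard networkaddress.cache.negative.ttl 
security property (0 means failed lookups are never cached). The function 
name, the JAVA_SECURITY_FILE override and the default path are hypothetical 
and depend on the JDK layout in the image; the file must also be writable by 
the container user.

```shell
#!/bin/sh
# Hypothetical helper: set networkaddress.cache.negative.ttl=0 in the given
# java.security file so that the JVM does not cache failed DNS lookups.
disable_dns_negative_caching() {
  security_file="$1"
  [ -f "$security_file" ] || return 1

  if grep -q '^[[:space:]]*networkaddress\.cache\.negative\.ttl=' "$security_file"; then
    # The property is already set: rewrite its value via a temporary file.
    tmp="${security_file}.tmp"
    sed 's/^[[:space:]]*networkaddress\.cache\.negative\.ttl=.*/networkaddress.cache.negative.ttl=0/' \
      "$security_file" > "$tmp" && mv "$tmp" "$security_file"
  else
    # The property is absent (or commented out): append it.
    printf '%s\n' 'networkaddress.cache.negative.ttl=0' >> "$security_file"
  fi
}

# Act only when the variable is defined, even if it is empty.
# The default path below is an assumption (JDK 8 layout).
if [ -n "${DISABLE_DNS_NEGATIVE_CACHING+x}" ]; then
  disable_dns_negative_caching \
    "${JAVA_SECURITY_FILE:-$JAVA_HOME/jre/lib/security/java.security}"
fi
```

With this in place, setting DISABLE_DNS_NEGATIVE_CACHING on the executor pods 
would turn the behaviour on, and leaving it unset keeps the JDK default.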

 

  was:
By default JVM caches the failures for the DNS resolutions, by default is 
cached by 10 seconds.

Alpine JDK used in the images for kubernetes has a default timout of 5 seconds.

This means that in clusters with slow init time (network sidecar pods, slow 
network start up) executor will never run, because the first attempt to connect 
to the driver will fail, and that failure will be cached, causing  the retries 
to happen in a tight loop without actually trying again.

 

The proposed implementation would be to add to the entrypoint.sh (that is 
exclusive for k8s) to alter the file with the dns caching, and disable it if 
there's an environment variable as "DISABLE_DNS_NEGATIVE_CAHING" defined. 

 


> Disable negative DNS caching
> ----------------------------
>
>                 Key: SPARK-28149
>                 URL: https://issues.apache.org/jira/browse/SPARK-28149
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.4.3
>            Reporter: Jose Luis Pedrosa
>            Priority: Minor
>
> By default, the JVM caches failed DNS resolutions; the default lifetime of 
> this negative cache is 10 seconds.
> The Alpine JDK used in the images for Kubernetes has a default timeout of 5 
> seconds.
> This means that in clusters with slow init time (network sidecar pods, slow 
> network start-up) the executor will never run: the first attempt to connect 
> to the driver fails, that failure is cached, and the retries then happen in 
> a tight loop without actually attempting the resolution again.
>  
> The proposed implementation is to have entrypoint.sh (which is exclusive to 
> k8s) alter the file that holds the DNS caching settings and disable negative 
> caching when the environment variable "DISABLE_DNS_NEGATIVE_CACHING" is 
> defined.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

