[jira] [Updated] (HUDI-5092) Querying Hudi table throws NoSuchMethodError in Databricks runtime

Alexey Kudinkin (Jira) Wed, 26 Oct 2022 17:52:06 -0700


     [ https://issues.apache.org/jira/browse/HUDI-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alexey Kudinkin updated HUDI-5092:
----------------------------------
    Description: 
Originally reported by the user: 
[https://github.com/apache/hudi/issues/6137]

 

The crux of the issue is that the Databricks Runtime (DBR) diverges from OSS 
Spark; in this case the `FileStatusCache` API differs between the two. 

There are a few approaches we can take: 
 # Avoid relying on Spark's FileStatusCache implementation altogether and use 
our own
 # Take a staged approach: first try to use Spark's FileStatusCache and, if it 
doesn't match the expected API, fall back to our own implementation

 

Approach 1 would mean that we no longer share the cache implementation with 
Spark, so in some cases we might keep two instances of the same cache. 
Approach 2 avoids that and lets us fall back only when the API is not 
compatible. 
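
For illustration only, approach 2 could be sketched roughly as below. This is 
not the actual Hudi or Spark code: every type here (ListingCacheResolver, 
ListingCache, OwnListingCache, SharedListingCache and the two engine-cache 
stand-ins) is hypothetical, and the method signatures are simplified 
assumptions rather than Spark's real FileStatusCache API. The sketch probes 
the engine's cache via reflection up front, shares it when the expected 
methods are present, and otherwise falls back to our own implementation 
instead of hitting NoSuchMethodError at call time.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of approach 2; none of these types are Hudi or Spark classes.
class ListingCacheResolver {

    /** Minimal cache contract the reader depends on (illustrative only). */
    interface ListingCache {
        Optional<List<String>> getLeafFiles(String path);
        void putLeafFiles(String path, List<String> files);
    }

    /** Our own implementation, used when the engine's cache API differs. */
    static final class OwnListingCache implements ListingCache {
        private final Map<String, List<String>> entries = new ConcurrentHashMap<>();
        public Optional<List<String>> getLeafFiles(String path) {
            return Optional.ofNullable(entries.get(path));
        }
        public void putLeafFiles(String path, List<String> files) {
            entries.put(path, new ArrayList<>(files));
        }
    }

    /** Adapter over the engine's cache, bound to methods looked up once. */
    static final class SharedListingCache implements ListingCache {
        private final Object engineCache;
        private final Method get;
        private final Method put;
        SharedListingCache(Object engineCache, Method get, Method put) {
            this.engineCache = engineCache;
            this.get = get;
            this.put = put;
        }
        @SuppressWarnings("unchecked")
        public Optional<List<String>> getLeafFiles(String path) {
            try {
                return (Optional<List<String>>) get.invoke(engineCache, path);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Engine cache lookup failed", e);
            }
        }
        public void putLeafFiles(String path, List<String> files) {
            try {
                put.invoke(engineCache, path, files);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Engine cache update failed", e);
            }
        }
    }

    /** Share the engine cache if it exposes the expected API, else fall back. */
    static ListingCache resolve(Object engineCache) {
        try {
            Class<?> type = engineCache.getClass();
            Method get = type.getMethod("getLeafFiles", String.class);
            Method put = type.getMethod("putLeafFiles", String.class, List.class);
            if (!Optional.class.isAssignableFrom(get.getReturnType())) {
                return new OwnListingCache(); // same name, unexpected return type
            }
            return new SharedListingCache(engineCache, get, put);
        } catch (NoSuchMethodException e) {
            return new OwnListingCache(); // API diverges: use our own cache
        }
    }

    /** Stand-in for an engine cache that exposes the expected API. */
    static final class CompatibleEngineCache {
        private final Map<String, List<String>> entries = new ConcurrentHashMap<>();
        public Optional<List<String>> getLeafFiles(String path) {
            return Optional.ofNullable(entries.get(path));
        }
        public void putLeafFiles(String path, List<String> files) {
            entries.put(path, new ArrayList<>(files));
        }
    }

    /** Stand-in for a divergent runtime: same method name, different signature. */
    static final class DivergentEngineCache {
        public Optional<List<String>> getLeafFiles(String path, long asOfMillis) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        ListingCache shared = resolve(new CompatibleEngineCache());
        ListingCache fallback = resolve(new DivergentEngineCache());
        shared.putLeafFiles("/tbl/p=1", Arrays.asList("f1.parquet"));
        System.out.println(shared.getClass().getSimpleName());
        System.out.println(fallback.getClass().getSimpleName());
    }
}
```

With this shape the engine's cache is still shared whenever its API matches, 
so the duplicate-cache concern of approach 1 only applies on runtimes where 
the fallback is actually taken.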

  was:https://github.com/apache/hudi/issues/6137


> Querying Hudi table throws NoSuchMethodError in Databricks runtime 
> -------------------------------------------------------------------
>
>                 Key: HUDI-5092
>                 URL: https://issues.apache.org/jira/browse/HUDI-5092
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.12.0
>            Reporter: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.12.2, 0.13.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
