hudi-bot opened a new issue, #15515:
URL: https://github.com/apache/hudi/issues/15515

   Originally reported by the user: https://github.com/apache/hudi/issues/6137
   
    
   
   The crux of the issue is that Databricks's DBR runtime diverges from OSS Spark; in particular, the `FileStatusCache` API clearly differs between the two.
   
   There are a few approaches we can take:

   1. Avoid relying on Spark's `FileStatusCache` implementation altogether and use our own.
   2. Take a more staggered approach: first try to use Spark's `FileStatusCache`, and if it doesn't match the expected API, fall back to our own implementation.
   
    
   
   Approach 1 would mean that we're not sharing the cache implementation with Spark, which in turn would entail that in some cases we might keep two instances of the same cache. Approach 2 remediates that, falling back to our own implementation only when the API is incompatible.
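   The fallback in approach 2 could be driven by a runtime reflection probe. A minimal sketch, assuming we probe for the specific method signature we depend on (the class `FileStatusCacheProbe` and its use against `java.lang.String` below are purely illustrative stand-ins, not actual Hudi or Spark code):

   ```java
   import java.lang.reflect.Method;

   public class FileStatusCacheProbe {
       /**
        * Returns true if {@code className} exposes a public method
        * {@code methodName} with exactly the given parameter types.
        * A false result would trigger the fallback to a Hudi-owned cache.
        */
       public static boolean hasExpectedApi(String className, String methodName,
                                            Class<?>... paramTypes) {
           try {
               Method m = Class.forName(className).getMethod(methodName, paramTypes);
               return m != null;
           } catch (ReflectiveOperationException e) {
               // Class missing or signature divergent (e.g. on a DBR runtime).
               return false;
           }
       }

       public static void main(String[] args) {
           // Illustrative probe against a JDK class, standing in for Spark's
           // org.apache.spark.sql.execution.datasources.FileStatusCache:
           boolean ok = hasExpectedApi("java.lang.String", "substring", int.class);
           System.out.println(ok ? "use shared Spark cache"
                                 : "fall back to Hudi-owned cache");
       }
   }
   ```

   The probe runs once at startup, so the reflection cost is negligible, and it degrades gracefully on any runtime whose `FileStatusCache` has drifted from the OSS signature.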
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-5092
   - Type: Bug
   - Affects version(s):
     - 0.12.0
   - Fix version(s):
     - 1.1.0
   - Attachment(s):
     - `image (1).png` (guoyihua, 24/Jan/23 20:02): https://issues.apache.org/jira/secure/attachment/13054793/image+%281%29.png
     - `image.png` (guoyihua, 24/Jan/23 20:02): https://issues.apache.org/jira/secure/attachment/13054792/image.png
   
   
   ---
   
   
   ## Comments
   
   **24/Jan/23 20:02, guoyihua:** After HUDI-5104, with `hoodie.file.index.enable=false`, a Spark datasource read with the base path still does not work per the user, but it works with glob paths (Hudi 0.12.2 and Databricks 11.3, Spark 3.3).
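   The two read styles being compared can be sketched as follows; this is a hedged illustration, not verified against DBR 11.3, where `spark` is an active `SparkSession`, `basePath` is a placeholder for the user's table root, and the glob pattern depends on the table's partition layout:

   ```java
   // Base-path read: reported as not working on DBR 11.3 with the
   // file index disabled.
   Dataset<Row> byBasePath = spark.read().format("hudi")
       .option("hoodie.file.index.enable", "false")
       .load(basePath);

   // Glob-path read: reported as working in the same setup.
   Dataset<Row> byGlobPath = spark.read().format("hudi")
       .option("hoodie.file.index.enable", "false")
       .load(basePath + "/*/*");
   ```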


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
