[jira] [Updated] (HUDI-5609) Hudi table not queryable by SQL on Databricks Spark

Raymond Xu (Jira) Thu, 09 Mar 2023 17:24:06 -0800


     [ https://issues.apache.org/jira/browse/HUDI-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Raymond Xu updated HUDI-5609:
-----------------------------
    Component/s: spark-sql

> Hudi table not queryable by SQL on Databricks Spark
> ---------------------------------------------------
>
>                 Key: HUDI-5609
>                 URL: https://issues.apache.org/jira/browse/HUDI-5609
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark-sql
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.13.1, 0.12.3
>
>
> Customer: I’ve tried this with 0.12.2 and still receive the same error. Does 
> the table format version also need to be updated? That is, we’re writing with 
> Hudi 0.11.1 on EMR but reading from Databricks with Hudi 0.12.2 and Spark 
> 3.3.
>  
> What has been tried so far on 0.12.2:
> 1. Spark SQL
> Just tried Spark SQL and it doesn’t work (different issue):
> SET hoodie.file.index.enable=false;
> select count(*) from validated_sales;
> This returns a count of 0 but no errors.
> 2. When running via pyspark (with a wildcard path):
> %python
> df = spark.read.format('hudi')\
> .load('s3://<bucket>/validated_sales/*/*/*')
> df.count()
> All is good with Hudi 0.12.2 and Databricks 11.3 (Spark 3.3).
> 3. Without the wildcard in pyspark:
> %python
> df = spark.read.format('hudi')\
> .load('s3://<bucket>/validated_sales')
> df.count()
> count = 0
> 4. Without the wildcard but with the recursive option set in pyspark:
> %python
> df = spark.read.format('hudi')\
> .option("recursiveFileLookup","true")\
> .load('s3://<bucket>/validated_sales')
> df.count()
> count = 250k
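
The three pyspark attempts above differ only in the load path and one reader option, so they can be compared side by side. The following is a minimal sketch, not part of the original report: the names `VARIANTS`, `compare_counts`, and `spark_count` are hypothetical, and it assumes a live `SparkSession` with the Hudi bundle on the classpath. The `<bucket>` placeholder is left as given in the report.

```python
from typing import Callable, Dict

# Placeholder path copied from the report; <bucket> is intentionally not filled in.
BASE_PATH = "s3://<bucket>/validated_sales"

# The three read variants from the report: (label, path suffix, reader options).
VARIANTS = [
    ("wildcard", "/*/*/*", {}),
    ("base path", "", {}),
    ("base path + recursiveFileLookup", "", {"recursiveFileLookup": "true"}),
]


def compare_counts(
    count_rows: Callable[[str, Dict[str, str]], int],
    base_path: str = BASE_PATH,
) -> Dict[str, int]:
    """Run every variant through count_rows and collect the row counts by label."""
    return {
        label: count_rows(base_path + suffix, opts)
        for label, suffix, opts in VARIANTS
    }


def spark_count(spark):
    """Build a count_rows callable backed by a SparkSession (assumed to have Hudi available)."""
    def count_rows(path: str, opts: Dict[str, str]) -> int:
        # Same reader calls as in the report: format('hudi'), optional options, load(path).
        return spark.read.format("hudi").options(**opts).load(path).count()
    return count_rows
```

In a notebook where `spark` is already defined, `compare_counts(spark_count(spark))` would return one count per variant, which makes the discrepancy between the base-path read and the other two visible in a single result.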



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
