gtwuser opened a new issue, #5891:
URL: https://github.com/apache/hudi/issues/5891

   **Describe the problem you faced**
   
   We write data to an S3 prefix using AWS Glue with the Hudi libraries. When we 
try to read that Hudi-written data back from the same S3 prefix, the read fails 
with an error.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write some data to S3 from an AWS Glue job script using the Hudi APIs and 
libraries.
   2. Try to read it back using the DynamicFrame class, with the code shown 
below.
   3. The read fails with the error shown under **Stacktrace**.
   ```python
   import sys
   from awsglue.transforms import Join
   from awsglue.utils import getResolvedOptions
   from pyspark.context import SparkContext
   from awsglue.context import GlueContext
   from awsglue.job import Job
   
   glueContext = GlueContext(SparkContext.getOrCreate())
   
   # Read the Hudi table through the Glue Data Catalog.
   # The stack trace points at this call (getDynamicFrame).
   data = glueContext.create_dynamic_frame.from_catalog(
       database="ingestion_hudi_db_392d4f60",
       table_name="ingestion_details",
   )
   print("Count: ", data.count())
   data.printSchema()
   ```
   
   **Expected behavior**
   
   We should be able to read back the Hudi-written data using the AWS Glue 
DynamicFrame class, just as we are able to write it.
   Sample code showing how the data is written to S3 in Hudi format using the 
DynamicFrame class:
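   ```python
   from awsglue.dynamicframe import DynamicFrame
   
   # inputDf (a Spark DataFrame) and combinedConf (the Hudi/connector options)
   # are defined elsewhere in the job and are not shown here.
   glueContext.write_dynamic_frame.from_options(
       frame=DynamicFrame.fromDF(inputDf, glueContext, "inputDf"),
       connection_type="marketplace.spark",
       connection_options=combinedConf,
   )
   ```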
   **Environment Description**
   
   * Hudi version : 0.10.0
   * Spark version : 3.1
   * Storage (HDFS/S3/GCS..) : S3
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   If we find a solution to this issue, it will also resolve #5880.
   
   **Stacktrace**
   
   ```
   An error occurred while calling o84.getDynamicFrame. 
s3://hudi-bucket-392d4f60/ingestion/.hoodie/20220616185638342.commit is not a 
Parquet file. expected magic number at tail [80, 65, 82, 49] but found [32, 
125, 10, 125]
   ```
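
As a reading aid for the error above, the two byte lists in the message can be decoded to ASCII. This is only a minimal sketch: the byte values are copied from the stack trace, while the helper name `decode_ascii` and the variable names are illustrative and not part of Hudi, Glue, or Parquet. The expected tail decodes to the Parquet magic string `PAR1`. The bytes actually found look like the closing braces of a JSON document, which is presumably what the `.commit` file under `.hoodie/` contains. That would suggest the reader is treating a Hudi metadata file as a data file.

```python
# Byte values copied verbatim from the error message.
PARQUET_MAGIC = [80, 65, 82, 49]   # "expected magic number at tail"
FOUND_TAIL = [32, 125, 10, 125]    # "but found"


def decode_ascii(values):
    """Turn a list of byte values into the ASCII text they represent."""
    return bytes(values).decode("ascii")


expected_text = decode_ascii(PARQUET_MAGIC)
found_text = decode_ascii(FOUND_TAIL)

# repr() makes the space and newline characters visible.
print("expected:", repr(expected_text))
print("found:   ", repr(found_text))
```

The second value is a space, a closing brace, a newline, and another closing brace, so the file ends like text rather than like a Parquet footer.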
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.