[GitHub] [hudi] BBency commented on issue #9094: Async Clustering failing with errors for MOR table

via GitHub Tue, 04 Jul 2023 06:38:44 -0700


BBency commented on issue #9094:
URL: https://github.com/apache/hudi/issues/9094#issuecomment-1620270322


   @ad1happy2go  Hi Aditya,
   I triggered the run again today. I am sharing the timestamps and the error messages for both approaches. Let me know if you need more details.
   ### Approach 1 Error
   **2023-07-04 13:33:04,217** ERROR [task-result-getter-2] scheduler.TaskSetManager (Logging.scala:logError(77)): task 0.0 in stage 23.0 (TID 183) had a not serializable result: org.apache.avro.generic.GenericData$Record
   Serialization stack:
        - object not serializable (class: org.apache.avro.generic.GenericData$Record, value: 
   ### Approach 2 Error 
   **2023-07-04 13:20:38,915** ERROR [main] glue.ProcessLauncher (Logging.scala:logError(77)): Error from Python:Traceback (most recent call last):
     File "/tmp/eec-aws-uk-ukidcibatchanalytics-hudi-clustering-job.py", line 54, in <module>
       main()
     File "/tmp/eec-aws-uk-ukidcibatchanalytics-hudi-clustering-job.py", line 47, in main
       spark_df_run_clustering = spark.sql(query_run_clustering)
     File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
       return DataFrame(self._jsparkSession.sql(sqlQuery), self)
     File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
       return_value = get_return_value(
     File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
       return f(*a, **kw)
     File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
       raise Py4JJavaError(
   py4j.protocol.Py4JJavaError: An error occurred while calling o97.sql.
   : org.apache.hudi.exception.HoodieClusteringException: Clustering failed to write to files:61b699e1-9a0b-4a23-8102-f66ab5b46fc8-0,3c2c597d-f807-4c71-aede-7b7c8b95ef19-0,69751dea-eddb-4cc5-97da-507ae885512b-0,ab443ddc-bfad-49a8-aadc-be47150f3c43-0,d9e692e6-cdb1-47d9-b7dd-63e288f42c44-0,0597f0dc-f9dd-4e17-a030-098eaa70860e-0,026645e2-9c06-4804-ac60-13dd9cde28ee-0,a61b0dc0-3762-4dc0-9379-53d8f36d988c-0,82107955-04a4-43dd-b62a-3e89860c2924-0,7526b140-d4b2-4e48-8f49-0df8018feddf-0,83fb889e-bf1d-43ce-bdb9-f68b9d4ee43a-0,d989f110-fbd0-4882-ad67-ca16f1179681-0,127c70f3-ba55-4dbc-8fe9-69c14c857e10-0,0a0b5c3b-40f8-4ff3-aac2-05d575eecd3b-0,a592daf4-5b43-42e8-a85b-f82b318cb76a-0,2dadec1c-520e-4c18-9e01-c4d4de3c42e4-0,f918e9f2-4254-49ce-9026-459545075a6c-0,b4be2ac2-8238-475f-9e4c-736a778299f1-0,6ec7b26a-c5d9-43a0-82e3-487d1a440565-0,ddf44fbd-b537-463b-af28-cc4f45e9f447-0,1cfbd8bd-06c8-4b77-9d7a-52efa2dc59a0-0,19934523-c4ea-4285-acd2-b9077dd0f028-0,8efd2ea5-d96b-4010-9aa6-beeaa0d0026f-0,1b82307e-a481-4f67-abfd-271f8fc700d7-0,ad7f46dd-8ca4-4cb4-8fde-add11565c58c-0,63fd0f37-4c8c-439d-a7eb-571786c9d88c-0,a9abb2e6-7d50-4b80-baf3-4fb23b130741-0,f3a19847-e1ea-49bf-99b1-aaec2b0a21b0-0,ce89f9cf-86a6-46fd-b1b4-327fed85d8c4-0,e38269b8-8d55-4cc5-a9be-3edf156cc81a-0
        at org.apache.hudi.client.SparkRDDWriteClient.completeClustering(SparkRDDWriteClient.java:381)
   
   


