dpengpeng opened a new issue, #8278:
URL: https://github.com/apache/incubator-gluten/issues/8278

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   When I use PySpark to execute a SQL query on Iceberg data stored on HDFS, the following exception occurs. The same SQL runs successfully when submitted from Java. The HDFS configuration is present in my cluster environment.
   
   Error message:
   
   **py4j.protocol.Py4JJavaError: An error occurred while calling o156.showString.
   : java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 12) ( executor 3): org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
   Error Source: RUNTIME
   Error Code: INVALID_STATE
   Reason: Unable to connect to HDFS: nameservice, got error: InvalidParameter: Cannot parse URI: hdfs://nameservice, missing port or invalid HA configuration
   Caused by: HdfsConfigNotFound: Config key: dfs.ha.namenodes.nameservice not found.
   Retriable: False
   Expression: hdfsClient_ != nullptr
   Context: Split [Hive: hdfs://nameservice/spark/tpch_iceberg.db/supplier_ice/data/00000-116-410ef5e1-e2df-44fd-b67a-4a9410655fa1-00001.parquet 4 - 513211] Task Gluten_Stage_3_TID_12_VTID_1
   Additional Context: Operator: TableScan[0] 0
   Function: Impl
   File: Gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/storage_adapters/hdfs/HdfsFileSystem.cpp
   Line: 37**
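   The `Reason` and `Caused by` lines suggest that the native (Velox) HDFS client treated `hdfs://nameservice` as an HA nameservice ID, because the URI has no port. It then could not find `dfs.ha.namenodes.nameservice` in the configuration it loaded. The following is a minimal diagnostic sketch, not Gluten's or Velox's actual logic. It parses a Hadoop `hdfs-site.xml`-style document and reports which standard HDFS HA keys (`dfs.ha.namenodes.<ns>` and `dfs.namenode.rpc-address.<ns>.<nn>`) are missing for a given URI. The helper names are hypothetical, and which configuration file the native client actually reads on the executors is an assumption you need to verify for your deployment.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def load_hadoop_conf(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def missing_ha_keys(uri, conf):
    """Return the HA keys needed to resolve `uri` that are absent from `conf`.

    Simplified model (an assumption, not Velox's code): a URI with an
    explicit port names a single NameNode; a URI without a port is
    treated as an HA nameservice ID that must be resolved via config.
    """
    parsed = urlparse(uri)
    if parsed.port is not None:
        return []  # host:port given directly, no HA lookup needed
    ns = parsed.hostname
    namenodes_key = f"dfs.ha.namenodes.{ns}"
    if namenodes_key not in conf:
        return [namenodes_key]
    missing = []
    # Each NameNode ID listed for the nameservice needs an RPC address.
    for nn in conf[namenodes_key].split(","):
        rpc_key = f"dfs.namenode.rpc-address.{ns}.{nn.strip()}"
        if rpc_key not in conf:
            missing.append(rpc_key)
    return missing
```

   For example, calling `missing_ha_keys("hdfs://nameservice/spark/...", {})` with an empty configuration reports `dfs.ha.namenodes.nameservice`, the same key named in the error. Running it against the `hdfs-site.xml` visible to the executors can show whether the HA keys for that nameservice are at least present there.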
   
   Is there an official guide for using Gluten with PySpark?
   
   ### Spark version
   
   Spark-3.4.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   OS: CentOS 7
   Spark: 3.4.1
   
   ### Relevant logs
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

