weixi62961 commented on issue #5380:
URL: https://github.com/apache/kyuubi/issues/5380#issuecomment-1772294505

   @bowenliang123  cc @pan3793 
   Here is an update on my current progress. If you have any questions, please let me know.
   
   Before writing the test cases, I first used `curl` to submit several PySpark jobs to Kyuubi 1.7.1, mainly on my local machine. All of these curl tests passed.
   
   1. PySpark job file is hosted in HDFS
   ```bash
   # hdfs pi.py
   curl -H "Content-Type: application/json" \
   -X POST \
   -d '{"batchType": "PYSPARK", "resource":"hdfs:/tmp/upload/pyspark_submit_sample/pi.py", "name": "PySpark PI", "conf": {"spark.master": "local"}, "args": [10]}' \
   http://localhost:10099/api/v1/batches
   ```
   2. PySpark job file is uploaded as a resource together with the request
   ```bash
   # upload resource: pi.py
   curl --location --request POST 'http://localhost:10099/api/v1/batches' \
   --form 'batchRequest="{\"batchType\":\"PYSPARK\",\"name\":\"PySpark Pi\",\"args\":[10]}";type=application/json' \
   --form 'resourceFile=@"/localpath/to/file/pi.py"'
   ```
   3. PySpark job file depends on other modules
   ```bash
   # module dependency
   curl -H "Content-Type: application/json" \
   -X POST \
   -d '{"batchType":"PYSPARK","resource":"hdfs:/tmp/upload/pyspark_submit_sample/test_module_dependency.py","name":"PySpark Module Dependency","conf":{"spark.master":"local","spark.submit.pyFiles":"hdfs:/tmp/upload/pyspark_submit_sample/my_module.zip"}}' \
   http://localhost:10099/api/v1/batches
   ```
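
The JSON-body requests in cases 1 and 3 can also be built programmatically, which may be convenient when turning them into test cases. The following is a minimal Python sketch, assuming the same endpoint and fields as the curl commands above. The helper names `build_batch_request` and `to_http_request` are hypothetical and not part of Kyuubi. Nothing is sent over the network until the request object is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Endpoint copied from the curl commands above (local Kyuubi REST frontend).
BATCHES_URL = "http://localhost:10099/api/v1/batches"


def build_batch_request(resource, name, args=None, py_files=None):
    """Build the JSON payload for a PYSPARK batch (hypothetical helper)."""
    conf = {"spark.master": "local"}
    if py_files:
        # Extra Python modules shipped with the job, as in case 3.
        conf["spark.submit.pyFiles"] = py_files
    payload = {
        "batchType": "PYSPARK",
        "resource": resource,
        "name": name,
        "conf": conf,
    }
    if args:
        # Passed through unchanged to mirror the curl payload.
        payload["args"] = list(args)
    return payload


def to_http_request(payload, url=BATCHES_URL):
    """Wrap the payload in a POST request object without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Case 3: main script plus a zipped module dependency.
    example = build_batch_request(
        "hdfs:/tmp/upload/pyspark_submit_sample/test_module_dependency.py",
        "PySpark Module Dependency",
        py_files="hdfs:/tmp/upload/pyspark_submit_sample/my_module.zip",
    )
    print(json.dumps(example, indent=2))
```

To actually submit, pass the result of `to_http_request(...)` to `urllib.request.urlopen` against a running Kyuubi server.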
   
   Following the two existing test cases for Spark jars, I'll write two test cases for PySpark: "POST without uploading" and "POST with uploading".
   Is that OK?
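
For reference, the "POST with uploading" case differs from the plain JSON case mainly in the request body: it is a `multipart/form-data` body with a `batchRequest` JSON part and a `resourceFile` part, as in the curl command in case 2. The following is a rough Python sketch of how such a body is laid out. The function name `build_multipart_body` and the `application/octet-stream` content type for the file part are assumptions for illustration, not something taken from Kyuubi.

```python
import json
import uuid


def build_multipart_body(batch_request, file_name, file_bytes, boundary=None):
    """Assemble a multipart/form-data body by hand (hypothetical helper).

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    file_header = (
        'Content-Disposition: form-data; name="resourceFile"; filename="%s"'
        % file_name
    ).encode("utf-8")
    lines = [
        # Part 1: the batch request as JSON (field name from the curl command).
        delimiter,
        b'Content-Disposition: form-data; name="batchRequest"',
        b"Content-Type: application/json",
        b"",
        json.dumps(batch_request).encode("utf-8"),
        # Part 2: the uploaded PySpark script.
        delimiter,
        file_header,
        b"Content-Type: application/octet-stream",  # assumed content type
        b"",
        file_bytes,
        # Closing delimiter, followed by a final CRLF.
        delimiter + b"--",
        b"",
    ]
    body = b"\r\n".join(lines)
    return body, "multipart/form-data; boundary=" + boundary


if __name__ == "__main__":
    body, content_type = build_multipart_body(
        {"batchType": "PYSPARK", "name": "PySpark Pi", "args": [10]},
        "pi.py",
        b"print('placeholder script')\n",
    )
    print(content_type)
    print(len(body), "bytes")
```

Note that the upload request carries no `resource` field, since the script itself travels in the `resourceFile` part.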
   
   In addition, PySpark's package management is complex, so it is outside the scope of this issue. For more information, see the official Spark documentation: [Python Package Management](https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html)
   
   BTW, I have created a small project for testing: [pyspark submit sample](https://github.com/weixi62961/pyspark_submit_sample)


