weixi62961 commented on issue #5380:
URL: https://github.com/apache/kyuubi/issues/5380#issuecomment-1772294505
@bowenliang123 cc @pan3793
Here is an update on my current progress. If you have any questions, please
let me know.
Before writing test cases, I first used `curl` to test some PySpark jobs
on Kyuubi 1.7.1, mainly on my local machine. All of these curl
test cases passed.
1. PySpark job file is hosted in HDFS
```bash
# hdfs pi.py
curl -H "Content-Type: application/json" \
-X POST \
-d '{"batchType": "PYSPARK", "resource": "hdfs:/tmp/upload/pyspark_submit_sample/pi.py", "name": "PySpark PI", "conf": {"spark.master": "local"}, "args": [10]}' \
http://localhost:10099/api/v1/batches
```
2. PySpark job file is uploaded as a resource file
```bash
# upload resource: pi.py
curl --location --request POST 'http://localhost:10099/api/v1/batches' \
--form 'batchRequest="{\"batchType\":\"PYSPARK\",\"name\":\"PySpark Pi\",\"args\":[10]}";type=application/json' \
--form 'resourceFile=@"/localpath/to/file/pi.py"'
```
3. PySpark job file depends on other modules
```bash
# module dependency
curl -H "Content-Type: application/json" \
-X POST \
-d '{"batchType":"PYSPARK","resource":"hdfs:/tmp/upload/pyspark_submit_sample/test_module_dependency.py","name":"PySpark Module Dependency","conf":{"spark.master":"local","spark.submit.pyFiles":"hdfs:/tmp/upload/pyspark_submit_sample/my_module.zip"}}' \
http://localhost:10099/api/v1/batches
```
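The JSON bodies in cases 1 and 3 differ only in the resource, the name, and the optional `spark.submit.pyFiles` entry, so they could be generated by one small helper. This is just a sketch: `build_pyspark_batch_body` is a hypothetical name of mine, not part of Kyuubi, it leaves out `args`, and it does no JSON escaping, so it assumes the inputs contain no quotes or backslashes.

```bash
# Hypothetical helper (not part of Kyuubi): print the JSON body for a
# PySpark batch request whose resource is already on HDFS.
#   $1 = resource URI, $2 = batch name, $3 = optional spark.submit.pyFiles value
# Assumption: inputs contain no double quotes or backslashes (no escaping is done).
build_pyspark_batch_body() {
  resource="$1"
  name="$2"
  py_files="${3:-}"

  # Same master setting as in the curl tests above.
  conf='"spark.master": "local"'
  if [ -n "$py_files" ]; then
    conf="$conf, \"spark.submit.pyFiles\": \"$py_files\""
  fi

  printf '{"batchType": "PYSPARK", "resource": "%s", "name": "%s", "conf": {%s}}\n' \
    "$resource" "$name" "$conf"
}

# Example use with the same endpoint as above (commented out so that sourcing
# this snippet does not send a request):
# curl -H "Content-Type: application/json" -X POST \
#   -d "$(build_pyspark_batch_body hdfs:/tmp/upload/pyspark_submit_sample/pi.py 'PySpark PI')" \
#   http://localhost:10099/api/v1/batches
```

Passing the zip path as a third argument gives a body equivalent to case 3; omitting it gives case 1 without `args`.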
Following the two existing test cases for Spark jars, I'll write two test cases
for PySpark: "POST without uploading" and "POST with uploading".
Is that OK?
In addition, PySpark's package management is complex, so it is outside the
scope of this issue. For more information, see the [official Spark
documentation: Python Package Management](https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html).
BTW, I have created a small project for testing: [pyspark submit sample](https://github.com/weixi62961/pyspark_submit_sample)