[
https://issues.apache.org/jira/browse/SPARK-38648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542107#comment-17542107
]
Lee Yang commented on SPARK-38648:
----------------------------------
[~mengxr] I think that could work. FWIW, I looked into how the projects in the
"connector" (formerly "external") folder are built/published. It looks like
they're all currently Scala projects that are just built as part of the main
[Build and
test|https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L684]
GitHub Actions workflow and [released/versioned along with the core spark
releases|https://github.com/apache/spark/pull/35879/files?file-filters%5B%5D=.xml&show-viewed-files=true].
We could presumably do something similar with this SPIP (with some
modifications to
[release-build.sh|https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh#L116-L128]
to publish a separate artifact to PyPI).
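For anyone skimming the SPIP goals below, the "simple inference" pattern it describes (following pandas_udf) boils down to: load the model once per worker, then run batched prediction over the input columns. Stripped of Spark so it runs standalone, a rough sketch of that pattern looks like this (all names here are hypothetical, not the proposed API):

```python
import numpy as np

def make_predict_fn(model_loader, batch_size=4):
    """Hypothetical helper: load a model once and return a function
    that runs batched inference over a 1-D numpy array of inputs."""
    model = model_loader()  # load once, reuse across all batches
    def predict(inputs):
        out = []
        for start in range(0, len(inputs), batch_size):
            batch = inputs[start:start + batch_size]
            out.append(model(batch))  # framework-specific call in practice
        return np.concatenate(out)
    return predict

# Stand-in for a trained DL model: just doubles its input.
def load_toy_model():
    return lambda batch: batch * 2.0

predict = make_predict_fn(load_toy_model)
result = predict(np.arange(6, dtype=np.float64))
```

In the actual SPIP, a function like this would be wrapped in a pandas_udf so Spark handles partitioning, Arrow transfer, and GPU scheduling; the sketch only shows the load-once/predict-in-batches shape.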
> SPIP: Simplified API for DL Inferencing
> ---------------------------------------
>
> Key: SPARK-38648
> URL: https://issues.apache.org/jira/browse/SPARK-38648
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.0.0
> Reporter: Lee Yang
> Priority: Minor
>
> h1. Background and Motivation
> The deployment of deep learning (DL) models to Spark clusters can be a point
> of friction today. DL practitioners often aren't well-versed with Spark, and
> Spark experts often aren't well-versed with the fast-changing DL frameworks.
> Currently, trained DL models are deployed in a fairly ad hoc manner, with
> each model integration usually requiring significant effort.
> To simplify this process, we propose adding an integration layer for each
> major DL framework that can introspect its saved models to integrate
> them into Spark applications more easily. You can find a
> detailed proposal here:
> [https://docs.google.com/document/d/1n7QPHVZfmQknvebZEXxzndHPV2T71aBsDnP4COQa_v0]
> h1. Goals
> - Simplify the deployment of pre-trained single-node DL models to Spark
> inference applications.
> - Follow pandas_udf for simple inference use-cases.
> - Follow Spark ML Pipelines APIs for transfer-learning use-cases.
> - Enable integrations with popular third-party DL frameworks like
> TensorFlow, PyTorch, and Hugging Face.
> - Focus on PySpark, since most of the DL frameworks use Python.
> - Take advantage of built-in Spark features like GPU scheduling and Arrow
> integration.
> - Enable inference on both CPU and GPU.
> h1. Non-goals
> - DL model training.
> - Inference with distributed models, i.e. "model parallel" inference.
> h1. Target Personas
> - Data scientists who need to deploy DL models on Spark.
> - Developers who need to deploy DL models on Spark.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]