jerryshao commented on code in PR #6052: URL: https://github.com/apache/gravitino/pull/6052#discussion_r1903755330
########## docs/manage-model-metadata-using-gravitino.md: ########## @@ -0,0 +1,637 @@ +--- +title: Manage model metadata using Gravitino +slug: /manage-model-metadata-using-gravitino +date: 2024-12-26 +keyword: Gravitino model metadata manage +license: This software is licensed under the Apache License version 2. +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +This page introduces how to manage model metadata in Apache Gravitino. Gravitino model catalog +is a kind of model registry, which provides the ability to manage machine learning models' +versioning metadata. It follows the typical Gravitino 3-level namespace (catalog, schema, and +model) and supports managing the versions for each model. + +Currently, it supports model and model version registering, listing, loading, and deleting. + +To use model catalog, please make sure that: + + - Gravitino server has started, and assume the host and port [http://localhost:8090](http://localhost:8090). + - A metalake has been created and [enabled](./manage-metalake-using-gravitino.md#enable-a-metalake) + +## Catalog operations + +### Create a catalog + +:::info +For a model catalog, you must specify the catalog `type` as `MODEL` when creating the catalog. +Please also be aware that the `provider` is not required for a model catalog. +::: + +You can create a catalog by sending a `POST` request to the `/api/metalakes/{metalake_name}/catalogs` +endpoint or just use the Gravitino Java/Python client. The following is an example of creating a +catalog: + +<Tabs groupId="language" queryString> +<TabItem value="shell" label="Shell"> + +```shell +curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ +-H "Content-Type: application/json" -d '{ + "name": "model_catalog", + "type": "MODEL", + "comment": "This is a model catalog", + "properties": { + "k1": "v1" + } +}' http://localhost:8090/api/metalakes/example/catalogs +``` + +</TabItem> +<TabItem value="java" label="Java"> + +```java +GravitinoClient gravitinoClient = GravitinoClient + .builder("http://localhost:8090") + .withMetalake("example") + .build(); + +Map<String, String> properties = ImmutableMap.<String, String>builder() + .put("k1", "v1") + .build(); + +Catalog catalog = gravitinoClient.createCatalog( + "model_catalog", + Type.MODEL, + "This is a model catalog", + properties); +``` + +</TabItem> +<TabItem value="python" label="Python"> + +```python +gravitino_client: GravitinoClient = GravitinoClient(uri="http://localhost:8090", metalake_name="example") +catalog = gravitino_client.create_catalog(name="catalog", + type=Catalog.Type.MODEL, + provider=None, + comment="This is a model catalog", + properties={"k1": "v1"}) +``` + +</TabItem> +</Tabs> + +### Load a catalog + +Refer to [Load a catalog](./manage-relational-metadata-using-gravitino.md#load-a-catalog) +in relational catalog for more details. For a model catalog, the load operation is the same. + +### Alter a catalog + +Refer to [Alter a catalog](./manage-relational-metadata-using-gravitino.md#alter-a-catalog) +in relational catalog for more details. For a model catalog, the alter operation is the same. + +### Drop a catalog + +Refer to [Drop a catalog](./manage-relational-metadata-using-gravitino.md#drop-a-catalog) +in relational catalog for more details. For a model catalog, the drop operation is the same. + +### List all catalogs in a metalake + +Please refer to [List all catalogs in a metalake](./manage-relational-metadata-using-gravitino.md#list-all-catalogs-in-a-metalake) +in relational catalog for more details. For a model catalog, the list operation is the same. + +### List all catalogs' information in a metalake + +Please refer to [List all catalogs' information in a metalake](./manage-relational-metadata-using-gravitino.md#list-all-catalogs-information-in-a-metalake) +in relational catalog for more details. For a model catalog, the list operation is the same. + +## Schema operations + +`Schema` is a virtual namespace in a model catalog, which is used to organize the models. It +is similar to the concept of `schema` in the relational catalog. + +:::tip +Users should create a metalake and a catalog before creating a schema. +::: + +### Create a schema + +You can create a schema by sending a `POST` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas` +endpoint or just use the Gravitino Java/Python client. The following is an example of creating a +schema: + +<Tabs groupId="language" queryString> +<TabItem value="shell" label="Shell"> + +```shell +curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ +-H "Content-Type: application/json" -d '{ + "name": "model_schema", + "comment": "This is a model schema", + "properties": { + "k1": "v1" + } +}' http://localhost:8090/api/metalakes/example/catalogs/model_catalog/schemas +``` + +</TabItem> +<TabItem value="java" label="Java"> + +```java +GravitinoClient gravitinoClient = GravitinoClient + .builder("http://localhost:8090") + .withMetalake("example") + .build(); + +Catalog catalog = gravitinoClient.loadCatalog("model_catalog"); + +SupportsSchemas supportsSchemas = catalog.asSchemas(); + +Map<String, String> schemaProperties = ImmutableMap.<String, String>builder() + .put("k1", "v1") + .build(); +Schema schema = supportsSchemas.createSchema( + "model_schema", + "This is a schema", + schemaProperties); +// ... +``` + +</TabItem> +<TabItem value="python" label="Python"> + +```python +gravitino_client: GravitinoClient = GravitinoClient(uri="http://localhost:8090", metalake_name="example") + +catalog: Catalog = gravitino_client.load_catalog(name="model_catalog") +catalog.as_schemas().create_schema(name="model_schema", + comment="This is a schema", + properties={"k1": "v1"}) +``` + +</TabItem> +</Tabs> + +### Load a schema + +Please refer to [Load a schema](./manage-relational-metadata-using-gravitino.md#load-a-schema) +in relational catalog for more details. For a model catalog, the schema load operation is the +same. + +### Alter a schema + +Please refer to [Alter a schema](./manage-relational-metadata-using-gravitino.md#alter-a-schema) +in relational catalog for more details. For a model catalog, the schema alter operation is the +same. + +### Drop a schema + +Please refer to [Drop a schema](./manage-relational-metadata-using-gravitino.md#drop-a-schema) +in relational catalog for more details. For a model catalog, the schema drop operation is the +same. + +Note that the drop operation will delete all the model metadata under this schema if `cascade` +set to `true`. + +### List all schemas under a catalog + +Please refer to [List all schemas under a catalog](./manage-relational-metadata-using-gravitino.md#list-all-schemas-under-a-catalog) +in relational catalog for more details. For a model catalog, the schema list operation is the +same. + +## Model operations + +:::tip + - Users should create a metalake, a catalog, and a schema before creating a model. +::: + +### Register a model + +You can create a model by sending a `POST` request to the `/api/metalakes/{metalake_name} +/catalogs/{catalog_name}/schemas/{schema_name}/models` endpoint or just use the Gravitino +Java/Python client. The following is an example of creating a model: + +<Tabs groupId="language" queryString> +<TabItem value="shell" label="Shell"> + +```shell +curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ +-H "Content-Type: application/json" -d '{ + "name": "example_model", + "comment": "This is an example model", + "properties": { + "k1": "v1" + } +}' http://localhost:8090/api/metalakes/example/catalogs/model_catalog/schemas/model_schema/models +``` + +</TabItem> +<TabItem value="java" label="Java"> + +```java +GravitinoClient gravitinoClient = GravitinoClient + .builder("http://localhost:8090") + .withMetalake("example") + .build(); + +Catalog catalog = gravitinoClient.loadCatalog("model_catalog"); +Map<String, String> propertiesMap = ImmutableMap.<String, String>builder() + .put("k1", "v1") + .build(); + +Model = catalog.asModelCatalog().registerModel( + NameIdentifier.of("model_schema", "example_model"), + "This is an example model", + propertiesMap); +``` + +</TabItem> +<TabItem value="python" label="Python"> + +```python +gravitino_client: GravitinoClient = GravitinoClient(uri="http://localhost:8090", metalake_name="example") + +catalog: Catalog = gravitino_client.load_catalog(name="model_catalog") +model = catalog.as_model_catalog().register_model(ident=NameIdentifier.of("model_schema", "example_model"), + comment="This is an example model", + properties={"k1": "v1"}) +``` + +</TabItem> +</Tabs> + +### Get a model + +You can get a model by sending a `GET` request to the `/api/metalakes/{metalake_name} +/catalogs/{catalog_name}/schemas/{schema_name}/models/{model_name}` endpoint or by using the +Gravitino Java/Python client. The following is an example of getting a model: + +<Tabs groupId="language" queryString> +<TabItem value="shell" label="Shell"> + +```shell +curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ +-H "Content-Type: application/json" \ +http://localhost:8090/api/metalakes/example/catalogs/model_catalog/schemas/model_schema/models/example_model +``` + +</TabItem> +<TabItem value="java" label="Java"> + +```java +// ... +Catalog catalog = gravitinoClient.loadCatalog("model_catalog"); +Model model = catalog.asModelCatalog().getModel(NameIdentifier.of("model_schema", "example_model")); +// ... +``` + +</TabItem> +<TabItem value="python" label="Python"> + +```python +gravitino_client: GravitinoClient = GravitinoClient(uri="http://localhost:8090", metalake_name="example") + +catalog: Catalog = gravitino_client.load_catalog(name="model_catalog") +model: Model = catalog.as_model_catalog().get_model(ident=NameIdentifier.of("model_schema", "example_model")) +``` + +</TabItem> +</Tabs> + +### Delete a model + +You can delete a model by sending a `DELETE` request to the `/api/metalakes/{metalake_name} +/catalogs/{catalog_name}/schemas/{schema_name}/models/{model_name}` endpoint or by using the +Gravitino Java/Python client. The following is an example of deleting a model: + +<Tabs groupId="language" queryString> +<TabItem value="shell" label="Shell"> + +```shell +curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \ +-H "Content-Type: application/json" \ +http://localhost:8090/api/metalakes/example/catalogs/model_catalog/schemas/model_schema/models/example_model +``` + +</TabItem> +<TabItem value="java" label="Java"> + +```java +// ... +Catalog catalog = gravitinoClient.loadCatalog("model_catalog"); +catalog.asModelCatalog().deleteModel(NameIdentifier.of("model_schema", "example_model")); +// ... +``` + +</TabItem> +<TabItem value="python" label="Python"> + +```python +gravitino_client: GravitinoClient = GravitinoClient(uri="http://localhost:8090", metalake_name="example") + +catalog: Catalog = gravitino_client.load_catalog(name="model_catalog") +catalog.as_model_catalog().delete_model(NameIdentifier.of("model_schema", "example_model")) +``` + +</TabItem> +</Tabs> + +Note that the delete operation will delete all the model versions under this model. + +### List models + +You can list all the models in a schema by sending a `GET` request to the `/api/metalakes/ +{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/moldes` endpoint or by using the +Gravitino Java/Python client. The following is an example of listing all the models in a schema: + +<Tabs groupId="language" queryString> +<TabItem value="shell" label="Shell"> + +```shell +curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ +-H "Content-Type: application/json" \ +http://localhost:8090/api/metalakes/example/catalogs/model_catalog/schemas/model_schema/models +``` + +</TabItem> +<TabItem value="java" label="Java"> + +```java +// ... +Catalog catalog = gravitinoClient.loadCatalog("model_catalog"); +NameIdentifier[] identifiers = catalog.asModelCatalog().listModels(Namespace.of("model_schema")); +// ... +``` + +</TabItem> +<TabItem value="python" label="Python"> + +```python +gravitino_client: GravitinoClient = GravitinoClient(uri="http://localhost:8090", metalake_name="example") + +catalog: Catalog = gravitino_client.load_catalog(name="model_catalog") +model_list = catalog.as_model_catalog().list_models(namespace=Namespace.of("model_schema"))) +``` + +</TabItem> +</Tabs> + +## Model version operations + +:::tip + - Users should create a metalake, a catalog, a schema, and a model before link a model version + to the model. +::: + +### Link a model version + +You can link a model version by sending a `POST` request to the `/api/metalakes/{metalake_name} +/catalogs/{catalog_name}/schemas/{schema_name}/models/{model_name}` endpoint or by using Review Comment: Fixed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
