imbajin commented on code in PR #83:
URL: https://github.com/apache/incubator-hugegraph-ai/pull/83#discussion_r1768863931

##########
hugegraph-ml/requirements.txt:
##########
@@ -0,0 +1,8 @@
+dgl~=2.2.1
+numpy~=1.24.4
+torch~=2.2.0
+tqdm~=4.66.5
+packaging~=24.1
+torchdata~=0.7.0
+PyYAML~=6.0.2
+pydantic~=2.9.2

Review Comment:
```suggestion
pydantic~=2.9.2
```

##########
hugegraph-ml/README.md:
##########
@@ -1 +1,119 @@
-
\ No newline at end of file
+ # hugegraph-ml
+
+## Summary
+
+`hugegraph-ml` is a tool that integrates HugeGraph with popular graph learning libraries.
+It implements most graph learning algorithms, enabling users to perform end-to-end graph learning workflows directly from HugeGraph using `hugegraph-ml`.
+Graph data can be read directly from `HugeGraph` and used for tasks such as node embedding, node classification, and graph classification.
+The implemented algorithm models can be found in the [models](./src/hugegraph_ml/models) folder.
+
+
+## Environment Requirements
+
+- python 3.9+
+- hugegraph-server 1.0+
+
+## Preparation
+
+1. Start the HugeGraph database, you can do it via Docker/[Binary packages](https://hugegraph.apache.org/docs/download/download/).
+Refer to [docker-link](https://hub.docker.com/r/hugegraph/hugegraph) & [deploy-doc](https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#31-use-docker-container-convenient-for-testdev) for guidance
+2. Clone this project
+    ```bash
+    git clone https://github.com/apache/incubator-hugegraph-ai.git
+    ```
+3. Install [hugegraph-python-client](../hugegraph-python-client) and [hugegraph_ml](../hugegraph-ml)
+    ```bash
+    cd ./incubator-hugegraph-ai # better to use virtualenv (source venv/bin/activate)
+    pip install ./hugegraph-python-client
+    pip install -e .
+    ```
+4. Enter the project directory
+    ```bash
+    cd ./hugegraph-ml/src
+    ```
+
+## Examples
+
+### Perform node embedding on the `Cora` dataset using the `DGI` model
+
+Make sure that the Cora dataset is already in your HugeGraph database.
+If not, you can run the `import_graph_from_dgl` function to import the `Cora` dataset from `DGL` into
+the `HugeGraph` database.
+```python
+from hugegraph_ml.utils.data_import_from_dgl import import_graph_from_dgl
+import_graph_from_dgl("cora")
+```
+
+Run [dgi_example.py](./src/hugegraph_ml/examples/dgi_example.py) to view the example.
+```bash
+python .\hugegraph_ml\examples\dgi_example.py
+```
+
+The specific process is as follows:
+
+**1. Graph data convert**
+
+Convert the graph from `HugeGraph` to `DGL` format.
+```python
+from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
+from hugegraph_ml.models.dgi import DGI
+from hugegraph_ml.models.mlp import MLPClassifier
+from hugegraph_ml.tasks.node_classify import NodeClassify
+from hugegraph_ml.tasks.node_embed import NodeEmbed
+
+hg2d = HugeGraph2DGL()
+graph, graph_info = hg2d.convert_graph(
+    vertex_label="cora_vertex", edge_label="cora_edge", info_vertex_label="cora_info_vertex"
+)
+```
+
+**2. Select model instance**
+
+```python
+model = DGI(n_in_feats=graph_info["n_feat_dim"])
+```
+
+**3. Train model and node embedding**
+
+```python
+node_embed_task = NodeEmbed(graph=graph, graph_info=graph_info, model=model)
+embedded_graph, graph_info = node_embed_task.train_and_embed(
+    add_self_loop=True, n_epochs=300, gpu=0, patience=30
+)
+```
+
+**4. Downstream tasks node classification using MLP**
+
+```python
+model = MLPClassifier(n_in_feat=graph_info["n_feat_dim"], n_out_feat=graph_info["n_classes"])
+node_clf_task = NodeClassify(graph=embedded_graph, graph_info=graph_info, model=model)
+node_clf_task.train(lr=1e-3, n_epochs=400, gpu=0, patience=40)
+print(node_clf_task.evaluate())
+```
+
+**5. Obtain the metrics**
+
+```text
+{'accuracy': 0.82, 'loss': 0.5714246034622192}
+```
+
+### Perform node classification on the `Cora` dataset using the `GRAND` model.
+
+You can refer to the example in the [grand_example.py](./src/hugegraph_ml/examples/grand_example.py)
+
+```python
+from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
+from hugegraph_ml.models.grand import GRAND
+from hugegraph_ml.tasks.node_classify import NodeClassify
+g2d = HugeGraph2DGL()

Review Comment:
```suggestion
from hugegraph_ml.tasks.node_classify import NodeClassify
g2d = HugeGraph2DGL()
```

##########
hugegraph-ml/setup.py:
##########
@@ -0,0 +1,46 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import setuptools
+from pkg_resources import parse_requirements
+
+with open("README.md", "r", encoding="utf-8") as fh:
+    long_description = fh.read()
+
+with open("requirements.txt", encoding="utf-8") as fp:
+    install_requires = [str(requirement) for requirement in parse_requirements(fp)]
+
+setuptools.setup(
+    name="hugegraph-ml",
+    version="1.0.0",

Review Comment:
```suggestion
    version="1.5.0",
```
Maybe we need to unify the version across `hugegraph-ai` (and keep it consistent with the Apache release version).

##########
hugegraph-ml/src/hugegraph_ml/data/hugegraph2dgl.py:
##########
@@ -0,0 +1,83 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import dgl
+import torch
+from pyhugegraph.api.gremlin import GremlinManager
+from pyhugegraph.client import PyHugeClient
+
+
+class HugeGraph2DGL:
+    def __init__(
+        self,
+        ip='127.0.0.1',
+        port="8080",
+        graph='hugegraph',
+        user='admin',
+        pwd='xxx'
+    ):
+        self._client: PyHugeClient = PyHugeClient(
+            ip=ip,
+            port=port,
+            graph=graph,
+            user=user,
+            pwd=pwd
+        )

Review Comment:
The optional `graphspace` param is missing here (python-client has been upgraded to `1.5.0`).

The input params should be updated.

##########
hugegraph-ml/src/hugegraph_ml/models/dgi.py:
##########
@@ -0,0 +1,209 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Deep Graph Infomax (DGI)
+
+References
+----------
+Paper: https://arxiv.org/abs/1809.10341
+Author's code: https://github.com/PetarV-/DGI
+DGL code: https://github.com/dmlc/dgl/tree/master/examples/pytorch/dgi
+"""

Review Comment:
Nice to see the reference comment 👍🏻

##########
hugegraph-ml/src/hugegraph_ml/__init__.py:
##########

Review Comment:
Shall we add a license header to the empty `__init__` file? Maybe we could refer to other ASF [Python projects](https://github.com/apache/?q=&type=all&language=python&sort=)?
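
On the `graphspace` comment for `hugegraph2dgl.py`: here is a rough sketch of what the updated constructor could look like. It is an illustration only, not the actual patch. `RecordingClient` is a hypothetical stand-in for `PyHugeClient` so the snippet runs without a server, `HugeGraph2DGLSketch` and `client_factory` are names made up for this example, and the assumption that the 1.5.0 client accepts a keyword literally named `graphspace` should be checked against the python-client source.

```python
from typing import Any, Callable, Dict, Optional


class RecordingClient:
    """Hypothetical stand-in for PyHugeClient; it only records its keyword arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs: Dict[str, Any] = kwargs


class HugeGraph2DGLSketch:
    """Illustrative constructor only; the real class also converts graphs to DGL."""

    def __init__(
        self,
        ip: str = "127.0.0.1",
        port: str = "8080",
        graph: str = "hugegraph",
        user: str = "admin",
        pwd: str = "xxx",
        graphspace: Optional[str] = None,  # assumed keyword name, see note above
        client_factory: Callable[..., Any] = RecordingClient,
    ) -> None:
        conn: Dict[str, Any] = dict(ip=ip, port=port, graph=graph, user=user, pwd=pwd)
        # Forward graphspace only when the caller sets it, so servers without
        # graphspace support keep receiving the same arguments as before.
        if graphspace is not None:
            conn["graphspace"] = graphspace
        self._client = client_factory(**conn)


default_converter = HugeGraph2DGLSketch()
scoped_converter = HugeGraph2DGLSketch(graphspace="DEFAULT")
```

Keeping `graphspace` optional with a `None` default means existing callers such as `HugeGraph2DGL()` in the README examples would not need to change.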
