Re: [PR] Graph Convolutional Layers - Builtin [systemds]

via GitHub Mon, 06 Nov 2023 23:40:01 -0800


MaximilianSchreff commented on code in PR #1941:
URL: https://github.com/apache/systemds/pull/1941#discussion_r1384488393



##########
scripts/nn/layers/graph_conv.dml:
##########
@@ -0,0 +1,262 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+/*
+ * A graph convolutional layer as presented in 'Semi-Supervised Classification 
with Graph Convolutional Networks'
+ * by Kipf and Welling
+ */
+
+forward = function(matrix[double] X, matrix[double] edge_index, matrix[double] 
edge_weight,
+                   matrix[double] W, matrix[double] b, boolean add_self_loops)
+    return (matrix[double] X_out)
+{
+    /* Forward pass of the Graph Convolutional Layer. It transforms the node 
feature matrix
+     * with linear weights W and then executes the message passing according 
to the edges.
+     * The message passing is normalized by spectral normalization, i.e. for 
edge (v, w) the
+     * normalization factor is 1 / sqrt(degree(v) * degree(w)).
+     *
+     * n: number of nodes.
+     * m: number of edges.
+     * f_in: number of input features per node.
+     * f_out: number of output features per node.
+     *
+     * Inputs:
+     * - X: node features, matrix of shape (n, f_in).
+     * - edge_index: directed edge list specifying the out-node (first column) 
and the
+     *               in-node (second column) of each edge, matrix of shape (m, 
2).
+     * - edge_weight: weights of edges in edge_index, matrix of shape (m, 1).
+     *                This should be all 1s if there should be no edge weights.
+     * - W: linear weights, matrix of shape (f_in, f_out).
+     * - b: bias, matrix of shape (1, f_out).
+     * - add_self_loops: boolean that specifies whether self loops should be 
added.
+     *                   If TRUE new self loops will be added only for nodes 
that do
+     *                   not yet have a self loop. Added self loops will have 
weight 1.
+     *
+     * Outputs:
+     * - X_out: convolved and transformed node features, matrix of shape (n, 
f_out).
+     */
+    n = nrow(X)
+    m = nrow(edge_index)
+
+    # transform
+    X_hat = X %*% W

Review Comment:
   All things related to the index is for the convolution part and not the 
linear part. But, it is actually possible to do the convolution part without 
any indices at all since the formula for a graph convolutional layer is OUT = 
D^-1 * A * D^-1 * X * W + b (A: Adjacency matrix n x n, D: degree matrix n x n, 
X: input n x features, W: weights f_in x f_out). As you can see, to do the 
convolution without any indices (normalization and message passing), you need 
to do 3 extra matrix multiplications instead. These matrix multiplications are 
(in normal use cases -> n >> features) even bigger than the linear part (X*W + 
b) because D and A are bigger matrices than X and W. 
   So, since X*W + b takes 220 seconds (only the matrix multiplaction takes 
around 110 seconds), not using indices to do the normalization and convolution 
would take way longer than 340 seconds, likely around 600 seconds.
   
   This is also the reason why famous other libraries (PyTorch, TF) also mainly 
use an edge list to do the convolution part through index accessing or use a 
sparse matrix datatype (which is basically also an edge list) in the GCL 
implementation. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Graph Convolutional Layers - Builtin [systemds]

Reply via email to