This is an automated email from the ASF dual-hosted git repository.

baunsgaard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git


The following commit(s) were added to refs/heads/main by this push:
     new 610222cbca [SYSTEMDS-3670] TSNE PCA preprocessing
610222cbca is described below

commit 610222cbca25b76c327cb5ace780c3d0ead9e1bf
Author: Sebastian Baunsgaard <[email protected]>
AuthorDate: Tue Jan 30 19:34:33 2024 +0100

    [SYSTEMDS-3670] TSNE PCA preprocessing
    
    This commit adds a comment and an example script for TSNE with PCA preprocessing.
    According to Scikit Learn, PCA preprocessing reduces the number of dimensions
    TSNE has to work with and therefore improves performance.
    
    LDE Project Part 1 WS 2023/2024
    
    Closes #1991
---
 scripts/builtin/tSNE.dml            | 10 ++++++++++
 scripts/tutorials/tsne/pca-tsne.dml | 38 +++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/scripts/builtin/tSNE.dml b/scripts/builtin/tSNE.dml
index 131ab1013c..a28a1c1a0a 100644
--- a/scripts/builtin/tSNE.dml
+++ b/scripts/builtin/tSNE.dml
@@ -22,6 +22,16 @@
 # This function performs dimensionality reduction using tSNE algorithm based on
 # the paper: Visualizing Data using t-SNE, Maaten et. al.
 #
+# There exists a variant of t-SNE, implemented in sklearn, that first reduces the
+# dimensionality of the data using PCA to reduce noise and then applies t-SNE for
+# further dimensionality reduction. A script of this can be found in the tutorials
+# folder: scripts/tutorials/tsne/pca-tsne.dml
+#
+# For direct reference and tips on choosing the dimension for the PCA pre-processing,
+# you can visit:
+# https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/manifold/_t_sne.py
+# https://lvdmaaten.github.io/tsne/
+#
 # INPUT:
 # -------------------------------------------------------------------------------------------
 # X              Data Matrix of shape
diff --git a/scripts/tutorials/tsne/pca-tsne.dml b/scripts/tutorials/tsne/pca-tsne.dml
new file mode 100644
index 0000000000..eb159f68e4
--- /dev/null
+++ b/scripts/tutorials/tsne/pca-tsne.dml
@@ -0,0 +1,38 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+#
+# tSNE dimensionality reduction technique with PCA pre-processing,
+# inspired by the sklearn implementation of tSNE:
+# https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html 
+
+
+# Load data
+data = read($X)
+
+# Pre-process data with PCA
+[PCA, components, centering, scalefactor] = pca(X=data, K=$k)
+
+# Do tSNE with PCA output
+Y = tSNE(X=PCA)
+
+# Save reduced dimensions
+write(Y, $Y)
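
For reference, a minimal sketch of the same PCA-then-tSNE pipeline on synthetic data, so it can be tried without an input file. The matrix size, the seed, and the choice of K=10 are illustrative assumptions and are not part of the commit; the pca and tSNE calls mirror the ones in the tutorial script above.

```
# Hypothetical stand-in for read($X): 200 rows with 50 random features
data = rand(rows=200, cols=50, min=0, max=1, seed=42)

# Reduce to 10 dimensions first (10 is an assumed value; the script takes it as $k)
[reduced, components, centering, scalefactor] = pca(X=data, K=10)

# Run tSNE on the PCA output with the builtin's default settings
Y = tSNE(X=reduced)

# Report the shape of the embedding instead of writing it to $Y
print("tSNE output: " + nrow(Y) + " x " + ncol(Y))
```

The tutorial script itself reads its input path, the PCA dimension, and the output path from the named arguments $X, $k, and $Y.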
