phaniarnab commented on code in PR #1850:
URL: https://github.com/apache/systemds/pull/1850#discussion_r1251249620
##########
scripts/perftest/datagen/genMNISTData.sh:
##########
@@ -29,35 +29,100 @@ CMD=$1
DATADIR=$2/mnist
MAXMEM=$3
-FORMAT="text" # can be csv, mm, text, binary
+FORMAT="csv" # can be csv, mm, text, binary
echo "-- Generating MNIST data." >> results/times.txt;
#make sure whole MNIST is available
../datagen/getMNISTDataset.sh ${DATADIR}
+MNIST_train_filename="mnist_train.csv"
+MNIST_test_filename="mnist_test.csv"
+
+max_size_ordinal=4
+min_num_examples_train=12000
+max_num_examples_train=60000
+span_num_examples_train=$(echo "${max_num_examples_train} - ${min_num_examples_train}" | bc)
+min_num_examples_test=2000
+max_num_examples_test=10000
+span_num_examples_test=$(echo "${max_num_examples_test} - ${min_num_examples_test}" | bc)
#generate XS scenarios (80MB) by producing a subset of MNIST
if [ $MAXMEM -ge 80 ]; then
- echo "placeholder"
+ size_ordinal=0
+ percent_size=$(echo "${size_ordinal} / ${max_size_ordinal}" | bc)
+ target_num_train=$(python -c "from math import floor; print( ${min_num_examples_train} + floor(${span_num_examples_train} * ${percent_size}))") # todo couldn't work out how to do this using bc so using slower python calls instead
+ target_num_test=$(python -c "from math import floor; print( ${min_num_examples_test} + floor(${span_num_examples_test} * ${percent_size}))")
Review Comment:
I recommend not inlining Python calls here. Please find another way to
compute these values, or push some of this logic into the DML script.
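
One possible way to drop the Python calls, as a minimal sketch rather than a
definitive fix: shell arithmetic expansion already truncates toward zero, which
equals floor for non-negative operands, so the target count can be computed by
multiplying before dividing. This also sidesteps the fact that bc, at its
default scale of 0, truncates `size_ordinal / max_size_ordinal` to an integer.
The function name `scaled_count` is hypothetical; the variable names are taken
from the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR):
# prints min + floor((max - min) * ordinal / max_ordinal) using integer arithmetic only.
scaled_count() {
  local min=$1 max=$2 ordinal=$3 max_ordinal=$4
  # Multiply before dividing so the fraction is not truncated to 0 too early.
  echo $(( min + (max - min) * ordinal / max_ordinal ))
}

# Values copied from the diff under review.
max_size_ordinal=4
min_num_examples_train=12000
max_num_examples_train=60000
min_num_examples_test=2000
max_num_examples_test=10000

size_ordinal=0
target_num_train=$(scaled_count "$min_num_examples_train" "$max_num_examples_train" "$size_ordinal" "$max_size_ordinal")
target_num_test=$(scaled_count "$min_num_examples_test" "$max_num_examples_test" "$size_ordinal" "$max_size_ordinal")
echo "train=${target_num_train} test=${target_num_test}"
```

With `size_ordinal=0` this yields the minimum counts, and with
`size_ordinal=4` the full dataset sizes, so the same helper could serve every
size scenario without starting an interpreter.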