[ https://issues.apache.org/jira/browse/SYSTEMML-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279418#comment-15279418 ]
Matthias Boehm commented on SYSTEMML-678:
-----------------------------------------

Thanks for the question [~johannes.tud]. In general, SystemML provides, for every operation that involves matrices (with very few exceptions), both single-node in-memory (CP) operators and data-parallel distributed operators (Spark/MR). If an operation, together with its pinned inputs and outputs, fits into the driver memory budget (70% of the driver heap size), we execute it in single-node CP (multi-threaded or single-threaded, depending on the operation); otherwise we compile distributed operations, depending on data and cluster characteristics. For Spark, operator selection is slightly different, as we also transitively pull certain operations into distributed pipelines if their inputs are already distributed.

Task-parallel computation (with parfor as an assertion by the user) complements these data-parallel operations, and the two can be arbitrarily combined (e.g., multi-threaded single-node execution, concurrent data-parallel jobs, distributed task-parallel computation). However, except for some very specific loop vectorization rewrites, we do not yet automatically identify subprograms other than parfor to execute in a task-parallel manner. Extended automatic vectorization is certainly an interesting direction, and we welcome any contributions here.

Now back to the actual script at hand. Even with parfor, SystemML is currently not able to run this loop in a task-parallel manner because there are loop-carried dependencies over 'sum'. By specifying the parfor parameter 'check=0', you disable the dependency analysis; the loop then runs, but it would produce undefined results. There are often ways to express a script slightly differently to work around current shortcomings of the compiler. Feel free to post the problem to our dev list: d...@systemml.incubator.apache.org.
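To make the loop-carried dependency over 'sum' concrete, here is a minimal sketch in Python rather than DML. It is not SystemML code: the function names and the `squared_error` stand-in for the per-row work are hypothetical and chosen only for illustration. The first function accumulates into a shared variable in every iteration, the second keeps the iterations independent and reduces once at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_error(row):
    """Hypothetical stand-in for the per-row work: (target - prediction)^2."""
    prediction, target = row
    return (target - prediction) ** 2


def total_error_sequential(rows):
    # Loop-carried dependency: every iteration reads and writes `total`,
    # so the iterations cannot safely run concurrently as written.
    total = 0.0
    for row in rows:
        total = total + squared_error(row)
    return total


def total_error_independent(rows, workers=4):
    # Each iteration produces its own result and touches no shared state;
    # the accumulation happens once, after all iterations have finished.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(squared_error, rows))
    return sum(errors)


# Illustrative (prediction, target) pairs, not data from the issue.
rows = [(1.0, 2.0), (3.0, 5.0), (4.0, 4.0), (0.0, 3.0)]
```

In DML terms, the second form would roughly correspond to writing each iteration's squared error into its own cell of a result matrix inside the parfor body and calling sum() on that matrix after the loop. Whether this is the best rewrite for the script at hand is an assumption that should be checked, for example on the dev list.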
> MLContext parallelization
> -------------------------
>
>                 Key: SYSTEMML-678
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-678
>             Project: SystemML
>          Issue Type: Question
>          Components: Algorithms, Parser, Runtime
>    Affects Versions: SystemML 0.10
>            Reporter: Johannes Wilke
>
> I am trying to execute a script in the MLContext. It executes, but it does
> not run in parallel. For smaller scripts, it works fine. But this script
> does not, and it is not clear why. I think it is because of the 4 loop
> levels, but I am not sure.
> Is there documentation on what is parallelizable and what is not?
> If I change the main while loop, which I wish to parallelize, to a parfor
> loop, it works.
> Here is the script:
>
> X = read($Xin)
> P = read($Pin)
> #errorMatrix = matrix(0.0,rows=1,cols=1)
> j = 1
> sum = 0
> while (j <=nrow(X) & sum >= 0){ # this should be parallelized
> #parfor(j in 1: nrow(X),check=0){
>     first = TRUE
>     windows = matrix(0,rows=1,cols=1)
>     offsetPreWindowDefinitions = 0
>     sumWindowLength = 0
>     mastercount = 0
>     totalwindowLength = 0
>     s = 0
>     for(i in 1: nrow(P)){
>         if((as.scalar(P[i,1])*as.scalar(P[i,2]))>totalwindowLength){
>             totalwindowLength = (as.scalar(P[i,1])*as.scalar(P[i,2]))
>         }
>         s = s+1
>     }
>     lastWindow = matrix(0,rows=sum(P[,1]),cols=1)
>
>     for(i in 1:nrow(P)){# for every Window-Definition
>
>         for(k in 1: as.integer(as.scalar(P[i,1]))){# for every pnum
>             column = matrix(0,rows=as.integer(as.scalar(P[1,4])),cols=1)
>             for(l in 1: nrow(column)+1){
>                 offsetPreWindowDefinitions = totalwindowLength - (as.scalar(P[i,1])*as.scalar(P[i,2]))
>                 tsindex = ((k-1) * as.scalar(P[i,2])) + l-1 + offsetPreWindowDefinitions
>                 if(l==nrow(column)+1){
>                     lastWindow[sumWindowLength+k,1] = X[j,tsindex+1]
>                 } else {
>                     column[l,1] = X[j,tsindex+1]
>                 }
>                 mastercount = mastercount +1
>                 #print(mastercount)
>             }
>             if(first){
>                 first = FALSE;
>                 windows = column
>             } else {
>                 windows = cbind(windows,column)
>             }
>         }
>
>         sumWindowLength = sumWindowLength + as.scalar(P[i,1])
>     }
>
>     result = matrix(14.3,rows=as.integer(as.scalar(P[1,4])),cols=1)
>     for(i in totalwindowLength:as.integer(as.scalar(P[1,4]))+totalwindowLength-1){
>         result[i-totalwindowLength+1,1] = X[j,i+1]
>         s = s+1
>     }
>     params = solve(windows,result)
>     print(j)
>     predict = matrix(0,rows=1, cols=1)
>     for(i in 1:nrow(lastWindow)){
>         predict[1,1] = predict[1,1] + (params[i,1] * lastWindow[i,1])
>         s = s+1
>     }
>
>     predictscalar = as.scalar(predict[1,1])
>     targetscalar = as.scalar(X[j,ncol(X)])
>     sum = sum + ((targetscalar - predictscalar) * (targetscalar - predictscalar))
>
>     j = j+1
>     #write(lastWindow, "/media/johannes/Data/Seafile/UNI/Beleg/sysml_output/lWOut.csv", format="csv", header=TRUE, sep=",", sparse=TRUE);
>     #write(windows, "/media/johannes/Data/Seafile/UNI/Beleg/sysml_output/windowsOut.csv", format="csv", header=TRUE, sep=",", sparse=TRUE);
>     #write(result, "/media/johannes/Data/Seafile/UNI/Beleg/sysml_output/resultOut.csv", format="csv", header=TRUE, sep=",", sparse=TRUE);
> }
> print(sum/nrow(X))
>
> I hope that you can help me!

--
This message was sent by Atlassian JIRA (v6.3.4#6332)