phaniarnab commented on code in PR #1828:
URL: https://github.com/apache/systemds/pull/1828#discussion_r1205726366
##########
src/main/java/org/apache/sysds/runtime/transform/encode/MultiColumnEncoder.java:
##########
@@ -314,11 +314,12 @@ public MatrixBlock apply(CacheBlock<?> in) {
 public MatrixBlock apply(CacheBlock<?> in, int k) {
 // domain sizes are not updated if called from transformapply
 boolean hasUDF = _columnEncoders.stream().anyMatch(e -> e.hasEncoder(ColumnEncoderUDF.class));
+boolean hasWE = _columnEncoders.stream().anyMatch(e -> e.hasEncoder(ColumnEncoderWordEmbedding.class));
 for(ColumnEncoderComposite columnEncoder : _columnEncoders)
 columnEncoder.updateAllDCEncoders();
 int numCols = getNumOutCols();
 long estNNz = (long) in.getNumRows() * (hasUDF ? numCols : (long) in.getNumColumns());
-boolean sparse = MatrixBlock.evalSparseFormatInMemory(in.getNumRows(), numCols, estNNz) && !hasUDF;
+boolean sparse = MatrixBlock.evalSparseFormatInMemory(in.getNumRows(), numCols, estNNz) && !hasUDF && !hasWE;

Review Comment:
Yes. For UDF encoders we force dense output because we cannot derive the sparsity of their outputs. For embeddings, however, you already know the number of nonzeros.
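The reviewer's point is that, unlike a UDF, an embedding encoder has a predictable output size, so the nonzero estimate can be computed rather than forcing dense output. The Java sketch below illustrates that idea under stated assumptions: each word-embedding column expands into `embeddingDim` dense values per row, every other input column contributes at most one nonzero per row (matching the existing `in.getNumColumns()` estimate), and the 0.4 density threshold is a placeholder, not the rule SystemDS actually applies. The class and method names are hypothetical and are not part of the SystemDS API.

```java
// Hypothetical sketch (not SystemDS code): estimate output nonzeros when some
// columns are word-embedding encoded, then decide dense vs. sparse from that.
public class EmbeddingSparsityEstimate {

    // Assumed density threshold for illustration only; SystemDS's
    // MatrixBlock.evalSparseFormatInMemory applies its own rules.
    public static final double SPARSITY_THRESHOLD = 0.4;

    /**
     * Upper-bound estimate of output nonzeros.
     * Assumptions: each non-embedding input column yields at most one nonzero
     * per row, and each embedding column yields embeddingDim nonzeros per row
     * (embedding vectors treated as fully dense).
     */
    public static long estimateNnz(long numRows, int numInputCols,
                                   int numEmbeddingCols, int embeddingDim) {
        long perRow = (long) (numInputCols - numEmbeddingCols)
                + (long) numEmbeddingCols * embeddingDim;
        return numRows * perRow;
    }

    // Choose sparse only if the estimated density is below the assumed threshold.
    public static boolean useSparse(long numRows, long numOutCols, long estNnz) {
        double density = (double) estNnz / ((double) numRows * (double) numOutCols);
        return density < SPARSITY_THRESHOLD;
    }

    public static void main(String[] args) {
        // Hypothetical input: 1000 rows, 3 input columns; one column is embedded
        // with dimension 300, the other two are dummy-coded into 50 output columns.
        long rows = 1000;
        int inCols = 3, weCols = 1, dim = 300;
        long outCols = 50 + dim; // 350 output columns in total
        long nnz = estimateNnz(rows, inCols, weCols, dim); // 1000 * (2 + 300)
        System.out.println(nnz + " " + useSparse(rows, outCols, nnz));
    }
}
```

With 350 output columns the estimated density is about 0.86, so the sketch stays dense. If the same two columns were dummy-coded into 10,000 output columns instead, the density would drop to about 0.03 and a sparse block would be chosen. This shows why a blanket `!hasWE` may not always be the right call.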