[ https://issues.apache.org/jira/browse/SYSTEMML-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15535310#comment-15535310 ]
Mike Dusenberry commented on SYSTEMML-995:
------------------------------------------
Also, the other temporary change I made to get past the vector conversion was to
replace {{MLContextConversionUtil.dataFrameToFrameObject}} with the following:
{code}
public static FrameObject dataFrameToFrameObject(String variableName, DataFrame dataFrame,
        FrameMetadata frameMetadata) {
    try {
        if (frameMetadata == null) {
            frameMetadata = new FrameMetadata();
        }

        JavaSparkContext javaSparkContext = MLContextUtil
                .getJavaSparkContext((MLContext) MLContextProxy.getActiveMLContextForAPI());

        // Temporary: always assume an ID column instead of consulting the metadata.
        boolean containsID = true; // isDataFrameWithIDColumn(frameMetadata);

        MatrixCharacteristics mc = frameMetadata.asMatrixCharacteristics();
        if (mc == null) {
            mc = new MatrixCharacteristics();
            // long rows = dataFrame.count();
            // int cols = dataFrame.columns().length - ((containsID)?1:0);
            // mc.setDimension(rows, cols);

            // Find the position of a vector column, relative to the first non-ID column.
            // FrameRDDConverterUtils.getColVectFromDFSchema(dataFrame.schema(), containsID);
            int colVect = -1;
            int off = containsID ? 1 : 0;
            for (int i = off; i < dataFrame.schema().fields().length; i++) {
                StructField structType = dataFrame.schema().apply(i);
                if (structType.dataType() instanceof VectorUDT)
                    colVect = i - off;
            }

            long rlen = dataFrame.count();
            // Skip the ID column; a vector column adds size() - 1 extra columns.
            long clen = dataFrame.columns().length - off
                    + ((colVect >= 0) ? ((Vector) dataFrame.first().get(off + colVect)).size() - 1 : 0);
            mc.set(rlen, clen, mc.getRowsPerBlock(), mc.getColsPerBlock(), -1);
            frameMetadata.setMatrixCharacteristics(mc);
        }

        String[] colnames = new String[(int) mc.getCols()];
        ValueType[] fschema = new ValueType[(int) mc.getCols()];
        FrameRDDConverterUtils.convertDFSchemaToFrameSchema(dataFrame.schema(), colnames, fschema, containsID);
        frameMetadata.setFrameSchema(new FrameSchema(Arrays.asList(fschema)));

        JavaPairRDD<Long, FrameBlock> binaryBlock = FrameRDDConverterUtils
                .dataFrameToBinaryBlock(javaSparkContext, dataFrame, mc, containsID);

        return MLContextConversionUtil.binaryBlocksToFrameObject(variableName, binaryBlock, frameMetadata);
    } catch (DMLRuntimeException e) {
        throw new MLContextException("Exception converting DataFrame to FrameObject", e);
    }
}
{code}
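For reference, the column-count arithmetic in the snippet above can be checked without Spark. The following is only a minimal sketch: the {{FrameColumnCount}} class, its {{Column}} type, and the column names are hypothetical stand-ins, not SystemML or Spark classes. Unlike the snippet above, which tracks a single vector column, the sketch simply sums the width of every non-ID column.

```java
import java.util.Arrays;
import java.util.List;

public class FrameColumnCount {

    /** Hypothetical stand-in for a DataFrame column: width 1 for a scalar, n for a vector of size n. */
    static final class Column {
        final String name;
        final int width;

        Column(String name, int width) {
            this.name = name;
            this.width = width;
        }
    }

    /** Number of frame columns after skipping the optional leading ID column and expanding vectors. */
    static long frameColumnCount(List<Column> columns, boolean containsID) {
        int off = containsID ? 1 : 0; // skip the leading ID column if present
        long clen = 0;
        for (int i = off; i < columns.size(); i++) {
            clen += columns.get(i).width; // a vector column contributes one frame column per element
        }
        return clen;
    }

    public static void main(String[] args) {
        List<Column> cols = Arrays.asList(
                new Column("id", 1),        // index column (illustrative name)
                new Column("label", 1),     // scalar column
                new Column("features", 3)); // vector column of size 3

        System.out.println(frameColumnCount(cols, true));  // ID column skipped
        System.out.println(frameColumnCount(cols, false)); // ID column counted as data
    }
}
```

With the ID column skipped, the example yields 4 frame columns; treating the same input as having no ID column yields 5, which is the extra output column described in the issue below.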
> MLContext dataframe-frame conversion with index column
> ------------------------------------------------------
>
> Key: SYSTEMML-995
> URL: https://issues.apache.org/jira/browse/SYSTEMML-995
> Project: SystemML
> Issue Type: Bug
> Components: APIs
> Affects Versions: SystemML 0.11
> Reporter: Matthias Boehm
> Priority: Blocker
>
> MLContext currently always assumes data frame to frame conversion without
> existing index column. Since the user cannot communicate the existence of
> this column, the data conversion leads to incorrect results as an additional
> column is included in the output frame. We need to make the MLContext handling
> of frames consistent with the handling of matrices.
> Thanks [[email protected]] for catching this issue. cc [~acs_s] [~deron]