[ https://issues.apache.org/jira/browse/MAHOUT-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Leshem updated MAHOUT-379:
--------------------------------

    Description: 
When a SequentialAccessSparseVector is serialized and deserialized using VectorWritable, the resulting vector and the original vector are equivalent, yet equals() returns false.

The following unit test reproduces the problem:

{code}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.mahout.math.AbstractVector;
import org.apache.mahout.math.SequentialAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;
import org.junit.Test;

@Test
public void testSequentialAccessSparseVectorEquals() throws Exception {
  final Vector v = new SequentialAccessSparseVector(1);
  final VectorWritable vectorWritable = new VectorWritable(v);
  final VectorWritable vectorWritable2 = new VectorWritable();
  writeAndRead(vectorWritable, vectorWritable2);
  final Vector v2 = vectorWritable2.get();
  assertTrue(AbstractVector.equivalent(v, v2)); // passes: same contents
  assertEquals(v, v2); // This line fails!
}

// Serializes toWrite into an in-memory buffer, then deserializes it into toRead.
private void writeAndRead(Writable toWrite, Writable toRead) throws IOException {
  final ByteArrayOutputStream baos = new ByteArrayOutputStream();
  final DataOutputStream dos = new DataOutputStream(baos);
  toWrite.write(dos);
  final ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
  final DataInputStream dis = new DataInputStream(bais);
  toRead.readFields(dis);
}
{code}

The problem seems to be that the original vector's name is null, while the new vector's name is an empty string. The same issue probably also happens with RandomAccessSparseVector.

SequentialAccessSparseVectorWritable (line 40):
{code}
dataOutput.writeUTF(getName() == null ? "" : getName());
{code}

RandomAccessSparseVectorWritable (line 42):
{code}
dataOutput.writeUTF(this.getName() == null ? "" : this.getName());
{code}

Both writables write a null name as the empty string, so the deserialized vector comes back named "" rather than null, which appears to be why equals() fails even though the contents match.

The simplest fix is probably to change the default vector name from null to the empty string.

  was: 
When a SequentialAccessSparseVector is serialized and deserialized using VectorWritable, the resulting vector and the original vector are equivalent, yet equals() returns false.

The following unit test reproduces the problem:

{code}
@Test
public void testSequentialAccessSparseVectorEquals() throws Exception {
  final Vector v = new SequentialAccessSparseVector(1);
  final VectorWritable vectorWritable = new VectorWritable(v);
  final VectorWritable vectorWritable2 = new VectorWritable();
  writeAndRead(vectorWritable, vectorWritable2);
  final Vector v2 = vectorWritable2.get();
  assertTrue(AbstractVector.equivalent(v, v2));
  assertEquals(v, v2); // This line fails!
}

private void writeAndRead(Writable toWrite, Writable toRead) throws IOException {
  final ByteArrayOutputStream baos = new ByteArrayOutputStream();
  final DataOutputStream dos = new DataOutputStream(baos);
  toWrite.write(dos);
  final ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
  final DataInputStream dis = new DataInputStream(bais);
  toRead.readFields(dis);
}
{code}

The problem seems to be that the original vector's name is null, while the new vector's name is an empty string.

> SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent
> ---------------------------------------------------------------------------------
>
>                 Key: MAHOUT-379
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-379
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Danny Leshem
>            Priority: Minor
>             Fix For: 0.3
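To make the suspected mechanism concrete, here is a minimal, self-contained Java sketch that does not depend on Mahout. SketchVector and NullNameRoundTrip are hypothetical stand-ins, not Mahout classes: they only mimic the pattern quoted in the report, where the name is written with writeUTF(name == null ? "" : name), equivalent() compares contents only, and equals() also compares names. It assumes, as the report suggests, that the real equals() takes the vector name into account.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for a named vector (NOT Mahout's API): same name-handling pattern only.
final class SketchVector {
  private final String name;      // may be null, like the current default name
  private final double[] values;

  SketchVector(String name, double... values) {
    this.name = name;
    this.values = values.clone();
  }

  // Mirrors the quoted writables: a null name is written as "".
  void write(DataOutput out) throws IOException {
    out.writeUTF(name == null ? "" : name);
    out.writeInt(values.length);
    for (double v : values) {
      out.writeDouble(v);
    }
  }

  // Reads the name back verbatim, so a null name returns as "".
  static SketchVector read(DataInput in) throws IOException {
    String name = in.readUTF();
    double[] values = new double[in.readInt()];
    for (int i = 0; i < values.length; i++) {
      values[i] = in.readDouble();
    }
    return new SketchVector(name, values);
  }

  // Content-only comparison, analogous in spirit to AbstractVector.equivalent.
  boolean equivalent(SketchVector other) {
    return Arrays.equals(values, other.values);
  }

  // Name-sensitive comparison, analogous to the failing equals().
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SketchVector)) {
      return false;
    }
    SketchVector other = (SketchVector) o;
    return Objects.equals(name, other.name) && Arrays.equals(values, other.values);
  }

  @Override
  public int hashCode() {
    return 31 * Objects.hashCode(name) + Arrays.hashCode(values);
  }
}

final class NullNameRoundTrip {
  // Serializes v into an in-memory buffer and deserializes a copy.
  static SketchVector roundTrip(SketchVector v) {
    try {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      v.write(new DataOutputStream(baos));
      return SketchVector.read(new DataInputStream(new ByteArrayInputStream(baos.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // True if a vector created with the given default name is still equals() after a round trip.
  static boolean survivesRoundTrip(String defaultName) {
    SketchVector v = new SketchVector(defaultName, 0.0);
    return v.equals(roundTrip(v));
  }

  public static void main(String[] args) {
    SketchVector original = new SketchVector(null, 0.0);
    SketchVector copy = roundTrip(original);
    System.out.println("equivalent: " + original.equivalent(copy)); // true
    System.out.println("equals:     " + original.equals(copy));     // false
    System.out.println("\"\" default survives: " + survivesRoundTrip("")); // true
  }
}
```

Under these assumptions, a null default name breaks equals() after a round trip while an empty-string default does not, which is consistent with the fix proposed in the report.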