leerho commented on issue #699:
URL:
https://github.com/apache/datasketches-java/issues/699#issuecomment-3672854020
@cboumalh,
First, I don't appreciate that what you submitted was not even Java, much
less even runnable from Java. So I had to translate it to Java and try to
figure out what you were trying to do. Please don't do that!
The problem you were having was in your last line. The default
ArrayOfStringsSummary returned by the ArrayOfStringsSummaryFactory has a
stringArr value of null. If you look at the code you can see that all the
factory was doing was returning a new summary. So instead of using the factory
you can just supply a new summary initialized with the string array of your
choice. That fixes the null pointer exception.
The other problem with your code is that the union does nothing.
The ThetaUnion and TupleUnion operate first in the hash domain. Your
ThetaSketch is created with the hashes of a, b, and c, and your TupleSketch is
created with the same three hashes. So when you union the ThetaSketch into the
TupleSketch, the union sees that all the hash values are the same, so
nothing happens.
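To make the hash-domain behavior concrete, here is a toy model in plain Java.
It is only a sketch under stated assumptions, not the datasketches
implementation: the map keys stand in for hashes, `HashDomainUnionModel` and
`unionInto` are hypothetical names, and it assumes that a hash already present
keeps its existing summary while only a new hash receives the supplied one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: keys stand in for hashes. Not the datasketches implementation.
final class HashDomainUnionModel {

  // Merges incoming keys into target and returns how many entries were added.
  // Assumption: a key that already exists keeps its summary untouched;
  // only a key that is new to target receives a copy of defaultSummary.
  static int unionInto(Map<String, String[]> target,
      Iterable<String> incomingKeys, String[] defaultSummary) {
    int added = 0;
    for (String key : incomingKeys) {
      if (!target.containsKey(key)) {
        target.put(key, defaultSummary.clone());
        added++;
      }
    }
    return added;
  }

  public static void main(String[] args) {
    Map<String, String[]> tupleLike = new LinkedHashMap<>();
    tupleLike.put("a", new String[] {"x", "y"});
    tupleLike.put("b", new String[] {"z"});
    tupleLike.put("c", new String[] {"x", "z"});
    String[] defaultSummary = {"u", "v"};

    // Same three keys on both sides: nothing is added.
    int added = unionInto(tupleLike, List.of("a", "b", "c"), defaultSummary);
    System.out.println("entries added with a, b, c: " + added);

    // One extra key on the incoming side: exactly one entry is added.
    added = unionInto(tupleLike, List.of("a", "b", "c", "d"), defaultSummary);
    System.out.println("entries added with a, b, c, d: " + added);
    for (Map.Entry<String, String[]> e : tupleLike.entrySet()) {
      System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
    }
  }
}
```

In this model the first call adds no entries because every incoming key is
already present, and the second call adds a single entry for "d" carrying the
supplied array. The real union works on hash values and sketch internals, so
treat this only as an illustration of why identical inputs leave the result
unchanged.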
Now in the code below, uncomment line 19. This adds a new hash to the
ThetaSketch that does not exist in the TupleSketch.
Now run the code again. You will see that the new ThetaSketch hash has been
associated with the string array supplied on line 28.
```
package org.apache.datasketches.tuple.strings;

import org.apache.datasketches.theta.UpdatableThetaSketch;
import org.apache.datasketches.theta.UpdatableThetaSketchBuilder;
import org.apache.datasketches.tuple.TupleSketchIterator;
import org.apache.datasketches.tuple.TupleUnion;
import org.testng.annotations.Test;

public class ArrayOfStringsSummary_Issue699 {
  UpdatableThetaSketch thetaSk = UpdatableThetaSketch.builder().build();
  ArrayOfStringsTupleSketch tupleSk = new ArrayOfStringsTupleSketch();
  TupleUnion<ArrayOfStringsSummary> union = new TupleUnion<>(new ArrayOfStringsSummarySetOperations());

  @Test
  void go() {
    thetaSk.update("a");
    thetaSk.update("b");
    thetaSk.update("c");
    //thetaSk.update("d"); //line 19. uncomment

    tupleSk.update("a", new String[] {"x", "y"});
    tupleSk.update("b", new String[] {"z"});
    tupleSk.update("c", new String[] {"x", "z"});

    System.out.println("Print Summary before union");
    printSummaries(tupleSk.iterator());
    union.union(tupleSk);
    union.union(thetaSk, new ArrayOfStringsSummary(new String[] {"u", "v"})); //line 28
    System.out.println("Print Summary after union");
    printSummaries(union.getResult().iterator());
  }

  static void printSummaries(TupleSketchIterator<ArrayOfStringsSummary> it) {
    while (it.next()) {
      String[] strArr = it.getSummary().getValue();
      for (String s : strArr) {
        System.out.print(s + ", ");
      }
      System.out.println();
    }
    System.out.println();
  }
}
```
This code is written with the new datasketches-java version 9.0.0 and Java 25.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]