nfsantos commented on PR #1159:
URL: https://github.com/apache/jackrabbit-oak/pull/1159#issuecomment-1766166956
I reran the tests with 20 properties per node and with larger property
values, which I think is a more realistic scenario. The sorted version is
slower, by roughly 15% to 20% in the runs below.
I think the impact on performance will be measurable but maybe not very
significant. With the Pipelined strategy, the sorting overhead falls on the
transform threads, which can easily be scaled up. The Mongo download thread is
the main bottleneck, as this stage is single-threaded and there is no good way
to parallelize it, so I would resist adding overhead to the work done by this
thread, but I have no objection to slightly increasing the work of the
transform threads.
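
To illustrate the point about where the sorting cost lands, here is a minimal sketch, not the actual Pipelined strategy code. One thread feeds a bounded queue (standing in for the single-threaded Mongo download) and a configurable number of worker threads do the sorting. The class `PipelineSketch`, the `run` method, and the use of plain maps in place of node states are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration only; none of these names exist in Oak.
public class PipelineSketch {
    // Unique marker object telling a transform thread to stop (compared by identity).
    private static final Map<String, String> STOP = new HashMap<>();

    // Assumes transformThreads >= 1.
    static List<String> run(List<Map<String, String>> nodes, int transformThreads) {
        BlockingQueue<Map<String, String>> queue = new ArrayBlockingQueue<>(1024);
        ConcurrentLinkedQueue<String> out = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < transformThreads; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Map<String, String> node = queue.take();
                        if (node == STOP) {
                            return;
                        }
                        // The sort (TreeMap orders by key) runs here, in a transform thread.
                        out.add(new TreeMap<>(node).toString());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            workers.add(t);
        }
        try {
            // The single "download" stage only hands entries over; it does no sorting.
            for (Map<String, String> node : nodes) {
                queue.put(node);
            }
            // One stop marker per worker so every thread terminates.
            for (int i = 0; i < transformThreads; i++) {
                queue.put(STOP);
            }
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the pipeline", e);
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        Map<String, String> node = new LinkedHashMap<>();
        node.put("p2", "b");
        node.put("p1", "a");
        System.out.println(run(List.of(node, node, node), 2));
    }
}
```

In this arrangement, raising `transformThreads` spreads the sorting work without touching the feeding thread. With more than one worker, the order of the output entries is not deterministic.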
In any case, I would suggest adding a configuration setting to enable or
disable sorting of the properties when writing to the FFS.
```java
package org.apache.jackrabbit.oak.index.indexer.document.flatfile;

import java.util.ArrayList;

import org.apache.commons.lang3.RandomStringUtils;
import org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState;
import org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder;
import org.apache.jackrabbit.oak.spi.blob.BlobStore;
import org.apache.jackrabbit.oak.spi.blob.MemoryBlobStore;
import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
import org.apache.jackrabbit.oak.spi.state.NodeState;
import org.junit.Test;

public class MicroBenchmark {

    public void test() {
        BlobStore blobStore = new MemoryBlobStore();
        NodeStateEntryWriter entryWriter = new NodeStateEntryWriter(blobStore);

        // Build 1,000,000 in-memory nodes, each with 20 properties holding
        // random 40-character alphanumeric values.
        ArrayList<NodeState> list = new ArrayList<>();
        for (int j = 0; j < 1000000; j++) {
            NodeBuilder b = new MemoryNodeBuilder(EmptyNodeState.EMPTY_NODE);
            for (int i = 0; i < 20; i++) {
                b.setProperty("p" + i, RandomStringUtils.random(40, true, true));
            }
            NodeState ns = b.getNodeState();
            list.add(ns);
        }

        // Profiler prof = new Profiler().startCollecting();
        // Repeat the measurement 10 times; early rounds include JIT warm-up.
        for (int test = 0; test < 10; test++) {
            // Unsorted serialization.
            long start = System.currentTimeMillis();
            int len = 0;
            for (NodeState ns : list) {
                len += entryWriter.asJson(ns).length();
            }
            long time = System.currentTimeMillis() - start;
            System.out.println(time + " ms; string length " + len + " unsorted");

            // Sorted serialization.
            start = System.currentTimeMillis();
            len = 0;
            for (NodeState ns : list) {
                len += entryWriter.asSortedJson(ns).length();
            }
            time = System.currentTimeMillis() - start;
            System.out.println(time + " ms; string length " + len + " sorted");
            System.out.println();
        }
        // System.out.println(prof.getTop(10));
    }

    public static void main(String[] args) {
        new MicroBenchmark().test();
    }
}
```
```
4418 ms; string length 971000000 unsorted
2399 ms; string length 971000000 sorted
1886 ms; string length 971000000 unsorted
2156 ms; string length 971000000 sorted
1760 ms; string length 971000000 unsorted
2039 ms; string length 971000000 sorted
1667 ms; string length 971000000 unsorted
2000 ms; string length 971000000 sorted
1665 ms; string length 971000000 unsorted
2000 ms; string length 971000000 sorted
1665 ms; string length 971000000 unsorted
1999 ms; string length 971000000 sorted
1667 ms; string length 971000000 unsorted
2000 ms; string length 971000000 sorted
1664 ms; string length 971000000 unsorted
2001 ms; string length 971000000 sorted
1663 ms; string length 971000000 unsorted
1999 ms; string length 971000000 sorted
1667 ms; string length 971000000 unsorted
1999 ms; string length 971000000 sorted
```