thomasmueller commented on PR #1159:
URL: https://github.com/apache/jackrabbit-oak/pull/1159#issuecomment-1766130468
Micro-benchmark and results:
```
package org.apache.jackrabbit.oak.index.indexer.document.flatfile;

import java.util.ArrayList;

import org.apache.jackrabbit.oak.commons.Profiler;
import org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState;
import org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder;
import org.apache.jackrabbit.oak.spi.blob.BlobStore;
import org.apache.jackrabbit.oak.spi.blob.MemoryBlobStore;
import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
import org.apache.jackrabbit.oak.spi.state.NodeState;
import org.junit.Test;

public class MicroBench {

    @Test
    public void test() {
        BlobStore blobStore = new MemoryBlobStore();
        NodeStateEntryWriter entryWriter = new NodeStateEntryWriter(blobStore);
        // Build 1 million node states, each with 5 string properties.
        ArrayList<NodeState> list = new ArrayList<>();
        for (int j = 0; j < 1000000; j++) {
            NodeBuilder b = new MemoryNodeBuilder(EmptyNodeState.EMPTY_NODE);
            for (int i = 0; i < 5; i++) {
                b.setProperty("p" + i, "Hello, World");
            }
            NodeState ns = b.getNodeState();
            list.add(ns);
        }
        // Profiler prof = new Profiler().startCollecting();
        for (int test = 0; test < 10; test++) {
            // Serialize without sorting the properties.
            long start = System.currentTimeMillis();
            int len = 0;
            for (NodeState ns : list) {
                len += entryWriter.asUnsortedJson(ns).length();
            }
            long time = System.currentTimeMillis() - start;
            System.out.println(time + " ms; string length " + len + " unsorted");
            // Serialize with sorted properties.
            start = System.currentTimeMillis();
            len = 0;
            for (NodeState ns : list) {
                len += entryWriter.asJson(ns).length();
            }
            time = System.currentTimeMillis() - start;
            System.out.println(time + " ms; string length " + len + " sorted");
            System.out.println();
        }
        // System.out.println(prof.getTop(10));
    }
}
1069 ms; string length 101000000 unsorted
610 ms; string length 101000000 sorted
401 ms; string length 101000000 unsorted
425 ms; string length 101000000 sorted
439 ms; string length 101000000 unsorted
612 ms; string length 101000000 sorted
411 ms; string length 101000000 unsorted
514 ms; string length 101000000 sorted
356 ms; string length 101000000 unsorted
428 ms; string length 101000000 sorted
362 ms; string length 101000000 unsorted
371 ms; string length 101000000 sorted
417 ms; string length 101000000 unsorted
380 ms; string length 101000000 sorted
387 ms; string length 101000000 unsorted
381 ms; string length 101000000 sorted
355 ms; string length 101000000 unsorted
542 ms; string length 101000000 sorted
611 ms; string length 101000000 unsorted
621 ms; string length 101000000 sorted
```
The first iteration is slower due to JVM warmup.
After that, sorting is slightly slower, but not by much.
For 100 MB of data, the sorting overhead is on the order of 0.1 seconds
(under 0.2 seconds in the worst iteration above), which is insignificant in
my view. (If we want to speed things up, we should not use JSON.)
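To make the overhead estimate easier to check, the timings above can be compared pairwise. The following is only a rough sketch and not part of the PR: the class name `SortOverhead`, its helper methods, and the choice to discard the first iteration as warmup are all assumptions made for illustration. The numbers are copied from the results above.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Oak): compares the timings reported above.
class SortOverhead {

    // Timings in milliseconds, copied from the benchmark output, one pair per iteration.
    static final long[] UNSORTED = {1069, 401, 439, 411, 356, 362, 417, 387, 355, 611};
    static final long[] SORTED = {610, 425, 612, 514, 428, 371, 380, 381, 542, 621};

    // Difference (sorted - unsorted) per iteration, skipping the first `warmup` iterations.
    static long[] overheads(long[] unsorted, long[] sorted, int warmup) {
        long[] diff = new long[unsorted.length - warmup];
        for (int i = warmup; i < unsorted.length; i++) {
            diff[i - warmup] = sorted[i] - unsorted[i];
        }
        return diff;
    }

    // Median of the values; the input array is left unmodified.
    static long median(long[] values) {
        long[] copy = values.clone();
        Arrays.sort(copy);
        int mid = copy.length / 2;
        return copy.length % 2 == 1 ? copy[mid] : (copy[mid - 1] + copy[mid]) / 2;
    }

    // Largest value in a non-empty array.
    static long max(long[] values) {
        return Arrays.stream(values).max().getAsLong();
    }

    public static void main(String[] args) {
        // Assumption: treat only the first iteration as JVM warmup.
        long[] diff = overheads(UNSORTED, SORTED, 1);
        System.out.println("overhead per iteration (ms): " + Arrays.toString(diff));
        System.out.println("median: " + median(diff) + " ms, max: " + max(diff) + " ms");
    }
}
```

With the first iteration excluded, this gives a median overhead of 24 ms and a maximum of 187 ms for the nine remaining iterations. Two of them show sorting as marginally faster, which suggests the difference is close to the noise level of a `System.currentTimeMillis()` measurement.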