Hi Remi, thank you for your quick reply! I just changed the benchmark to use:
stackFrameStream.skip(startFrame).limit(stackDepth).collect(Collectors.toList());

and it yields a small improvement.

Best regards, Rafael

2017-10-20 15:54 GMT+02:00 Remi Forax <[email protected]>:

> Hi Rafael,
> stream.iterator() is usually super slow*, did you try with toArray() or
> forEach() instead?
>
> Rémi
>
> * you want to see a push based API (Stream) as a pull based API (Iterator)
>
> ----- Original message -----
> > From: "Rafael Winterhalter" <[email protected]>
> > To: "core-libs-dev" <[email protected]>
> > Sent: Friday, 20 October 2017 15:32:33
> > Subject: Decreased latency performance with Stack Walker API compared to
> > sun.misc.JavaLangAccess
>
> > Hello,
> >
> > a typical pattern when reading the stack of the current thread in tooling
> > like performance monitoring used to involve creating an instance of
> > Throwable and processing this instance's attached stack in another thread.
> > The performance cost is shared about 10/90 for creating a new throwable
> > compared to reading its frames, so this is really a worthy optimization.
> >
> > It is also common to use the JavaLangAccess API which offers selective
> > access of single frames.
> >
> > This API no longer exists as it was superseded by the Stack Walker API,
> > which is of course much safer and even a more performant alternative when
> > looking at the total performance. However, using a stack walker, it is no
> > longer possible to move the stack processing out of the user thread; it
> > must be done at the moment the snapshot of the stack is taken. It turns
> > out that this increases latency dramatically when processing stacks
> > compared to the asynchronous alternative.
> >
> > In a quick benchmark, it seems like walking 35 frames of a 100 frames stack
> > allows me 70k operations per second whereas creating a new throwable yields
> > about 200k operations per second.
> > Also, within a less isolated test, I can infer this additional overhead
> > from the actual latency numbers of a web service when using the stack
> > walker API to extract the top 35 frames compared to the "old" solution
> > using JavaLangAccess.
> >
> > For this reason, it seems to be the best solution to avoid the stack walker
> > when aiming for latency at the moment if the stack is not required
> > immediately and if access resources are available in other threads.
> >
> > I would therefore like to propose to extend the stack walker API to allow
> > walking the stack of an existing throwable to allow for similar performance
> > as with JavaLangAccess. I understand that the VM must do more work
> > altogether. When receiving the full stack from a throwable, this takes
> > about three times as long. In practice, for a product I am involved in,
> > this causes a noticeable overhead when running a Java 9 VM compared to
> > Java 8.
> >
> > Alternatively, it would of course even be better if one could take a
> > snapshot of only the top x frames to walk on this object rather than a
> > throwable.
> >
> > I have added my benchmarks (snapshot for the current user thread operation,
> > complete for the entire processing) into this Gist:
> > https://gist.github.com/raphw/96e7c81d7c719cf7991b361bb7266c70
> >
> > Thank you for any feedback on my finding!
> >
> > Best regards, Rafael
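
For readers following the thread, the two approaches being compared could be sketched roughly as below. This is only an illustration, not the benchmark from the Gist: the class name StackSnapshotSketch and its method names are made up for this example, it needs Java 9 or later, and Throwable.getStackTrace() is used as a stand-in for the selective frame access of JavaLangAccess, which is not a public API. Note that getStackTrace() materializes the whole stack rather than single frames, so the numbers quoted above should not be expected from it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical illustration class; none of these names come from the original benchmark.
final class StackSnapshotSketch {

    // Stack Walker variant: the frames have to be consumed inside walk(),
    // i.e. on the thread whose stack is being read.
    static List<StackWalker.StackFrame> walkTopFrames(int startFrame, int stackDepth) {
        return StackWalker.getInstance().walk(stackFrameStream ->
                stackFrameStream.skip(startFrame).limit(stackDepth).collect(Collectors.toList()));
    }

    // Throwable variant, step 1: capture the snapshot on the user thread.
    static Throwable captureSnapshot() {
        return new Throwable();
    }

    // Throwable variant, step 2: read the frames later, possibly on another thread.
    // Assumption: getStackTrace() stands in for JavaLangAccess here.
    static List<StackTraceElement> readTopFrames(Throwable snapshot, int startFrame, int stackDepth) {
        return Arrays.stream(snapshot.getStackTrace())
                .skip(startFrame)
                .limit(stackDepth)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Only the capture happens on the calling thread.
            Throwable snapshot = captureSnapshot();
            // Reading the frames is handed off to the worker thread.
            CompletableFuture<List<StackTraceElement>> decoded =
                    CompletableFuture.supplyAsync(() -> readTopFrames(snapshot, 0, 35), worker);

            System.out.println("frames read asynchronously: " + decoded.get().size());
            System.out.println("frames walked synchronously: " + walkTopFrames(0, 35).size());
        } finally {
            worker.shutdown();
        }
    }
}
```

The point of the split into captureSnapshot() and readTopFrames() is that only the first call sits on the latency-critical path, whereas walkTopFrames() does all of its work before returning to the caller.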
