Hi Rafael,
stream.iterator() is usually super slow*; did you try toArray() or forEach() instead?
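
Something along these lines is what I have in mind. It is only a rough sketch: the class name TopFrames and the constant MAX_FRAMES are made up for illustration, and the limit of 35 is simply the depth quoted in your mail. Both variants consume the stream inside the walk() callback, so no Iterator is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the benchmark in the Gist.
final class TopFrames {
    // Assumed depth limit: the 35 frames mentioned in the original mail.
    static final int MAX_FRAMES = 35;

    private static final StackWalker WALKER = StackWalker.getInstance();

    // Variant 1: materialize the top frames with toArray() inside the callback.
    static StackWalker.StackFrame[] viaToArray() {
        return WALKER.walk(frames -> frames
                .limit(MAX_FRAMES)
                .toArray(StackWalker.StackFrame[]::new));
    }

    // Variant 2: let the stream push each frame into a list with forEach().
    static List<StackWalker.StackFrame> viaForEach() {
        List<StackWalker.StackFrame> result = new ArrayList<>(MAX_FRAMES);
        WALKER.walk(frames -> {
            frames.limit(MAX_FRAMES).forEach(result::add);
            return null; // the callback must return a value; it is ignored here
        });
        return result;
    }
}
```

The StackFrame objects can be kept after walk() returns; only the stream itself must not escape the callback.
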
Rémi

* it forces a push-based API (Stream) to be viewed as a pull-based API (Iterator)

----- Original Message -----
> From: "Rafael Winterhalter" <[email protected]>
> To: "core-libs-dev" <[email protected]>
> Sent: Friday, 20 October 2017 15:32:33
> Subject: Decreased latency performance with Stack Walker API compared to
> sun.misc.JavaLangAccess

> Hello,
>
> a typical pattern when reading the stack of the current thread in tooling
> like performance monitoring used to be to create an instance of Throwable
> and to process this instance's attached stack in another thread. The cost
> is split roughly 10/90 between creating the throwable and reading its
> frames, so this is really a worthwhile optimization.
>
> It is also common to use the JavaLangAccess API, which offers selective
> access to single frames.
>
> This API no longer exists, as it was superseded by the Stack Walker API,
> which is of course much safer and even a more performant alternative when
> looking at the total performance. However, using a stack walker, it is no
> longer possible to move the stack processing out of the user thread; it
> must be done at the moment the snapshot of the stack is taken. It turns out
> that this increases latency dramatically when processing stacks compared to
> the asynchronous alternative.
>
> In a quick benchmark, walking 35 frames of a 100-frame stack allows me
> about 70k operations per second, whereas creating a new throwable yields
> about 200k operations per second. Also, within a less isolated test, I can
> infer this additional overhead from the actual latency numbers of a web
> service when using the stack walker API to extract the top 35 frames
> compared to the "old" solution using JavaLangAccess.
>
> For this reason, the best solution at the moment seems to be to avoid the
> stack walker when aiming for latency, if the stack is not required
> immediately and if access resources are available in other threads.
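
For readers of the thread, the deferred pattern described above (create the Throwable on the user thread, read its frames on another thread) could look roughly like the sketch below. The class DeferredStackReader, the method captureAsync and the "stack-reader" worker thread are hypothetical names chosen for illustration; this is not the code from the Gist.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of the Throwable hand-off pattern.
final class DeferredStackReader {
    // Assumed single daemon worker that performs the expensive frame decoding.
    private static final ExecutorService WORKER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "stack-reader");
        t.setDaemon(true); // do not keep the JVM alive for pending work
        return t;
    });

    // Captures the stack on the calling thread, decodes the frames elsewhere.
    static CompletableFuture<StackTraceElement[]> captureAsync(int maxFrames) {
        // Creating the Throwable records the stack of the current thread.
        Throwable snapshot = new Throwable();
        return CompletableFuture.supplyAsync(() -> {
            // Building the StackTraceElement[] happens on the worker thread.
            StackTraceElement[] all = snapshot.getStackTrace();
            return Arrays.copyOf(all, Math.min(all.length, maxFrames));
        }, WORKER);
    }
}
```

A caller would invoke DeferredStackReader.captureAsync(35) and continue immediately; the user thread only pays for the construction of the Throwable.
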
>
> I would therefore like to propose extending the stack walker API to allow
> walking the stack of an existing throwable, to allow for similar
> performance as with JavaLangAccess. I understand that the VM must do more
> work altogether. When receiving the full stack from a throwable, this takes
> about three times as long. In practice, for a product I am involved in,
> this causes a noticeable overhead when running a Java 9 VM compared to
> Java 8.
>
> Alternatively, it would of course be even better if one could take a
> snapshot of only the top x frames and walk on this object rather than on a
> throwable.
>
> I have added my benchmarks (snapshot for the current user thread operation,
> complete for the entire processing) into this Gist:
> https://gist.github.com/raphw/96e7c81d7c719cf7991b361bb7266c70
>
> Thank you for any feedback on my finding!
>
> Best regards, Rafael
