sramazzina commented on issue #3056: URL: https://github.com/apache/hop/issues/3056#issuecomment-1711390979
After months of searching for a way to reproduce this, we finally managed it. Let me write down what we found so that we can work on it together with the community.

First things first. We were able to reproduce the problem by putting the system under light load (meaning a 5-minute load average above 2 on a Linux system). As we already said, this happens only in pipelines where more than one hop enters a target transform. Therefore:

- because we suspected the problem was related to how `BaseTransform.handleGetRow()` works, I rebuilt the hop-engine module with a number of log messages added at the points of the method that I considered critical. I thought it was a good idea to decorate every log message with context information, to better understand what was happening at the time of the exception
- we put together a very simple test pipeline
- we emulated the stress on the system by installing and running the _stress_ tool on Linux (very useful; `sudo apt install stress` installs it on Ubuntu, see https://www.tecmint.com/linux-cpu-load-stress-test-with-stress-ng-tool/)
- we built a very simple bash script that launches the pipeline repeatedly using hop-run
- we collected the logs

As I said, the issue happens randomly, but after waiting for a while it did happen. It is not yet clear where the problem is, but we can now trigger it. I thought it was a good idea to start talking about it, so I wanted to share everything we have so far. I attached to this comment the sample pipeline and the two logs from a good and a failed execution. If someone wants to help investigate, I would be very happy about that.
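The launch-and-collect loop described above can be sketched as follows. This is not the script we actually used (that one was written in bash and is not attached); it is a minimal Python illustration under some assumptions: the function name `run_until_failure`, the log file naming, and the `NullPointerException` marker string are all hypothetical, and it assumes a failed run either exits with a non-zero code or prints the marker. The load itself is generated separately with the _stress_ tool while this loop runs.

```python
import subprocess
import sys
from pathlib import Path


def run_until_failure(command, log_dir, max_runs=100, marker="NullPointerException"):
    """Run `command` repeatedly, writing one log file per run.

    Returns (run_number, log_path) for the first run that exits with a
    non-zero code or whose output contains `marker` (an assumed failure
    signature), or None if all `max_runs` runs look clean.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    for run in range(1, max_runs + 1):
        # Capture stdout and stderr together so the log keeps the original ordering.
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        log_path = log_dir / f"run-{run:04d}.log"
        log_path.write_text(result.stdout)
        if result.returncode != 0 or marker in result.stdout:
            return run, log_path
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to repeat.
    failure = run_until_failure(sys.argv[1:], "logs", max_runs=500)
    print(failure if failure else "no failure observed")
```

It could be started with something like `python repro_loop.py ./hop-run.sh --file=testNullPointer.hpl --runconfig=local`, where the script name, the pipeline path, and the run configuration name are placeholders to adapt to the local installation. Keeping one log file per run makes it easy to compare a good run with the failed one afterwards.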
You can also take a look at the changes made to `handleGetRow` to add the log messages here: https://github.com/sramazzina/hop/commit/f17fbf5fe33beb94b101f6978baf1d7695f65318

Remember that this issue was inherited from Kettle: we ran into it back when we were still using it. I will be back soon with more comments; let me go back to working hard on solving this issue.

[logfile-20230908-1058-ok.log](https://github.com/apache/hop/files/12557986/logfile-20230908-1058-ok.log)
[logfile-20230908-1058-ko.log](https://github.com/apache/hop/files/12557987/logfile-20230908-1058-ko.log)
[testNullPointer.hpl.zip](https://github.com/apache/hop/files/12558050/testNullPointer.hpl.zip)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
