sramazzina commented on issue #3056:
URL: https://github.com/apache/hop/issues/3056#issuecomment-1711390979

   After months of trying to find a reproduction path, we finally managed to 
get one. Let me write down what we found and share it with the community so 
that we can dig into it together.
   
   First things first. We were able to reproduce the problem by putting the 
system under light load (meaning a 5-minute load average above 2 on a Linux 
system). As already noted, this happens only for pipelines where more than one 
hop enters a target transform.
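
A quick way to confirm that the machine is in that load range before starting a run could look like the sketch below. It assumes a Linux system exposing `/proc/loadavg`, whose second field is the 5-minute load average; the function name `load5_above` and the optional file argument are made up for illustration.

```shell
# Succeeds (exit status 0) when the 5-minute load average exceeds the threshold.
# $1 = threshold, $2 = file to read (hypothetical parameter, defaults to /proc/loadavg).
load5_above() {
  threshold="$1"
  loadavg_file="${2:-/proc/loadavg}"
  # Field 2 of /proc/loadavg is the 5-minute load average.
  awk -v t="$threshold" '{ exit !($2 > t) }' "$loadavg_file"
}

# Example: only proceed when the 5-minute load average is above 2.
# load5_above 2 && echo "system is loaded enough to try a reproduction"
```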
   
   So here is what we did:
   
   - because we started to suspect the problem was related to how 
`BaseTransform.handleGetRow()` works, I rebuilt the hop-engine module after 
adding a number of log messages at the points of the method that I considered 
critical. I thought it was a good idea to decorate every log message with some 
context information to better understand what was happening at the time of the 
exception
   - we put together a very simple test pipeline
   
   
![image](https://github.com/apache/hop/assets/1270945/83a64eba-7ed7-4b40-bcf7-c60ac626d59a)
   
   - we emulated stress on the system by installing and running the 
_stress_ tool on Linux (very useful; `sudo apt install stress` installs it on 
Ubuntu, see https://www.tecmint.com/linux-cpu-load-stress-test-with-stress-ng-tool/)
   - we built a very simple script in bash to launch the pipeline by using 
hop-run repeatedly
   - we collected the log
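
The driver-script step above can be sketched roughly as follows. This is not the script we actually used, just a minimal illustration: the function name `run_until_failure`, the log directory layout, the run count, and the `local` run configuration name are all assumptions, and it relies on hop-run returning a non-zero exit status when the pipeline fails.

```shell
# Run a command repeatedly, keeping one log file per attempt.
# Stops at the first failing attempt and prints its number; prints 0 if none failed.
# $1 = maximum number of runs, $2 = log directory, remaining args = command to run.
run_until_failure() {
  max_runs="$1"
  log_dir="$2"
  shift 2
  mkdir -p "$log_dir"
  i=1
  while [ "$i" -le "$max_runs" ]; do
    # Capture stdout and stderr of this attempt in its own log file.
    if ! "$@" >"$log_dir/run-$i.log" 2>&1; then
      echo "$i"
      return 1
    fi
    i=$((i + 1))
  done
  echo 0
}

# Example (paths, counts and the run configuration name are placeholders):
# stress --cpu 4 --timeout 600 &
# run_until_failure 500 ./logs ./hop-run.sh --file=testNullPointer.hpl --runconfig=local
```

Keeping a separate log per attempt makes it easy to compare a failed execution against a good one afterwards.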
   
   As I said, the issue happens randomly, but after waiting for a while it 
did occur. It is not clear yet where the problem is, but we can now trigger it. 
I thought it was a good idea to start the discussion now, so I am sharing 
everything we have so far. I attached to this comment the sample pipeline and 
the two logs from a good and a failed execution. If someone wants to help 
investigate, I would be very happy. You can also take a look at the changes 
made to `handleGetRow` to add the log messages here: 
https://github.com/sramazzina/hop/commit/f17fbf5fe33beb94b101f6978baf1d7695f65318
   
   Remember that this issue was inherited from Kettle: we ran into it back 
when we were using Kettle as well.
   
   I will be back soon with more comments; let me get back to work on 
solving this issue.
   
   
[logfile-20230908-1058-ok.log](https://github.com/apache/hop/files/12557986/logfile-20230908-1058-ok.log)
   
   
[logfile-20230908-1058-ko.log](https://github.com/apache/hop/files/12557987/logfile-20230908-1058-ko.log)
   
   
[testNullPointer.hpl.zip](https://github.com/apache/hop/files/12558050/testNullPointer.hpl.zip)
   
   
   

