Follow up 6 months later: The same pattern happened again yesterday. The
error log blew up to 6 GB with the repeated PHP notice, and the production
server stopped responding.

This time I could see in the logs that what triggered the event was a heap
error, which caused BaseX to stop when it was in the middle of executing a
query sent by the PHP client. The value of socket_recv would have been 0
and read() would have returned NULL. The while loop in readString()
wouldn't terminate unless read() specifically returned an empty string, so
it stayed open, trying to read from the closed connection again and again
until the disk ran out of space and all server operations ceased.

I edited our copy of the PHP client files to check the result of
socket_recv and throw exceptions when it's 0 or false. I tested on a
development server by submitting a query I knew would take some time to
execute on our front end, then running basexserverstop before it could
complete. With no changes, the error log blew up to 20 MB within seconds.
With the changes, the process stopped as soon as BaseX did and returned the
proper exception message.

I submitted a PR to the PHP client repository. Though I don't know what
caused the Java heap error, and BaseX could unexpectedly stop again, our
server won't completely die if it does!

-Tamara



On Mon, Jan 30, 2023 at 10:58 PM Christian Grün <christian.gr...@gmail.com>
wrote:

> Hi Tamara,
>
> Sorry for the late reply. It’s mainly because I have no definitive
> solution on how to resolve the issue.
>
> My knowledge of PHP is limited, but I assume that the socket_recv
> function raises an error and returns false instead of the buffer size
> [1,2]. You could try to check the return value, maybe like this?
>
> if ($this->bsize == false) {
>   throw new BaseXException(socket_strerror(socket_last_error()));
> }
>
> The error message might give us a clue as to what goes wrong.
>
> Reading some more pages of the PHP socket documentation, it would
> probably be best to check the return type of all other socket
> functions as well, as all of them seem to return false instead of
> raising an exception.
>
> Hope this helps. If you decide to revise the client code, I invite you
> to provide us with the new version!
> Christian
>
> [1]
> https://github.com/BaseXdb/basex/blob/main/basex-api/src/main/php/BaseXClient/Session.php#L161
> [2] https://www.php.net/manual/de/function.socket-recv.php
>
>
>
> On Wed, Jan 25, 2023 at 6:58 PM Tamara Marnell
> <tmarn...@orbiscascade.org> wrote:
> >
> > Hello everyone,
> >
> > Once in a while, my server running BaseX will suddenly experience a big
> spike in write operations, the hard disk space will fill up to 100%, and
> the whole server will lock up. I can't say what initiates this, because the
> logs and metrics disprove every theory I can think of. (There's nothing
> unusual in the access logs or BaseX data logs, memory usage stays
> consistently at 45% and CPU usage at 5%, no user attempted anything like a
> huge bulk upload, etc.)
> >
> > However, one thing that always happens during these incidents is that
> the Apache error log blows up to a crazy size like 1 GB, filled with the
> message:
> >
> > PHP Notice:  Trying to access array offset on value of type null in
> /[path]/client/session.php on line 164
> >
> > This line is in the read method of the Session class from GitHub, and it
> indicates that the byte position passed to the buffer property is null. My
> error log probably fills up because read() is called by readString() in a
> while loop that never terminates. There might be extraordinary
> circumstances in which during construction, socket_connect returns "true"
> but the response is not valid or expected.
> >
> > -Tamara
> >
> > --
> >
> > Tamara Marnell
> > Program Manager, Systems
> > Orbis Cascade Alliance (orbiscascade.org)
> > Pronouns: she/her/hers
>


-- 

Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (orbiscascade.org <https://www.orbiscascade.org/>)
Pronouns: she/her/hers

Reply via email to