chibenwa commented on pull request #886:
URL: https://github.com/apache/james-project/pull/886#issuecomment-1058845783
Hello,
Here is a status update on our ongoing testing work. We have run into several
issues while testing the Netty 4 migration on our pre-production environment
(@Arsnael is involved in this too).
We have not yet reached the point where we can actually run our performance
tests.
## Issue 1: Partially written SELECT QRESYNC responses
When using the Evolution MUA (in debug mode Evolution logs the full IMAP
exchange, which is convenient) with QRESYNC enabled, an error is encountered
for one mailbox of the account. Evolution complains that the response was
truncated...
Here is the exchange:
```
[imapx:I] I/O: 'I00104 SELECT Spam (QRESYNC (203407991 88 2:37 (1,10,28 2,11,29)))'
[imapx:I] I/O: '* FLAGS (\Answered \Deleted \Draft \Flagged \Seen Junk NonJunk $Forwarded)'
[imapx:I] I/O: '* 36 EXISTS'
[imapx:I] I/O: '* 0 RECENT'
[imapx:I] I/O: '* OK [UIDVALIDITY 203407991] UIDs valid'
[imapx:I] I/O: '* OK [UNSEEN 1] MailboxMessage 1 is first unseen'
[imapx:I] I/O: '* OK [PERMANENTFLAGS (\Answered \Deleted \Draft \Flagged \Seen Junk NonJunk $Forwarded \*)] Limited'
[imapx:I] I/O: '* OK [HIGHESTMODSEQ 124] Highest'
[imapx:I] I/O: '* OK [UIDNEXT 38] Predicted next UID'
[imapx:I] I/O: '* VANISHED (EARLIER) '
[imapx:I] I/O: ''
```
You can see there is no tagged 'OK' response for `I00104`, and the
`VANISHED (EARLIER)` line carries no UID set.
I tried to reproduce this locally, but working with QRESYNC is horrible. We
still need to run regression tests to know whether this also happens on
Netty 3...
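For the regression comparison, a small check over a captured exchange could flag truncated responses automatically. The following is only a sketch: `TaggedCompletionCheck` and its method are hypothetical helper names, not part of James or Evolution, and it only looks at whether the last non-blank line is the tagged completion for the command.

```java
import java.util.List;

/** Hypothetical test helper (not James code): detects a missing tagged completion. */
final class TaggedCompletionCheck {

    /** True if the last non-blank line is the tagged OK/NO/BAD completion for the tag. */
    static boolean endsWithTaggedCompletion(String tag, List<String> responseLines) {
        for (int i = responseLines.size() - 1; i >= 0; i--) {
            String line = responseLines.get(i).trim();
            if (line.isEmpty()) {
                continue; // skip trailing blank lines such as the final '' in the log
            }
            return line.startsWith(tag + " OK")
                || line.startsWith(tag + " NO")
                || line.startsWith(tag + " BAD");
        }
        return false; // nothing but blank lines
    }

    public static void main(String[] args) {
        // Tail of the exchange logged by Evolution above.
        List<String> captured = List.of(
            "* OK [HIGHESTMODSEQ 124] Highest",
            "* OK [UIDNEXT 38] Predicted next UID",
            "* VANISHED (EARLIER) ",
            "");
        System.out.println(endsWithTaggedCompletion("I00104", captured)); // prints false
    }
}
```

Running the same check against a capture taken on the Netty 3 build would tell us whether the truncation is a regression.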
## Issue 2: Thunderbird & IDLE not valid in this state
To reproduce, we open Thunderbird and switch mailboxes quickly. After a few
quick mailbox switches, Thunderbird complains that the mailbox can't be opened
and says `server answered: IDLE command invalid in this state`. All subsequent
IMAP requests fail and Thunderbird needs to be restarted to get back to a clean
state.
It seems as if James logs the session out without closing the channel. It's
unclear why this happens...
``The current operation in `inbox` did not succeed. The mail server for
account name [user_mail] responded: IDLE failed. Command not valid in this
state.``
Environment: distributed James in a cloud setup (k8s hosted at OVH). We
cannot reproduce this locally.
We have not yet managed to get a traffic capture.
I suspect concurrency issues: could one IMAP request be processed before the
previous one has completed, and thus not benefit from the state changes made
by previous commands? To be honest this is still unclear to me... I spent a
few hours trying to reproduce it by writing unit tests that send multiple
commands at once, but failed to reproduce it...
I have a blind bet about this one: this changeset added the `@Sharable`
annotation to a couple of handlers, including the IMAP handler. Allowing
concurrency on this handler might lead to incorrect handling in the context of
a connected, stateful protocol. We still need to try such a change out...
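To make the "logged out without closing the channel" hypothesis concrete, here is a toy model of a single session. It is not James code: `SessionStateSketch`, its states and its status lines are simplified assumptions. It only shows that once the session state flips to logout while the connection stays open, every later IDLE is rejected, which matches what Thunderbird reports.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one IMAP session's state (hypothetical, heavily simplified). */
final class SessionStateSketch {
    enum State { NOT_AUTHENTICATED, AUTHENTICATED, SELECTED, LOGOUT }

    private State state = State.NOT_AUTHENTICATED;

    /** Applies one command to the session and returns a simplified status line. */
    String process(String command) {
        boolean loggedIn = state == State.AUTHENTICATED || state == State.SELECTED;
        switch (command) {
            case "LOGIN":
                state = State.AUTHENTICATED;
                return "OK LOGIN completed.";
            case "SELECT":
                if (!loggedIn) {
                    return "NO SELECT failed. Command not valid in this state.";
                }
                state = State.SELECTED;
                return "OK SELECT completed.";
            case "IDLE":
                // IDLE is only accepted once the session is authenticated.
                return loggedIn
                    ? "OK IDLE completed."
                    : "NO IDLE failed. Command not valid in this state.";
            case "LOGOUT":
                // The state changes, but nothing in this model closes the connection.
                state = State.LOGOUT;
                return "OK LOGOUT completed.";
            default:
                return "BAD Unknown command.";
        }
    }

    /** Runs the commands in order against a fresh session. */
    static List<String> run(String... commands) {
        SessionStateSketch session = new SessionStateSketch();
        List<String> statuses = new ArrayList<>();
        for (String command : commands) {
            statuses.add(session.process(command));
        }
        return statuses;
    }

    public static void main(String[] args) {
        System.out.println(run("LOGIN", "SELECT", "IDLE"));
        System.out.println(run("LOGIN", "LOGOUT", "IDLE"));
    }
}
```

If a traffic capture shows a LOGOUT (or an internal logout) preceding the failing IDLE, the first hypothesis wins; if not, command reordering remains the main suspect.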
## Issue 3: Gatling list
Our performance tests list mailboxes, append a few messages, select a
mailbox, then fetch a few messages. However, the listing always fails:
Gatling cannot find an INBOX. We reliably reproduce this when running Gatling
and James locally.
```
================================================================================
2022-03-01 04:53:35                                         505s elapsed
---- Requests ------------------------------------------------------------------
> Global                                                   (OK=26316  KO=8768  )
> Connect                                                  (OK=2516   KO=0     )
> login                                                    (OK=2516   KO=0     )
> heavyScenario / append                                   (OK=3761   KO=0     )
> lightScenario / list                                     (OK=0      KO=1252  )
> heavyScenario / list                                     (OK=0      KO=7516  )
> lightScenario / select                                   (OK=1251   KO=0     )
> heavyScenario / select                                   (OK=7514   KO=0     )
> heavyScenario / fetch                                    (OK=7508   KO=0     )
> lightScenario / fetch                                    (OK=1250   KO=0     )
---- Errors --------------------------------------------------------------------
> Unable to find folder 'INBOX' in                                 8768 (100.0%)
---- ImapPlatformValidation ----------------------------------------------------
[---------------------------------------------------------------           ]  0%
          waiting: 475    / active: 2525   / done: 0
================================================================================
```
Cf. https://github.com/linagora/james-gatling/blob/master/src/it/scala-2.12/org/apache/james/gatling/imap/PlatformValidationIT.scala
Using telnet, it seems we are getting "out of order" responses on the wire:
```
A1 list "" "*"
A1 OK LIST completed.
* LIST (\HasNoChildren) "." "INBOX"
* LIST (\HasNoChildren) "." "Sent"
* LIST (\HasNoChildren) "." "Trash"
```
Obviously the tagged OK should come after the untagged responses...
Would that mean we need to "await" or "chain" channel futures when we write
responses to ensure a correct order?
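To illustrate what "chaining" would mean, here is a sketch in plain Java. It does not use Netty: `CompletableFuture` stands in for the channel future, and `ChainedWritesSketch`, `rawWrite` and the artificial delays are assumptions made up for the demo. Each write only starts once the previously queued write has completed, so the tagged OK cannot overtake the untagged LIST lines even though it is the fastest write.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of chained writes; CompletableFuture stands in for a channel future. */
final class ChainedWritesSketch {
    private final List<String> wire = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService io = Executors.newScheduledThreadPool(4);
    private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

    /** Simulated asynchronous write: the line reaches the wire after delayMillis. */
    private CompletableFuture<Void> rawWrite(String line, long delayMillis) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        io.schedule(() -> {
            wire.add(line);
            done.complete(null);
        }, delayMillis, TimeUnit.MILLISECONDS);
        return done;
    }

    /** Queues a write that starts only after all previously queued writes completed. */
    synchronized CompletableFuture<Void> write(String line, long delayMillis) {
        tail = tail.thenCompose(ignored -> rawWrite(line, delayMillis));
        return tail;
    }

    /** Writes a LIST response whose slowest lines come first; returns the wire order. */
    static List<String> demo() {
        ChainedWritesSketch channel = new ChainedWritesSketch();
        channel.write("* LIST (\\HasNoChildren) \".\" \"INBOX\"", 30);
        channel.write("* LIST (\\HasNoChildren) \".\" \"Sent\"", 20);
        channel.write("* LIST (\\HasNoChildren) \".\" \"Trash\"", 10);
        // The tagged completion is the fastest write but is chained last.
        channel.write("A1 OK LIST completed.", 0).join();
        channel.io.shutdown();
        return List.copyOf(channel.wire);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Calling `rawWrite` directly with the same delays would put `A1 OK LIST completed.` first, similar to the telnet output above. Whether the real cause in James is unchained writes or writes issued from different threads still needs to be confirmed.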
## Issue 4: Blocking on the Netty IO event loop (suspicion)
I suspect we currently run everything straight on the IO event loop (which is
woken up when channels receive reads/writes). I suspect we might have no choice
but to run handlers performing DB queries (and likely also synchronous channel
reads/writes) on a separate thread pool, as was done before. The local
benchmarks conducted so far would fail to detect the impact of "blocking on
the event loop" because the concurrency level is low (fewer than 8 concurrent
connections). I wonder how the current setup will behave with high concurrency
levels (hundreds of concurrent connections).
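If we do move blocking work off the event loop, commands of one connection still have to run in order. A possible shape for this, sketched in plain Java without Netty, is a per-connection serial executor on top of a shared pool (the pattern described in the `java.util.concurrent.Executor` Javadoc). `OffloadSketch`, the command strings and the sleep-based "DB query" are assumptions for illustration only.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: blocking work runs on a shared pool, serialized per connection. */
final class OffloadSketch {

    /** Runs submitted tasks one at a time, in submission order, on a delegate pool. */
    static final class SerialExecutor implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor pool;
        private Runnable active;

        SerialExecutor(Executor pool) {
            this.pool = pool;
        }

        @Override
        public synchronized void execute(Runnable task) {
            tasks.add(() -> {
                try {
                    task.run();
                } finally {
                    scheduleNext(); // hand the next queued task to the pool
                }
            });
            if (active == null) {
                scheduleNext();
            }
        }

        private synchronized void scheduleNext() {
            active = tasks.poll();
            if (active != null) {
                pool.execute(active);
            }
        }
    }

    /** Stand-in for a blocking DB query. */
    private static void blockingWork(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Submits three commands of one connection; returns the order they were processed in. */
    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        SerialExecutor connection = new SerialExecutor(pool);
        List<String> processed = new CopyOnWriteArrayList<>();
        String[] commands = {"A1 LOGIN", "A2 SELECT INBOX", "A3 IDLE"};
        long[] costMillis = {30, 10, 0}; // the first command is the slowest
        CountDownLatch done = new CountDownLatch(commands.length);
        for (int i = 0; i < commands.length; i++) {
            final int n = i;
            connection.execute(() -> {
                blockingWork(costMillis[n]);
                processed.add(commands[n]);
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The calling thread (the event loop, in our case) returns immediately from `execute`, while the pool threads absorb the blocking calls. Submitting the same tasks straight to the pool would let `A3 IDLE` finish before `A1 LOGIN`.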
## Actions
- [ ] Evaluate whether the QRESYNC issue is a regression compared to Netty 3
- [ ] Unit tests reproducing the QRESYNC issue (hard)
- [ ] Capture what James thinks it sends for the QRESYNC issue?
- [ ] Unit test reproducing the LIST ordering issue
- [ ] Try whether awaiting after writes solves the LIST response ordering issue
- [ ] Try the `@Sharable` change for the IMAP handler and see if it has an
impact on the IDLE TB issue
Apparently, this changeset might keep us busy for quite some time yet...
Regards,
Benoit