@procogent @jvegaseg 

Thank you for your reports.

This is probably happening because the attachment receiver parser was changed
from sending closures [*] to sending messages. In that case, the error would
occur only during the upgrade, while the same cluster contains a mix of old
(pre-2.2) nodes and new nodes. As soon as all nodes are upgraded, the errors
should stop.

Specifically, with the change at
https://github.com/apache/couchdb/blob/56782453f342fb5e4137e8c9afc79b1992a8b21a/src/fabric/src/fabric.erl#L281
2.2 nodes start sending messages instead of closures. On the receiving side,
2.2 nodes know how to handle both closures and messages. The old, pre-2.2
nodes, however, do not know how to handle messages and will show that error.

Perhaps try one of these scenarios to fix the issue:

1) Stop attachment uploading traffic (or all traffic) while nodes are 
upgrading, then resume

2) After a node is upgraded to 2.2, temporarily remove it from the load
balancer list so that the new node doesn't process HTTP API requests, but
can still handle requests coordinated via older pre-2.2 nodes. Eventually
there'd be only one node processing requests. Right before that node is
restarted for upgrade, return traffic to all the other nodes.

3) Build the release from source with this commit reverted:
https://github.com/apache/couchdb/commit/56782453f342fb5e4137e8c9afc79b1992a8b21a
Do a rolling node reboot. Then make another release that includes the commit
and do another rolling node reboot.
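
To see why scenario 2 should avoid the error, the Python sketch below checks a
mid-upgrade cluster state for pairings where a 2.2 coordinator would send a
message to a pre-2.2 worker. The node names, the `unsafe_pairs` helper and the
assumption that any node in the load balancer rotation may coordinate a request
to any other node are all illustrative, not taken from CouchDB.

```python
def unsafe_pairs(versions, in_rotation):
    """Return (coordinator, worker) pairs that would hit the error.

    versions    -- dict mapping node name to True if the node runs 2.2
    in_rotation -- set of node names receiving HTTP API requests
    """
    return [
        (coordinator, worker)
        for coordinator in sorted(in_rotation)
        for worker in sorted(versions)
        if versions[coordinator] and not versions[worker]
    ]


# Hypothetical three-node cluster where only n1 has been upgraded so far.
versions = {"n1": True, "n2": False, "n3": False}

# n1 left in the load balancer: it coordinates requests to old workers.
print(unsafe_pairs(versions, {"n1", "n2", "n3"}))

# n1 removed from the load balancer: only old nodes coordinate.
print(unsafe_pairs(versions, {"n2", "n3"}))
```

With the upgraded node out of rotation the list of unsafe pairs is empty,
because old coordinators still send closures, which both old and new workers
accept.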

[*] A closure is a snapshot of a function's environment at the time the
function is created. In an Erlang cluster that environment can even be sent
to other nodes. However, that is very fragile and needs the module to be
exactly the same on all the cluster nodes.
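
For readers less familiar with the term, here is a loose Python analogy of the
two styles. It is not Erlang and does not cross any node boundary; the function
names and the `("get_chunk", index)` message shape are invented for the example.

```python
def make_receiver(chunks):
    """Closure style: `remaining` is the environment captured by `receive`."""
    remaining = list(chunks)

    def receive():
        # Each call hands out the next chunk, then b"" when none are left.
        return remaining.pop(0) if remaining else b""

    return receive


def handle_message(msg, chunks):
    """Message style: only plain data is passed; the handler's own code acts on it."""
    tag, index = msg
    if tag != "get_chunk":
        raise ValueError(f"unknown message: {tag!r}")
    return chunks[index] if index < len(chunks) else b""


chunks = [b"ab", b"cd"]

receive = make_receiver(chunks)
print(receive(), receive(), receive())  # b'ab' b'cd' b''

print(handle_message(("get_chunk", 1), chunks))  # b'cd'
```

The closure carries code and captured state together, so both sides need
matching code. The message is just data, so each side can interpret it with
whatever version of the handler it has, provided it knows the message format.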

[ Full content available at: https://github.com/apache/couchdb/issues/1578 ]