Hi Stefan,

Thanks for the additional info. I’m happy to try a Yocto build here.

Best
Jan
--

> On 2. Mar 2023, at 12:24, Stefan Kral <stefan.k...@emlix.com> wrote:
>
> Hi,
>
> I can give you some background context: our CouchDB instance is running
> on an embedded device (with a minimal attack surface, so we are under no
> pressure to mitigate CVEs). CouchDB was chosen because of its append-only
> writes and power-fail safety (and because of the easily scriptable
> curl/JSON interface).
>
> Currently there is a production system running on an SMB1 share (mounted
> on a Linux host), which works well (at least for our use cases). SMB1 is
> no longer the default on the remote Windows side, and SMB2/3 has an
> issue with opening a renamed but not yet closed file descriptor. The
> question is whether we can solve this issue with minimal changes.
>
>> 1. How did you verify that the gen_server:call/3 call never returns?
>> 2. Do you get any pertinent lines (especially crashes) in your
>> couch.log?
>
> By adding:
>
>> + ?LOG_DEBUG("before gen_server:call", []),
>> ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, infinity),
>> + ?LOG_DEBUG("after gen_server:call", []),
>
> the log gives:
>
>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.391.0>] Compaction process
>> spawned for db "asdf"
>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.84.0>] New task status for
>> <0.391.0>: [{changes_done,1},
>> {database,<<"asdf">>},
>> {progress,100},
>> {started_on,1677753384},
>> {total_changes,1},
>> {type,database_compaction},
>> {updated_on,1677753384}]
>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] CouchDB swapping files
>> .../asdf.couch and .../asdf.couch.compact.
>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] before gen_server:call
>
> Then nothing for a long time...
>
> Refreshing the db in the Futon web GUI gives no response,
>
> and the log continues with:
>
>> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] ** Generic server
>> couch_compaction_daemon terminating
>> ** Last message in was {'EXIT',<0.145.0>,
>> {timeout,
>> {gen_server,call,[couch_server,get_server]}}}
>> ** When Server state == {state,<0.145.0>}
>> ** Reason for termination ==
>> ** {compaction_loop_died,
>> {timeout,{gen_server,call,[couch_server,get_server]}}}
>>
>> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] {error_report,<0.31.0>,
>> {<0.144.0>,crash_report,
>> [[{initial_call,
>> {couch_compaction_daemon,init,['Argument__1']}},
>> {pid,<0.144.0>},
>> {registered_name,couch_compaction_daemon},
>> {error_info,
>> {exit,
>> {compaction_loop_died,
>> {timeout,
>> {gen_server,call,[couch_server,get_server]}}},
>> [{gen_server,terminate,7,
>> [{file,"gen_server.erl"},{line,804}]},
>> {proc_lib,init_p_do_apply,3,
>> [{file,"proc_lib.erl"},{line,237}]}]}},
> ...
>
>
>> 3. Can you share the environment in which you got 1.6.1 to compile
>> successfully, so we can try and reproduce this?
>
> I could prepare a Yocto setup for you that builds a toolchain and packages
> for a QEMU/Docker image, if you are familiar with that build system...
>
>> 4. Could it be that your SMB implementation doesn’t allow for opening
>> and closing files in this quick succession (with or without a rename
>> in the mix)?
>
> For testing, it doesn't need to run on an SMB share; the timeout issue
> occurs with the given fd-swap patch on a default (Linux) setup.
>
> And an strace log does not show any underlying FS issues.
>
>
> Best,
> Stefan
>
> On 28.02.23 at 16:47, Jan Lehnardt wrote:
>> First off, CouchDB 1.6.1 is no longer supported by this project AND it
>> has a long list of CVEs[1] against it. You REALLY should be operating
>> on a newer version.
>>
>> Secondly, just to understand your motivation: you think closing and
>> reopening the fds after the file:rename/2 call will make things work
>> for your SMB operation?
>>
>> If yes, the only thing I could spot that is substantially different is
>> that the NewFd position is advanced implicitly by the underlying
>> file:pread/3 in [2], and your SwappedFd doesn’t get the same treatment.
>> But I don’t know why that should block the gen_server call, as that only
>> does some refcounting updates[3]. While this includes stopping the
>> gen_server[4], I don’t see how the Pid this operates on should be any
>> different under your patch.
>>
>> So:
>>
>> 1. How did you verify that the gen_server:call/3 call never returns?
>> 2. Do you get any pertinent lines (especially crashes) in your couch.log?
>> 3. Can you share the environment in which you got 1.6.1 to compile
>> successfully, so we can try and reproduce this?
>> 4. Could it be that your SMB implementation doesn’t allow for opening and
>> closing files in this quick succession (with or without a rename in
>> the mix)?
>>
>>
>> [1]: https://docs.couchdb.org/en/stable/cve/index.html
>> [2]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L179
>> [3]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db.erl#L1122-L1130
>> [4]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_ref_counter.erl#L84
>>
>>
>> Best
>> Jan
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
>>
>> 24/7 Observation for your CouchDB Instances:
>> https://opservatory.app
>>
>>
>>> On 28. Feb 2023, at 10:19, Stefan Kral <stefan.k...@emlix.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm experimenting with a CouchDB setup on an SMB mount point. I know this
>>> is not supported, but I ran into a (maybe simple) problem I don't
>>> understand. Maybe one of you can easily give me a hint (that would be
>>> amazing).
>>>
>>> Given the following patch (I need to close/reopen the file descriptors
>>> after renaming) for the function
>>> https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L176
>>>
>>>> 1 --- a/src/couchdb/couch_db_updater.erl
>>>> 2 +++ b/src/couchdb/couch_db_updater.erl
>>>> 3 @@ -202,8 +202,18 @@ handle_call({compact_done, CompactFilepath}, _From,
>>>> #db{filepath=Path}=Db) ->
>>>> 4 RootDir = couch_config:get("couchdb", "database_dir", "."),
>>>> 5 couch_file:delete(RootDir, Filepath),
>>>> 6 ok = file:rename(CompactFilepath, Filepath),
>>>> 7 +
>>>> 8 + ok = couch_file:close(NewDb#db.updater_fd),
>>>> 9 + ok = couch_file:close(NewDb#db.fd),
>>>> 10 + {ok, SwappedFd} = couch_file:open(Filepath),
>>>> 11 + SwappedReaderFd = open_reader_fd(Filepath, Db#db.options),
>>>> 12 + SwappedDb = NewDb2#db{
>>>> 13 + fd = SwappedReaderFd,
>>>> 14 + updater_fd = SwappedFd
>>>> 15 + },
>>>> 16 + unlink(SwappedFd),
>>>> 17 close_db(Db),
>>>> 18 - NewDb3 = refresh_validate_doc_funs(NewDb2),
>>>> 19 + NewDb3 = refresh_validate_doc_funs(SwappedDb),
>>>> 20 ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3},
>>>> infinity),
>>>> 21 couch_db_update_notifier:notify({compacted, NewDb3#db.name}),
>>>> 22 ?LOG_INFO("Compaction for db \"~s\" completed.", [Db#db.name]),
>>>
>>> Then the gen_server:call() in line 20 never returns.
>>>
>>> Is there a major issue with this approach or just a minor mistake in my
>>> implementation?
>>>
>>>
>>> Thank you for having a look,
>>> Stefan
>>
>>
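
On question 1: the crash report above contains the exit reason `{timeout,{gen_server,call,[couch_server,get_server]}}`, which is the shape OTP produces when a `gen_server:call` gets no reply before its timeout (5000 ms by default for `gen_server:call/2`). One way to confirm that a call is stuck, rather than waiting forever with `infinity`, is to temporarily pass a finite timeout and catch the exit. The following is a minimal sketch of that technique, not CouchDB code; the module `call_timeout_demo` and its `get_server` request are hypothetical stand-ins, and the server deliberately never replies:

```erlang
-module(call_timeout_demo).
-behaviour(gen_server).

-export([start_link/0, run/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical stand-in for a server that is stuck: it accepts the
%% request but never sends a reply.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    {ok, no_state}.

%% Returning {noreply, State} without ever calling gen_server:reply/2
%% leaves the caller blocked until its own timeout fires.
handle_call(get_server, _From, State) ->
    {noreply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Call the server with a bounded timeout (1000 ms) instead of infinity
%% and turn the resulting exit into a value we can inspect or log.
run() ->
    {ok, _Pid} = start_link(),
    Result =
        try gen_server:call(?MODULE, get_server, 1000) of
            Reply -> {replied, Reply}
        catch
            exit:{timeout, Location} -> {timed_out, Location}
        end,
    %% Stop the demo server so run/0 can be called again.
    ok = gen_server:stop(?MODULE),
    Result.
```

In an `erl` shell, `c(call_timeout_demo).` followed by `call_timeout_demo:run().` should return after about one second with `{timed_out,{gen_server,call,[call_timeout_demo,get_server,1000]}}`. Under the same reasoning, temporarily replacing `infinity` in line 20 of the patch with a bounded value would make the updater exit with a timeout reason naming the target Pid and request, instead of hanging silently.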