Hi Stefan,

Thanks for the additional info. I’m happy to try a Yocto build here.
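
In the meantime, it might help to look at what the blocked caller is actually waiting on. Below is a minimal sketch, assuming you can attach a remote shell to the node. The module name stuck_call_probe and both of its functions are made up for illustration and are not part of CouchDB. The demo only imitates a server that never answers; it does not reproduce your patch.

```erlang
-module(stuck_call_probe).
-export([inspect/1, demo/0]).

%% Report what a process is doing right now and how many messages it
%% has queued. Returns undefined if the process is no longer alive.
inspect(Pid) when is_pid(Pid) ->
    erlang:process_info(Pid, [current_function, message_queue_len, status]).

%% Imitate the situation from the log: a caller blocked forever in
%% gen_server:call/3 because the callee never replies.
demo() ->
    %% Hypothetical stand-in for a server that ignores '$gen_call' requests.
    Server = spawn(fun() -> receive stop -> ok end end),
    Caller = spawn(fun() -> gen_server:call(Server, ping, infinity) end),
    %% Give the caller time to send its request and block in receive.
    timer:sleep(100),
    Info = inspect(Caller),
    %% Clean up both helper processes.
    exit(Caller, kill),
    Server ! stop,
    Info.
```

On your node, calling stuck_call_probe:inspect/1 with the updater pid from your log (<0.366.0>, for example via pid(0,366,0) in the shell) and with the db's main_pid should show whether either process is parked in a receive or piling up messages.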

Best
Jan
—

> On 2. Mar 2023, at 12:24, Stefan Kral <stefan.k...@emlix.com> wrote:
> 
> Hi,
> 
> I can give you some background context: our CouchDB instance is running
> on an embedded device (with a minimal attack surface, so we have no pressure
> to mitigate CVEs). CouchDB was chosen because of its append-only writes
> and power-fail-safe properties (and because of the easily scriptable
> curl/JSON interface).
> 
> Currently there is a production system running on an SMB1 share (mounted
> on a Linux host) which works well (at least for our use cases). SMB1 is
> no longer the default on the Windows remote side, and SMB2/3 has an
> issue with opening a renamed but not closed file descriptor. The
> question is whether we can solve this issue with minimal changes.
> 
>> 1. How did you verify that the gen_server:call/3 call never returns?
>> 2. Do you get any pertinent lines (especially crashes) in your
>>   couch.log?
> 
> by adding:
> 
>> +        ?LOG_DEBUG("before gen_server:call", []),
>>         ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, infinity),
>> +        ?LOG_DEBUG("after gen_server:call", []),
> 
> the log gives:
> 
>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.391.0>] Compaction process spawned for db "asdf"
>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.84.0>] New task status for <0.391.0>: [{changes_done,1},
>>                                                   {database,<<"asdf">>},
>>                                                   {progress,100},
>>                                                   {started_on,1677753384},
>>                                                   {total_changes,1},
>>                                                   {type,database_compaction},
>>                                                   {updated_on,1677753384}]
>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] CouchDB swapping files .../asdf.couch and .../asdf.couch.compact.
>> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] before gen_server:call
> 
> then nothing for a long time...
> 
> refreshing the db in the Futon web GUI gives no response
> 
> and the log continues with:
> 
>> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] ** Generic server couch_compaction_daemon terminating
>> ** Last message in was {'EXIT',<0.145.0>,
>>                           {timeout,
>>                               {gen_server,call,[couch_server,get_server]}}}
>> ** When Server state == {state,<0.145.0>}
>> ** Reason for termination ==
>> ** {compaction_loop_died,
>>       {timeout,{gen_server,call,[couch_server,get_server]}}}
>> 
>> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] {error_report,<0.31.0>,
>>                     {<0.144.0>,crash_report,
>>                      [[{initial_call,
>>                         {couch_compaction_daemon,init,['Argument__1']}},
>>                        {pid,<0.144.0>},
>>                        {registered_name,couch_compaction_daemon},
>>                        {error_info,
>>                         {exit,
>>                          {compaction_loop_died,
>>                           {timeout,
>>                            {gen_server,call,[couch_server,get_server]}}},
>>                          [{gen_server,terminate,7,
>>                            [{file,"gen_server.erl"},{line,804}]},
>>                           {proc_lib,init_p_do_apply,3,
>>                            [{file,"proc_lib.erl"},{line,237}]}]}},
> ...
> 
> 
>> 3. Can you share your environment where you get to compile 1.6.1
>>   successfully, so we can try and reproduce this?
> 
> I could prepare a Yocto setup for you to build a toolchain and packages
> for a qemu/docker image, if you are familiar with that build system...
> 
>> 4. Could it be that your SMB implementation doesn’t allow for opening
>> and closing files in this quick succession (with or without a rename
>> in the mix)?
> 
> For testing it doesn't need to run on an SMB share; the timeout issue
> occurs with the given fd-swap patch on a default (Linux) setup.
> 
> And a strace log does not show any underlying FS issues.
> 
> 
> Best,
> Stefan
> 
> On 28.02.23 at 16:47, Jan Lehnardt wrote:
>> First off, CouchDB 1.6.1 is no longer supported by this project AND it
>> has a long list of CVEs[1] against it. You REALLY should be operating
>> on a newer version.
>> 
>> Secondly, just to understand your motivation: you think closing and
>> opening the fds after the file:rename/2 call will make things work
>> for your SMB operation?
>> 
>> If yes, the only thing I could spot that is substantially different is
>> that the NewFd position is advanced implicitly by the underlying
>> file:pread/3 in [2] and your SwappedFd doesn’t get the same treatment,
>> but I don’t know why that should block the gen server call, as that only
>> does some refcounting updates[3]. While this includes stopping the
>> gen_server[4], I don’t see how the Pid this operates on should be any
>> different under your patch.
>> 
>> So:
>> 
>> 1. How did you verify that the gen_server:call/3 call never returns?
>> 2. Do you get any pertinent lines (especially crashes) in your couch.log?
>> 3. Can you share your environment where you get to compile 1.6.1
>>   successfully, so we can try and reproduce this?
>> 4. Could it be that your SMB implementation doesn’t allow for opening and
>>   closing files in this quick succession (with or without a rename in
>>   the mix)?
>> 
>> 
>> [1]: https://docs.couchdb.org/en/stable/cve/index.html
>> [2]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L179
>> [3]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db.erl#L1122-L1130
>> [4]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_ref_counter.erl#L84
>> 
>> 
>> Best
>> Jan
>> 
>>> On 28. Feb 2023, at 10:19, Stefan Kral <stefan.k...@emlix.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm experimenting with a CouchDB setup on an SMB mount point. I know this
>>> is not supported, but I ran into a (maybe simple) problem I don't
>>> understand. Maybe one of you can easily give a hint (that would be
>>> amazing).
>>> 
>>> Given the following patch (I need to close/reopen the file descriptors
>>> after renaming) for the function
>>> https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L176
>>> 
>>>> 1 --- a/src/couchdb/couch_db_updater.erl
>>>> 2 +++ b/src/couchdb/couch_db_updater.erl
>>>> 3 @@ -202,8 +202,18 @@ handle_call({compact_done, CompactFilepath}, _From, #db{filepath=Path}=Db) ->
>>>> 4          RootDir = couch_config:get("couchdb", "database_dir", "."),
>>>> 5          couch_file:delete(RootDir, Filepath),
>>>> 6          ok = file:rename(CompactFilepath, Filepath),
>>>> 7 +
>>>> 8 +        ok = couch_file:close(NewDb#db.updater_fd),
>>>> 9 +        ok = couch_file:close(NewDb#db.fd),
>>>> 10 +        {ok, SwappedFd} = couch_file:open(Filepath),
>>>> 11 +        SwappedReaderFd = open_reader_fd(Filepath, Db#db.options),
>>>> 12 +        SwappedDb = NewDb2#db{
>>>> 13 +            fd = SwappedReaderFd,
>>>> 14 +            updater_fd = SwappedFd
>>>> 15 +        },
>>>> 16 +        unlink(SwappedFd),
>>>> 17          close_db(Db),
>>>> 18 -        NewDb3 = refresh_validate_doc_funs(NewDb2),
>>>> 19 +        NewDb3 = refresh_validate_doc_funs(SwappedDb),
>>>> 20          ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, infinity),
>>>> 21          couch_db_update_notifier:notify({compacted, NewDb3#db.name}),
>>>> 22          ?LOG_INFO("Compaction for db \"~s\" completed.", [Db#db.name]),
>>> 
>>> then the gen_server:call() on line 20 never returns.
>>> 
>>> Is there a major issue with this approach or just a minor mistake in my
>>> implementation?
>>> 
>>> 
>>> Thank you for having a look,
>>> Stefan
>> 
>> 
