sergey-safarov commented on issue #1119: Could not open shard
URL: https://github.com/apache/couchdb/issues/1119#issuecomment-362504677
 
 
   Joan (@wohali)
   First I measured disk I/O performance with a `tmpfs` mounted as a volume 
inside the container.
   ```
   [root@node0 ~]# docker run --rm --read-only -it --mount type=tmpfs,destination=/opt/couchdb/data fedora:27 bash
   [root@c0ed791b0e7d /]# rm -f /workspace/* && time dd if=/dev/zero of=/opt/couchdb/data/test_container1.img bs=512 count=1000 oflag=dsync
   1000+0 records in
   1000+0 records out
   512000 bytes (512 kB, 500 KiB) copied, 0.00277375 s, 185 MB/s
   
   real 0m0.004s
   user 0m0.003s
   sys  0m0.001s
   [root@c0ed791b0e7d /]# rm -f /workspace/* && time dd if=/dev/zero of=/opt/couchdb/data/test_container1.img bs=512 count=1000 oflag=dsync
   1000+0 records in
   1000+0 records out
   512000 bytes (512 kB, 500 KiB) copied, 0.00277794 s, 184 MB/s
   
   real 0m0.004s
   user 0m0.001s
   sys  0m0.003s
   [root@c0ed791b0e7d /]# rm -f /workspace/* && time dd if=/dev/zero of=/opt/couchdb/data/test_container1.img bs=512 count=1000 oflag=dsync
   1000+0 records in
   1000+0 records out
   512000 bytes (512 kB, 500 KiB) copied, 0.00283199 s, 181 MB/s
   
   real 0m0.004s
   user 0m0.001s
   sys  0m0.003s
   [root@c0ed791b0e7d /]# rm -f /workspace/* && time dd if=/dev/zero of=/opt/couchdb/data/test_container1.img bs=512 count=1000 oflag=dsync
   1000+0 records in
   1000+0 records out
   512000 bytes (512 kB, 500 KiB) copied, 0.00279983 s, 183 MB/s
   
   real 0m0.004s
   user 0m0.001s
   sys  0m0.003s
   ```
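
As a cross-check of the figures above, here is a minimal Python sketch that recomputes throughput and mean time per write from a `dd` summary line. The function name `dd_stats` and the regular expression are illustrative assumptions, not part of any tool used in this test; the write count of 1000 comes from `count=1000` in the command.

```python
import re

# Matches the summary line printed by GNU dd, for example:
# "512000 bytes (512 kB, 500 KiB) copied, 0.00277375 s, 185 MB/s"
DD_SUMMARY = re.compile(r"(\d+) bytes .* copied, ([0-9.eE+-]+) s,")


def dd_stats(summary_line, write_count):
    """Return (throughput in MB/s, mean seconds per write) for one dd run."""
    match = DD_SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a dd summary line: %r" % summary_line)
    total_bytes = int(match.group(1))
    seconds = float(match.group(2))
    # dd reports decimal megabytes (1 MB = 1,000,000 bytes).
    return total_bytes / seconds / 1e6, seconds / write_count


if __name__ == "__main__":
    line = "512000 bytes (512 kB, 500 KiB) copied, 0.00277375 s, 185 MB/s"
    mb_per_s, per_write = dd_stats(line, write_count=1000)
    print("%.1f MB/s, %.2f microseconds per 512-byte write"
          % (mb_per_s, per_write * 1e6))
```

With `bs=512` and `oflag=dsync`, the time per write is probably the more telling number, since each 512-byte block is written synchronously.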
   Then I started the CouchDB container with `tmpfs` mounted as the data 
volume, by adding the argument `--mount type=tmpfs,destination=/opt/couchdb/data`:
   ```
   docker run -t --rm=true --log-driver=none --network kazoo --name couchdb1 \
                    --hostname couchdb1.kazoo \
                    --ip 10.0.9.8 \
                    --ulimit nofile=999999 \
                    --mount type=tmpfs,destination=/opt/couchdb/data \
                    -v /etc/kazoo/couchdb/vm.args.node1:/opt/couchdb/etc/vm.args \
                    apache/couchdb:2.1.1
   ```
   `vm.args` is the same as the default, with the `-name` option added:
   ```
   [root@node0 ~]# docker exec -it couchdb1 cat /opt/couchdb/etc/vm.args
   # Licensed under the Apache License, Version 2.0 (the "License"); you may not
   # use this file except in compliance with the License. You may obtain a copy of
   # the License at
   #
   #   http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing, software
   # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
   # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
   # License for the specific language governing permissions and limitations under
   # the License.
   
   # Ensure that the Erlang VM listens on a known port
   -kernel inet_dist_listen_min 9100
   -kernel inet_dist_listen_max 9100
   
   # Tell kernel and SASL not to log anything
   -kernel error_logger silent
   -sasl sasl_error_logger false
   
   # Use kernel poll functionality if supported by emulator
   +K true
   
   # Start a pool of asynchronous IO threads
   +A 16
   
   # Comment this line out to enable the interactive Erlang shell on startup
   +Bd -noinput
   -setcookie monster
   -name couchdb@couchdb1.kazoo
   
   [root@node0 ~]# 
   ```
   The disk layout inside the container at this point:
   ```
   [root@node0 ~]# docker exec -it couchdb1 df
   Filesystem                  1K-blocks     Used Available Use% Mounted on
   overlay                     268304384 76159032 192145352  29% /
   tmpfs                           65536        0     65536   0% /dev
   tmpfs                        65968072        0  65968072   0% /sys/fs/cgroup
   /dev/mapper/vgRAID10-root     2086912   157692   1929220   8% /etc/couchdb
   /dev/mapper/vgRAID10-docker 268304384 76159032 192145352  29% /etc/hosts
   shm                             65536        0     65536   0% /dev/shm
   tmpfs                        65968072       48  65968024   1% /opt/couchdb/data
   tmpfs                        65968072        0  65968072   0% /proc/scsi
   tmpfs                        65968072        0  65968072   0% /sys/firmware
   ```
   Host memory usage. The host has 128 GB of memory and is using about 160 MB of swap.
   ```
   top - 01:41:43 up 17 days, 15:46,  4 users,  load average: 1.78, 1.54, 0.87
   Tasks: 900 total,   1 running, 283 sleeping,   0 stopped, 486 zombie
   %Cpu(s):  0.8 us,  0.8 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
   KiB Mem : 13193614+total, 12358560 free,  2934528 used, 11664305+buff/cache
   KiB Swap: 31457280+total, 31441177+free,   161036 used. 12743585+avail Mem 
   
     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1332 root      20   0 1469936 452596   4084 S   6.3  0.3 587:28.21 beam.smp
   14915 root      20   0 1297128 295484   3928 S   5.6  0.2 510:51.97 beam.smp
   20715 root      20   0   54392   5120   3812 R   0.7  0.0   0:00.09 top
   ```
   Then I started replication, and after 5 hours I have the following results.
   Disk layout inside the CouchDB container. The CouchDB data size is about 10 GB:
   ```
   [root@node0 ~]# docker exec -it couchdb1 df
   Filesystem                  1K-blocks     Used Available Use% Mounted on
   overlay                     268304384 76134184 192170200  29% /
   tmpfs                           65536        0     65536   0% /dev
   tmpfs                        65968072        0  65968072   0% /sys/fs/cgroup
   /dev/mapper/vgRAID10-root     2086912   157692   1929220   8% /etc/couchdb
   /dev/mapper/vgRAID10-docker 268304384 76134184 192170200  29% /etc/hosts
   shm                             65536        0     65536   0% /dev/shm
   tmpfs                        65968072 10083456  55884616  16% /opt/couchdb/data
   tmpfs                        65968072        0  65968072   0% /proc/scsi
   tmpfs                        65968072        0  65968072   0% /sys/firmware
   ```
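
To make the "about 10 GB" figure easy to recheck, here is a small Python sketch that pulls the used size of one mount point out of `df` output in 1K blocks. The helper name `used_kib` is hypothetical. It walks whitespace-separated tokens rather than lines, so a row that was wrapped before the mount point still parses.

```python
def used_kib(df_output, mount_point):
    """Return the 'Used' column (1K blocks) for mount_point in `df` output."""
    tokens = df_output.split()
    for i, token in enumerate(tokens):
        # A data row is: filesystem, 1K-blocks, used, available, use%, mount.
        if i >= 5 and token == mount_point and tokens[i - 1].endswith("%"):
            return int(tokens[i - 3])
    raise KeyError(mount_point)


if __name__ == "__main__":
    sample = """Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 65968072 10083456 55884616 16%
/opt/couchdb/data
"""
    kib = used_kib(sample, "/opt/couchdb/data")
    print("%.2f GiB (%.2f GB)" % (kib / 1024 / 1024, kib * 1024 / 1e9))
```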
   Host memory usage. Swap usage has not changed; all CouchDB data is located 
in memory.
   ```
   top - 06:35:51 up 17 days, 20:40,  6 users,  load average: 1.03, 0.95, 0.82
   Tasks: 938 total,   1 running, 322 sleeping,   0 stopped, 486 zombie
   %Cpu(s):  1.4 us,  1.1 sy,  0.0 ni, 97.2 id,  0.1 wa,  0.1 hi,  0.1 si,  0.0 st
   KiB Mem : 13193614+total, 87901344 free,  3644528 used, 40390272 buff/cache
   KiB Swap: 31457280+total, 31441177+free,   161036 used. 11659586+avail Mem 
   
     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    5269 root      20   0   56704   5360   3876 R  11.8  0.0   0:00.03 top
   14915 root      20   0 1315680 307456   5852 S  11.8  0.2 550:18.26 beam.smp
    1332 root      20   0 1463408 452628   5444 S   5.9  0.3 615:53.69 beam.smp
   23036 root      20   0 1859228  63316  31484 S   5.9  0.0  30:43.66 dockerd
   ```
   After 5 hours of replication to a database located on the memory disk 
(`tmpfs`), CouchDB has the following errors in its log.
   **Could not open shards** - 2 errors in total
   ```
   2ea5274cd52c5bf0-201802/_local/75d3dc6ddfdfcae76e87a7acdfc5ed89 201 ok 42
   Feb 02 03:50:57 node0.docker.rcsnet.ru sh[28220]: [notice] 
2018-02-02T03:50:57.866252Z couchdb@couchdb1.kazoo <0.357.0> -------- 
couch_replicator_scheduler: Job 
{"75d3dc6ddfdfcae76e87a7acdfc5ed89","+create_target"} completed normally
   Feb 02 03:50:57 node0.docker.rcsnet.ru sh[28220]: [notice] 
2018-02-02T03:50:57.866613Z couchdb@couchdb1.kazoo <0.22131.14> 8a931c0f9d 
10.0.9.8:5984 10.0.9.8 undefined POST /_replicate 200 ok 2805
   Feb 02 03:50:57 node0.docker.rcsnet.ru sh[28220]: [notice] 
2018-02-02T03:50:57.916222Z couchdb@couchdb1.kazoo <0.357.0> -------- 
couch_replicator_scheduler: Job 
{"80f52a6352af0019a08ff5bf307e6192","+create_target"} started as <0.10587.13>
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [error] 
2018-02-02T03:50:59.599271Z couchdb@couchdb1.kazoo <0.14885.14> 4c046843ba 
Request to create N=3 DB but only 1 node(s)
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [notice] 
2018-02-02T03:50:59.629560Z couchdb@couchdb1.kazoo <0.14885.14> 4c046843ba 
10.0.9.8:5984 10.0.9.8 undefined PUT 
/account%2f94%2f93%2f45d6587d8f195f1620b65b1eb063-201802/ 201 ok 32
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [error] 
2018-02-02T03:50:59.655204Z couchdb@couchdb1.kazoo <0.2117.13> -------- Could 
not open file 
./data/shards/60000000-7fffffff/account/94/93/45d6587d8f195f1620b65b1eb063-201802.1517543459.couch:
 no such file or directory
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [error] 
2018-02-02T03:50:59.655216Z couchdb@couchdb1.kazoo <0.2121.13> -------- Could 
not open file 
./data/shards/40000000-5fffffff/account/94/93/45d6587d8f195f1620b65b1eb063-201802.1517543459.couch:
 no such file or directory
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [info] 
2018-02-02T03:50:59.666163Z couchdb@couchdb1.kazoo <0.206.0> -------- 
open_result error {not_found,no_db_file} for 
shards/60000000-7fffffff/account/94/93/45d6587d8f195f1620b65b1eb063-201802.1517543459
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [info] 
2018-02-02T03:50:59.666210Z couchdb@couchdb1.kazoo <0.206.0> -------- 
open_result error {not_found,no_db_file} for 
shards/40000000-5fffffff/account/94/93/45d6587d8f195f1620b65b1eb063-201802.1517543459
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [warning] 
2018-02-02T03:50:59.666204Z couchdb@couchdb1.kazoo <0.2139.13> d9f2f7b03a 
creating missing database: 
shards/60000000-7fffffff/account/94/93/45d6587d8f195f1620b65b1eb063-201802.1517543459
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [warning] 
2018-02-02T03:50:59.666235Z couchdb@couchdb1.kazoo <0.2149.13> d9f2f7b03a 
creating missing database: 
shards/40000000-5fffffff/account/94/93/45d6587d8f195f1620b65b1eb063-201802.1517543459
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [error] 
2018-02-02T03:50:59.666644Z couchdb@couchdb1.kazoo <0.303.0> -------- 
mem3_shards tried to create 
shards/40000000-5fffffff/account/94/93/45d6587d8f195f1620b65b1eb063-201802.1517543459,
 got file_exists
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [error] 
2018-02-02T03:50:59.932281Z couchdb@couchdb1.kazoo <0.16410.7> -------- 
rexi_server: from: couchdb@couchdb1.kazoo(<0.16407.7>) mfa: 
fabric_rpc:all_docs/3 exit:timeout 
[{rexi,init_stream,1,[{file,"src/rexi.erl"},{line,256}]},{rexi,stream2,3,[{file,"src/rexi.erl"},{line,204}]},{fabric_rpc,view_cb,2,[{file,"src/fabric_rpc.erl"},{line,308}]},{couch_mrview,finish_fold,2,[{file,"src/couch_mrview.erl"},{line,642}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
   Feb 02 03:50:59 node0.docker.rcsnet.ru sh[28220]: [error] 
2018-02-02T03:50:59.932290Z couchdb@couchdb1.kazoo <0.16409.7> -------- 
rexi_server: from: couchdb@couchdb1.kazoo(<0.16407.7>) mfa: 
fabric_rpc:all_docs/3 exit:timeout 
[{rexi,init_stream,1,[{file,"src/rexi.erl"},{line,256}]},{rexi,stream2,3,[{file,"src/rexi.erl"},{line,204}]},{fabric_rpc,view_cb,2,[{file,"src/fabric_rpc.erl"},{line,308}]},{couch_mrview,finish_fold,2,[{file,"src/couch_mrview.erl"},{line,642}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
   ```
   **Replicator error. Cannot put document** - about 20 errors
   ```
   Feb 02 04:04:51 node0.docker.rcsnet.ru sh[28220]: [error] 
2018-02-02T04:04:51.673222Z couchdb@couchdb1.kazoo <0.8179.40> -------- 
rexi_server: from: couchdb@couchdb1.kazoo(<0.10464.36>) mfa: 
fabric_rpc:all_docs/3 exit:timeout 
[{rexi,init_stream,1,[{file,"src/rexi.erl"},{line,256}]},{rexi,stream2,3,[{file,"src/rexi.erl"},{line,204}]},{fabric_rpc,view_cb,2,[{file,"src/fabric_rpc.erl"},{line,308}]},{couch_mrview,finish_fold,2,[{file,"src/couch_mrview.erl"},{line,642}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]: [error] 
2018-02-02T04:04:55.012851Z couchdb@couchdb1.kazoo emulator -------- Error in 
process <0.10651.59> on node 'couchdb@couchdb1.kazoo' with exit value: 
{{nocatch,{mp_parser_died,noproc}},[{couch_att,'-foldl/4-fun-0-',3,[{file,"src/couch_att.erl"},{line,613}]},{couch_att,fold_streamed_data,4,[{file,"src/couch_att.erl"},{line,664}]},{couch_att,foldl,4,[{file,"src/couch_att.erl"},...
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]: [info] 
2018-02-02T04:04:55.013064Z couchdb@couchdb1.kazoo <0.353.0> -------- 
Replication connection to: "10.0.9.8":5984 died with reason 
{{nocatch,{mp_parser_died,noproc}},[{couch_att,'-foldl/4-fun-0-',3,[{file,"src/couch_att.erl"},{line,613}]},{couch_att,fold_streamed_data,4,[{file,"src/couch_att.erl"},{line,664}]},{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,617}]},{couch_httpd_multipart,atts_to_mp,4,[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]}
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]: [error] 
2018-02-02T04:04:55.014366Z couchdb@couchdb1.kazoo <0.7570.55> 8967ff8316 
req_err(4199105376) badmatch : ok
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:     
[<<"chttpd_db:db_doc_req/3 L782">>,<<"chttpd:process_request/1 
L295">>,<<"chttpd:handle_request_int/1 L231">>,<<"mochiweb_http:headers/6 
L91">>,<<"proc_lib:init_p_do_apply/3 L237">>]
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]: [notice] 
2018-02-02T04:04:55.014881Z couchdb@couchdb1.kazoo <0.7570.55> 8967ff8316 
10.0.9.8:5984 10.0.9.8 undefined PUT 
/account%2f0e%2fe2%2fb3fccd72d86299cf8f66f6caa6bd-201801/201801-b089783f4734b57244ea8b5613f8375a?new_edits=false
 500 ok 2
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]: [error] 
2018-02-02T04:04:55.015761Z couchdb@couchdb1.kazoo <0.5748.60> -------- 
Replicator, request PUT to 
"http://10.0.9.8:5984/account%2f0e%2fe2%2fb3fccd72d86299cf8f66f6caa6bd-201801/201801-b089783f4734b57244ea8b5613f8375a?new_edits=false";
 failed due to error {error,
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:     {'EXIT',
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:         
{{{nocatch,{mp_parser_died,noproc}},
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:           
[{couch_att,'-foldl/4-fun-0-',3,
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:                
[{file,"src/couch_att.erl"},{line,613}]},
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:            
{couch_att,fold_streamed_data,4,
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:                
[{file,"src/couch_att.erl"},{line,664}]},
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:            
{couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,617}]},
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:            
{couch_httpd_multipart,atts_to_mp,4,
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:                
[{file,"src/couch_httpd_multipart.erl"},{line,208}]}]},
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:          {gen_server,call,
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:              [<0.13476.55>,
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:               {send_req,
   Feb 02 04:04:55 node0.docker.rcsnet.ru sh[28220]:                   {{url,
   ```
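
For anyone who wants to reproduce the error counts quoted above from their own log capture, here is a rough Python sketch that tallies log entries by category. The category names and the choice of marker strings are my assumptions, picked from the excerpts in this comment; `\s+` is used between words so that entries wrapped across lines are still counted.

```python
import re
from collections import Counter

# Marker strings taken from the log excerpts in this comment.
# The category names on the left are illustrative only.
PATTERNS = {
    "shard_file_missing": re.compile(r"Could\s+not\s+open\s+file"),
    "replicator_put_failed": re.compile(r"Replicator,\s+request\s+PUT"),
    "rpc_timeout": re.compile(r"fabric_rpc:all_docs/3\s+exit:timeout"),
}


def count_errors(log_text):
    """Count occurrences of each known marker in a block of log text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(log_text))
    return counts


if __name__ == "__main__":
    sample = (
        "[error] <0.2117.13> -------- Could \nnot open file ./data/shards/a\n"
        "[error] <0.2121.13> -------- Could \nnot open file ./data/shards/b\n"
        "[error] <0.5748.60> -------- \nReplicator, request PUT to \n"
    )
    for name, total in sorted(count_errors(sample).items()):
        print(name, total)
```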
   Regarding the replication errors, note the server IP `10.0.9.8`: this IP is 
assigned to the destination CouchDB server.
   
   Joan (@wohali), I think the issue is not related to disk I/O speed inside 
the container. The test was run on a RAM disk (`tmpfs`) with a speed of about 
180 MB/s. That value is much greater than on a real hard disk.
    
