Hi,

We are seeing an OSD crash whenever we run FIO reads from a client. The setup 
is very simple and is described below:

1. One OSD running "ceph version 9.1.0-420-ge3921a8 
(e3921a8396870be4a38ce1f1b6c35bc0829dbb68)"; the code was pulled from Git and 
compiled/installed.
2. One client with the same version of Ceph.
3. FIO version: fio-2.2.10-16-gd223
4. Ceph conf as given below
5. Crash details from the OSD log file as given below
6. FIO script as given below

Ceph conf:

[global]
fsid = 9eda02e2-04b7-4eed-a85a-8471ea51528d
mon_initial_members = msl-dsma-spoc08
mon_host = 10.10.10.190
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
auth_supported = none

#Needed for Newstore 
osd_objectstore = newstore
enable experimental unrecoverable data corrupting features = newstore, rocksdb
newstore_backend = rocksdb

#Debug - Start Removed for  now to debug
#newstore_max_dir_size = 4096
#newstore_sync_io = true
#newstore_sync_transaction = true
#newstore_sync_submit_transaction = true
#newstore_sync_wal_apply = true
#newstore_overlay_max = 0
#Debug - End

#Needed for Newstore

filestore_xattr_use_omap = true

osd pool default size = 1
rbd cache = false


debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
osd_op_threads = 5
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 25
#osd_op_num_sharded_pool_threads = 25
filestore_op_threads = 4
ms_nocrc = true
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
cephx sign messages = false
cephx require signatures = false
ms_dispatch_throttle_bytes = 0
throttler_perf_counter = false

[osd]
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false 

Crash details from the log:
  -194> 2015-10-28 10:54:40.792957 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915510
  -193> 2015-10-28 10:54:40.792959 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916590
  -192> 2015-10-28 10:54:40.792962 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914990
  -191> 2015-10-28 10:54:40.792965 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916490
  -190> 2015-10-28 10:54:40.792968 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916090
  -189> 2015-10-28 10:54:40.792971 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915c10
  -188> 2015-10-28 10:54:40.792975 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba917190
  -187> 2015-10-28 10:54:40.792977 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916810
  -186> 2015-10-28 10:54:40.792980 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914790
  -185> 2015-10-28 10:54:40.792983 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915e10
  -184> 2015-10-28 10:54:40.792986 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915f10
  -183> 2015-10-28 10:54:40.792988 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915f90
  -182> 2015-10-28 10:54:40.792992 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914510
 ...
 
   -10> 2015-10-28 10:55:45.240480 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
    -9> 2015-10-28 10:55:45.240830 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
    -8> 2015-10-28 10:55:45.241135 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
    -7> 2015-10-28 10:55:45.241418 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
    -6> 2015-10-28 10:55:45.241674 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
    -5> 2015-10-28 10:55:45.241913 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
    -4> 2015-10-28 10:55:45.242150 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
    -3> 2015-10-28 10:55:45.242391 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
    -2> 2015-10-28 10:55:45.242614 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
    -1> 2015-10-28 10:55:45.242885 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
     0> 2015-10-28 10:55:54.323685 7f156e354700 -1 *** Caught signal (Aborted) **
in thread 7f156e354700

ceph version 9.1.0-420-ge3921a8 (e3921a8396870be4a38ce1f1b6c35bc0829dbb68)
1: (()+0x80b70a) [0x7f159c91670a]
2: (()+0x10340) [0x7f159afef340]
3: (gsignal()+0x39) [0x7f1599116cc9]
4: (abort()+0x148) [0x7f159911a0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1599a21535]
6: (()+0x5e6d6) [0x7f1599a1f6d6]
7: (()+0x5e703) [0x7f1599a1f703]
8: (()+0x5e922) [0x7f1599a1f922]
9: (ceph::buffer::list::iterator_impl<false>::copy(unsigned int, char*)+0xa5) [0x7f159ca12955]
10: (void decode<unsigned long, unsigned long>(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >&, ceph::buffer::list::iterator&)+0x2e) [0x7f159c600fae]
11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x2aa4) [0x7f159c5b0314]
12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x97) [0x7f159c5cdf77]
13: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x890) [0x7f159c5cedb0]
14: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x3552) [0x7f159c5d3732]
15: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x705) [0x7f159c56d835]
16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x7f159c45d02d]
17: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x7f159c45d24d]
18: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8a9) [0x7f159c481649]
19: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0x7f159c9f5caf]
20: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f159c9f7bb0]
21: (()+0x8182) [0x7f159afe7182]
22: (clone()+0x6d) [0x7f15991da47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
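
Frames 4 to 10 of the trace read like an uncaught C++ exception: decode() of a std::map<unsigned long, unsigned long> calls the bufferlist iterator's copy(), and the next frames up are the terminate handler and abort(). One plausible reading, not confirmed by anything in this log, is that the payload being decoded in do_osd_ops was shorter than the decoder expected. The sketch below is a plain-Python analogy, not Ceph code. It assumes a layout of a 32-bit little-endian entry count followed by 64-bit key/value pairs, and the names encode_map and decode_map are hypothetical.

```python
import struct

def encode_map(m):
    """Pack a dict of ints as: u32 count, then (u64 key, u64 value) pairs."""
    out = struct.pack("<I", len(m))
    for k, v in sorted(m.items()):
        out += struct.pack("<QQ", k, v)
    return out

def decode_map(buf):
    """Unpack the layout written by encode_map; raises on a short buffer."""
    off = 0
    (n,) = struct.unpack_from("<I", buf, off)
    off += 4
    m = {}
    for _ in range(n):
        k, v = struct.unpack_from("<QQ", buf, off)
        off += 16
        m[k] = v
    return m

# Illustrative values only (offset -> length), not taken from the log.
good = encode_map({0: 8192, 16384: 8192})
assert decode_map(good) == {0: 8192, 16384: 8192}

# A truncated payload cannot satisfy the reads. If nothing catches the
# resulting error, the process dies, much like the OSD thread above.
try:
    decode_map(good[:-4])
    truncated_ok = True
except struct.error:
    truncated_ok = False
print("truncated payload decoded:", truncated_ok)
```

If this reading is right, the interesting question is which op handled in do_osd_ops carries such a map and why its input data would be short or empty on a read workload.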

--- logging levels ---
   0/ 5 none
   0/ 0 lockdep
   0/ 0 context
   0/ 0 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 0 buffer
   0/ 0 timer
   0/ 0 filer
   0/ 1 striper
   0/ 0 objecter
   0/ 0 rados
   0/ 0 rbd
   0/ 5 rbd_replay
   0/ 0 journaler
   0/ 5 objectcacher
   0/ 0 client
   0/ 0 osd
   0/ 0 optracker
   0/ 0 objclass
   0/ 0 filestore
   1/ 3 keyvaluestore
   0/ 0 journal
   0/ 0 ms
   0/ 0 mon
   0/ 0 monc
   0/ 0 paxos
   0/ 0 tp
   0/ 0 auth
   1/ 5 crypto
   0/ 0 finisher
   0/ 0 heartbeatmap
   0/ 0 perfcounter
   0/ 0 rgw
   1/10 civetweb
   1/ 5 javaclient
   0/ 0 asok
   0/ 0 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.0.log
--- end dump of recent events ---
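
One detail visible in this dump: objectcacher is reported at 0/5, while the conf above spells the key debug_objectcatcher. A small sketch of how such mismatches between debug_* keys and subsystem names could be spotted. The two lists are abbreviated subsets copied by hand from this message, and the helper name unknown_debug_keys is hypothetical.

```python
# debug_* keys as written in the ceph.conf above (subset, copied by hand).
conf_keys = [
    "debug_lockdep", "debug_context", "debug_crush", "debug_buffer",
    "debug_objecter", "debug_rados", "debug_rbd", "debug_journaler",
    "debug_objectcatcher", "debug_client", "debug_osd", "debug_filestore",
]

# Subsystem names as printed in the "logging levels" dump (subset).
subsystems = {
    "lockdep", "context", "crush", "buffer", "objecter", "rados", "rbd",
    "journaler", "objectcacher", "client", "osd", "filestore", "newstore",
}

def unknown_debug_keys(keys, known):
    """Return debug_* keys whose suffix is not a known subsystem name."""
    prefix = "debug_"
    return [k for k in keys
            if k.startswith(prefix) and k[len(prefix):] not in known]

mismatched = unknown_debug_keys(conf_keys, subsystems)
print(mismatched)
```

This is unlikely to be related to the crash itself; it only explains why one subsystem kept a non-zero in-memory log level.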

FIO script:

######################################################################
# Example test for the RBD engine.
# 
# Runs a 4k random write test against a RBD via librbd
#
# NOTE: Make sure you have either a RBD named 'fio_test' or change
#       the rbdname parameter.
######################################################################
[global]
#logging
#write_iops_log=write_iops_log
#write_bw_log=write_bw_log
#write_lat_log=write_lat_log
#ioengine=libaio
ioengine=rbd
clientname=admin
direct=1
pool=pool1
rbdname=im1
# rw is set to randread or randwrite depending on the run;
# the crash is seen with reads
rw=randread
bs=8k
numjobs=16
time_based=1
runtime=300
ramp_time=60

[rbd_iodepth32]
iodepth=128
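
For a sense of the load this job file can generate, the arithmetic below multiplies out numjobs, iodepth and bs from the script. It assumes every job actually sustains its full queue depth, which fio does not guarantee, so the result is an upper bound.

```python
numjobs = 16          # from the [global] section
iodepth = 128         # from the [rbd_iodepth32] section
bs_bytes = 8 * 1024   # bs=8k

# Upper bound on requests queued by the client at any one time.
max_inflight = numjobs * iodepth
# The same bound expressed as bytes of outstanding I/O.
max_inflight_bytes = max_inflight * bs_bytes

print(max_inflight)
print(max_inflight_bytes / 2**20, "MiB")
```

That is up to 2048 outstanding 8k requests (16 MiB) against a single OSD.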

Any pointers as to what could be the issue will be greatly appreciated.

Thanks,
-Vish