[ https://issues.apache.org/jira/browse/TS-4915?focusedWorklogId=30355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30355 ]
ASF GitHub Bot logged work on TS-4915:
--------------------------------------
Author: ASF GitHub Bot
Created on: 11/Oct/16 09:27
Start Date: 11/Oct/16 09:27
Worklog Time Spent: 10m
Work Description: GitHub user shinrich opened a pull request:
https://github.com/apache/trafficserver/pull/1088
TS-4915: Crash from hostdb in PriorityQueueLess
These changes have been running on my production box since I left work
Monday night. I will keep an eye on it; the lower overnight traffic might not
be stressing it sufficiently.
The main change was in PriorityQueue<>::erase (the queue class that uses the
PriorityQueueLess comparator). Assigning the end item to the erase point did
not update that item's stored index, so the assumption that entry->index is
less than _v.length() became invalid the next time around. I think breaking
the invariant that entry->index equals the entry's position in _v can also
harm the bubbling logic (e.g. _bubble_up). I think PriorityQueue<>::pop also
has a problem, but my workload was not triggering that function, so I didn't
dive in there.
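The index invariant described above can be illustrated with a minimal sketch of an indexed min-heap. This is not the code in lib/ts/PriorityQueue.h; Entry, IndexedHeap, swap_slots and consistent are hypothetical names chosen for the example, and the only point it makes is that erase must write the new slot back into the moved entry's index.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a queue entry: it remembers its own slot.
struct Entry {
  explicit Entry(int w) : weight(w), index(0) {}
  int weight;
  std::size_t index;
};

// Minimal indexed min-heap (illustrative only, not the ATS implementation).
class IndexedHeap {
public:
  void push(Entry *e) {
    e->index = v_.size();
    v_.push_back(e);
    bubble_up(e->index);
  }

  Entry *top() const { return v_.empty() ? nullptr : v_.front(); }
  std::size_t size() const { return v_.size(); }

  void erase(Entry *e) {
    std::size_t i = e->index;
    assert(i < v_.size() && v_[i] == e); // the assumption the PR mentions
    Entry *last = v_.back();
    v_.pop_back();
    if (last == e) {
      return; // erased the tail, nothing to move
    }
    v_[i] = last;
    last->index = i; // without this line the moved entry keeps a stale index
    bubble_down(i);
    bubble_up(last->index);
  }

  // True when every entry's stored index matches its slot in the vector.
  bool consistent() const {
    for (std::size_t i = 0; i < v_.size(); ++i) {
      if (v_[i]->index != i) {
        return false;
      }
    }
    return true;
  }

private:
  void swap_slots(std::size_t a, std::size_t b) {
    std::swap(v_[a], v_[b]);
    v_[a]->index = a;
    v_[b]->index = b;
  }

  void bubble_up(std::size_t i) {
    while (i > 0) {
      std::size_t parent = (i - 1) / 2;
      if (!(v_[i]->weight < v_[parent]->weight)) {
        break;
      }
      swap_slots(i, parent);
      i = parent;
    }
  }

  void bubble_down(std::size_t i) {
    for (;;) {
      std::size_t left = 2 * i + 1, right = left + 1, smallest = i;
      if (left < v_.size() && v_[left]->weight < v_[smallest]->weight) {
        smallest = left;
      }
      if (right < v_.size() && v_[right]->weight < v_[smallest]->weight) {
        smallest = right;
      }
      if (smallest == i) {
        break;
      }
      swap_slots(i, smallest);
      i = smallest;
    }
  }

  std::vector<Entry *> v_;
};
```

If the `last->index = i` assignment is dropped, the moved entry still claims the old tail slot, which is now one past the end of the vector, so a later erase of that entry trips the bounds assumption.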
The other change was in RefCountCachePartition<C>::make_space_for. There
was an extra pop which I believe was doubly removing an entry already removed
in PriorityQueue<>::erase (called from RefCountCachePartition<C>::erase).
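The double removal can be sketched with a toy cache partition. This is an assumption-laden illustration, not RefCountCachePartition: Partition, Item, contains, count and queued are hypothetical names, and a std::set ordered by expiry stands in for the priority queue. The eviction loop relies on erase() to take the victim out of the queue exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>

// Hypothetical cache partition: a key -> item map plus an expiry-ordered queue.
class Partition {
public:
  explicit Partition(std::size_t capacity) : capacity_(capacity), used_(0) {}

  bool put(int key, long expiry, std::size_t size) {
    erase(key); // replace any existing entry for this key
    if (!make_space_for(size)) {
      return false;
    }
    items_[key] = Item{expiry, size};
    queue_.insert(std::make_pair(expiry, key));
    used_ += size;
    return true;
  }

  void erase(int key) {
    std::map<int, Item>::iterator it = items_.find(key);
    if (it == items_.end()) {
      return;
    }
    // The queue removal happens here, and only here.
    queue_.erase(std::make_pair(it->second.expiry, key));
    used_ -= it->second.size;
    items_.erase(it);
  }

  bool make_space_for(std::size_t size) {
    if (size > capacity_) {
      return false;
    }
    while (capacity_ - used_ < size) {
      int victim = queue_.begin()->second; // earliest expiry
      erase(victim);
      // No additional queue pop here: erase() already removed the victim,
      // so a second pop would discard the next entry while it is still
      // present in items_.
    }
    return true;
  }

  bool contains(int key) const { return items_.count(key) != 0; }
  std::size_t count() const { return items_.size(); }
  std::size_t queued() const { return queue_.size(); }

private:
  struct Item {
    long expiry;
    std::size_t size;
  };

  std::size_t capacity_;
  std::size_t used_;
  std::map<int, Item> items_;
  std::set<std::pair<long, int> > queue_;
};
```

With an extra pop after erase(victim), the map and the queue would disagree about which entries exist, which is the kind of mismatch the PR description suspects.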
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shinrich/trafficserver ts-4915-2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/trafficserver/pull/1088.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1088
----
commit 0898a59bc33d63d18997a66437c808acd2e7e073
Author: Susan Hinrichs <[email protected]>
Date: 2016-10-11T09:20:11Z
TS-4915: Crash from hostdb in PriorityQueueLess
----
Issue Time Tracking
-------------------
Worklog Id: (was: 30355)
Time Spent: 10m
Remaining Estimate: 0h
> Crash from hostdb in PriorityQueueLess
> --------------------------------------
>
> Key: TS-4915
> URL: https://issues.apache.org/jira/browse/TS-4915
> Project: Traffic Server
> Issue Type: Bug
> Components: HostDB
> Reporter: Susan Hinrichs
> Priority: Blocker
> Fix For: 7.1.0
>
> Attachments: ts-4915.diff, ts-4915.diff
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Saw this while testing fix for TS-4813 with debug enabled.
> {code}
> (gdb) bt full
> #0 0x0000000000547bfe in RefCountCacheHashEntry::operator< (this=0x1cc0880,
> v2=...) at ../iocore/hostdb/P_RefCountCache.h:94
> No locals.
> #1 0x000000000054988d in
> PriorityQueueLess<RefCountCacheHashEntry*>::operator() (this=0x2b78a9a2587b,
> a=@0x2b78f402af68, b=@0x2b78f402aa28)
> at ../lib/ts/PriorityQueue.h:41
> No locals.
> #2 0x0000000000549785 in PriorityQueue<RefCountCacheHashEntry*,
> PriorityQueueLess<RefCountCacheHashEntry*> >::_bubble_up (this=0x1cb2990,
> index=2) at ../lib/ts/PriorityQueue.h:191
> comp = {<No data fields>}
> parent = 0
> #3 0x00000000006ecfcc in PriorityQueue<RefCountCacheHashEntry*,
> PriorityQueueLess<RefCountCacheHashEntry*> >::push (this=0x1cb2990,
> entry=0x2b78f402af60) at ../../lib/ts/PriorityQueue.h:91
> len = 2
> #4 0x00000000006ec206 in RefCountCachePartition<HostDBInfo>::put
> (this=0x1cb2900, key=6912554662447498853, item=0x2b78aee04f00, size=96,
> expire_time=1475202356) at ./P_RefCountCache.h:210
> expiry_entry = 0x2b78f402af60
> __func__ = "put"
> val = 0x1cc0880
> #5 0x00000000006eb3de in RefCountCache<HostDBInfo>::put (this=0x18051e0,
> key=6912554662447498853, item=0x2b78aee04f00, size=16,
> expiry_time=1475202356) at ./P_RefCountCache.h:462
> No locals.
> #6 0x00000000006e2d8e in HostDBContinuation::dnsEvent (this=0x2b7938020f00,
> event=600, e=0x2b78ac009440) at HostDB.cc:1422
> is_rr = false
> old_rr_data = 0x0
> first_record = 0x2b78ac0094f8
> m = 0x1
> failed = false
> old_r = {m_ptr = 0x0}
> af = 2 '\002'
> s_size = 16
> rrsize = 0
> allocSize = 16
> r = 0x2b78aee04f00
> old_info = {<RefCountObj> = {<ForceVFPTToTop> = {_vptr.ForceVFPTToTop
> = 0x7f3630}, m_refcount = 0}, iobuffer_index = 0,
> key = 47797242059264, app = {allotment = {application1 = 5326300,
> application2 = 0}, http_data = {http_version = 4,
> pipeline_max = 59, keepalive_timeout = 17, fail_count = 81,
> unused1 = 0, last_failure = 0}, rr = {offset = 5326300}}, data = {
> ip = {sa = {sa_family = 54488, sa_data =
> "^\000\000\000\000\000\020\034$\274x+\000"}, sin = {sin_family = 54488,
> sin_port = 94,
> sin_addr = {s_addr = 0}, sin_zero = "\020\034$\274x+\000"},
> sin6 = {sin6_family = 54488, sin6_port = 94, sin6_flowinfo = 0,
> sin6_addr = {__in6_u = {__u6_addr8 =
> "\020\034$\274x+\000\000\030\036$\274\375\b\000", __u6_addr16 = {7184, 48164,
> 11128,
> 0, 7704, 48164, 2301, 0}, __u6_addr32 = {3156483088,
> 11128, 3156483608, 2301}}}, sin6_scope_id = 3156478176}},
> hostname_offset = 6214872, srv = {srv_offset = 54488, srv_weight
> = 94, srv_priority = 0, srv_port = 0, key = 3156483088}},
> hostname_offset = 11128, ip_timestamp = 2845989456,
> ip_timeout_interval = 11128, is_srv = 0, reverse_dns = 0, round_robin = 1,
> round_robin_elt = 0}
> valid_records = 0
> tip = {_family = 2, _addr = {_ip4 = 540420056, _ip6 = {__in6_u =
> {__u6_addr8 = "\330'6 x+\000\000\360L\020\250x+\000",
> __u6_addr16 = {10200, 8246, 11128, 0, 19696, 43024, 11128,
> 0}, __u6_addr32 = {540420056, 11128, 2819640560, 11128}}},
> _byte = "\330'6 x+\000\000\360L\020\250x+\000", _u32 =
> {540420056, 11128, 2819640560, 11128}, _u64 = {47794936489944,
> 47797215710448}}}
> ttl_seconds = 132
> aname = 0x2b7938021000 "fbmm1.zenfs.com"
> offset = 96
> thread = 0x2b78a8101010
> __func__ = "dnsEvent"
> #7 0x00000000005145dc in Continuation::handleEvent (this=0x2b7938020f00,
> event=600, data=0x2b78ac009440)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #8 0x00000000006f681e in DNSEntry::postEvent (this=0x2b78f4028600) at
> DNS.cc:1269
> __func__ = "postEvent"
> #9 0x00000000005145dc in Continuation::handleEvent (this=0x2b78f4028600,
> event=1, data=0x2aac954db040)
> at ../iocore/eventsystem/I_Continuation.h:153
> No locals.
> #10 0x00000000007bc9be in EThread::process_event (this=0x2b78a8101010,
> e=0x2aac954db040, calling_code=1) at UnixEThread.cc:143
> c_temp = 0x2b78f4028600
> lock = {m = {m_ptr = 0x17dea10}, lock_acquired = true}
> __func__ = "process_event"
> #11 0x00000000007bcc2d in EThread::execute (this=0x2b78a8101010) at
> UnixEThread.cc:197
> done_one = false
> e = 0x2aac954db040
> NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x18ce400},
> tail = 0x18ce400}
> next_time = 1475191803711988905
> __func__ = "execute"
> #12 0x00000000007bbfd2 in spawn_thread_internal (a=0x17fb9a0) at Thread.cc:84
> p = 0x17fb9a0
> #13 0x00002b78a2555aa1 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #14 0x00000032310e893d in clone () from /lib64/libc.so.6
> No symbol table info available.
> core == ET_NET 13 and core == ET_NET 20
> {code}