Good news! :) The pictures didn't leave any doubt: that dust jam was a source of problems! And it was nothing, really: I just pointed out some facts, but you were the one who followed the trail, got your hands dirty, and solved it! :)
Regards, Dani On Mon, 14-12-2015 at 20:50 -0600, Jorge Araya Navarro wrote: > This was indeed a heat issue. What I did was buy a can of compressed air, open the > machine and clean the > stuff out; it was filled with dust: > > - https://www.instagram.com/p/_FmJajny4a/?taken-by=jorgejavieran > - https://www.instagram.com/p/_Fkln5ny2A/?taken-by=jorgejavieran > > And now my beloved laptop is working like new! :D. Thank you very much for the > support, Daniel! > > On Monday, 14 December 2015 at 03:57, Daniel Tarrero wrote: > > > Good morning out there! > > > > Hmmm, so in the end this looks like a heat problem! > > Overheating can cause a lot of different failures, but wifi/radio and CPU > > related ones are very common. > > > > !!! In this scenario, you should use the ath9k module parameter > > "nohwcrypt=0" (I think it's the default; "$ modinfo ath9k" lists the > > parameter, and the value in use can be read from /sys/module/ath9k/parameters/nohwcrypt). > > This makes the wifi chip handle the encryption, which in general > > produces less heat than using the CPU. > > > > Well, overheating is good and bad =) it takes a few hours to replace a fan, > > but it won't cost you more than $20. > > > > Can you check that the fans are working? In what condition are they? > > Take a look at packages for reading sensors, like "lm-sensors"; on my > > system it gives me some component temperatures and fan revolutions per > > minute (RPM) readings. > > > > You can always disassemble the laptop and see how the fans are spinning, but > > the RPM and °C readings will be more accurate :) > > > > You can also do a "field test", moving the laptop to some cooler > > place (maybe the bathroom, basement or kitchen). > > > > > > Good luck! > > D > > > > > > > > On Tue, 08-12-2015 at 22:32 -0600, Jorge Araya Navarro wrote: > >> Hello, again! > >> > >> Bad news: after making the changes you suggested (kernel options, module > >> options), the issue's still > >> coming back :(.
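Dani's advice to check temperatures and fan speed can also be followed without installing anything. The following is a minimal sketch, not part of lm-sensors, and the function names are hypothetical. It assumes the kernel exposes thermal zones under /sys/class/thermal (readings in millidegrees Celsius), that thinkpad_acpi (loaded on this T60, per the lsmod output in the thread) provides /proc/acpi/ibm/fan, and that the Intel CPU reports throttle counters under /sys/devices/system/cpu/cpu*/thermal_throttle/.

```shell
#!/bin/sh
# Hypothetical helpers (not part of lm-sensors) for watching laptop heat.

# Convert a sysfs reading in millidegrees Celsius to "X.Y" degrees.
# Assumes a non-negative reading; the tenths digit is truncated, not rounded.
mdeg_to_c() {
  printf '%d.%d\n' $(( $1 / 1000 )) $(( ($1 % 1000) / 100 ))
}

# Print every readable thermal zone as "type: X.Y C".
show_temps() {
  for zone in /sys/class/thermal/thermal_zone*; do
    [ -r "$zone/temp" ] || continue   # also skips the unexpanded glob
    printf '%s: %s C\n' "$(cat "$zone/type")" "$(mdeg_to_c "$(cat "$zone/temp")")"
  done
}

# Print the fan speed line reported by thinkpad_acpi, if that file exists.
show_fan() {
  if [ -r /proc/acpi/ibm/fan ]; then
    grep '^speed:' /proc/acpi/ibm/fan
  else
    echo "no thinkpad_acpi fan file" >&2
  fi
}

# Print how many times each CPU core was throttled for heat (Intel only).
show_throttle() {
  for f in /sys/devices/system/cpu/cpu*/thermal_throttle/core_throttle_count; do
    [ -r "$f" ] || continue
    cpu=${f#/sys/devices/system/cpu/}
    printf '%s: %s\n' "${cpu%%/*}" "$(cat "$f")"
  done
}
```

For example, `while :; do show_temps; show_fan; show_throttle; sleep 5; done` prints a reading every five seconds. A throttle count that keeps growing would match the "Core temperature above threshold" messages Jorge found in dmesg.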
It happens every time the machine gets hot, and I don't have any way to improve the > >> temperature of my environment :-/ > >> > >> Maybe the fan needs some tweaking? I'm using laptop-tools btw. > >> > >> Reloading the modules fixes 50% of the issue; however, I don't get my > >> wifi back... I expected that > >> unloading and reloading the modules would work and give me back the wifi > >> so I wouldn't need to > >> reboot the laptop. Here is some output after unloading and reloading the > >> modules: > >> > >> --8<---------------cut here---------------start------------->8--- > >> [ +0,070512] ath9k: ath9k: Driver unloaded > >> [ +0,446462] cfg80211: Calling CRDA to update world regulatory domain > >> [ +0,035610] ath9k 0000:02:00.0: enabling device (0000 -> 0002) > >> [ +0,000206] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this > >> driver > >> [ +0,000005] ath: phy0: Unable to initialize hardware; initialization > >> status: -95 > >> [ +0,000004] ath9k 0000:02:00.0: Failed to initialize device > >> [ +0,000077] ath9k: probe of 0000:02:00.0 failed with error -95 > >> [ +3,117031] cfg80211: Calling CRDA to update world regulatory domain > >> [ +3,150005] cfg80211: Calling CRDA to update world regulatory domain > >> [ +3,160005] cfg80211: Calling CRDA to update world regulatory domain > >> [ +3,149996] cfg80211: Calling CRDA to update world regulatory domain > >> [ +3,150000] cfg80211: Calling CRDA to update world regulatory domain > >> [ +3,150009] cfg80211: Calling CRDA to update world regulatory domain > >> [ +3,159986] cfg80211: Calling CRDA to update world regulatory domain > >> [ +3,150013] cfg80211: Calling CRDA to update world regulatory domain > >> [ +3,159995] cfg80211: Calling CRDA to update world regulatory domain > >> [ +3,149998] cfg80211: Calling CRDA to update world regulatory domain > >> [ +3,159987] cfg80211: Exceeded CRDA call max attempts.
Not calling CRDA > >> [dic 8 21:48] e1000e: enp1s0 NIC Link is Down > >> [ +30,020843] e1000e: enp1s0 NIC Link is Down > >> [dic 8 21:49] e1000e: enp1s0 NIC Link is Down > >> --8<---------------cut here---------------end--------------->8--- > >> > >> Since I'm documenting this issue in an org-mode file, this is how I unload > >> and reload the modules, > >> with an org-mode source block!: > >> > >> --8<---------------cut here---------------start------------->8--- > >> #+BEGIN_SRC sh :results silent :exports both :dir /sudo:: > >> modprobe -rf led_class > >> modprobe -rf cfg80211 > >> modprobe -rf mac80211 > >> modprobe -rf ath9k_hw > >> modprobe -rf ath9k_common > >> modprobe -rf ath9k > >> modprobe ath9k debug=1 > >> modprobe ath9k_common > >> modprobe ath9k_hw > >> modprobe mac80211 > >> modprobe cfg80211 > >> modprobe led_class > >> #+END_SRC > >> --8<---------------cut here---------------end--------------->8--- > >> > >> Talking about heat, I found this at the end of `dmesg`: > >> > >> --8<---------------cut here---------------start------------->8--- > >> [dic 8 22:16] e1000e: enp1s0 NIC Link is Down > >> [ +4,030516] CPU1: Core temperature above threshold, cpu clock throttled > >> (total events = 1) > >> [ +0,000779] CPU1: Core temperature/speed normal > >> [ +34,513236] mce: [Hardware Error]: Machine check events logged > >> [dic 8 22:18] e1000e: enp1s0 NIC Link is Down > >> [dic 8 22:20] e1000e: enp1s0 NIC Link is Down > >> --8<---------------cut here---------------end--------------->8--- > >> > >> It seems my laptop is indeed overheating a little. > >> > >> On Thursday, 26 November 2015 at 04:05, Daniel Tarrero > >> wrote: > >> > >> > Good morning dudes! > >> > > >> > On Thu, 26-11-2015 at 00:41 -0600, Jorge Araya Navarro wrote: > >> >> Hope we can squash it!
> >> >> --8<---------------cut here---------------start------------->8--- > >> >> $ lsmod | grep ath > >> >> ath9k 122880 0 > >> >> ath9k_common 28672 1 ath9k > >> >> ath9k_hw 438272 2 ath9k_common,ath9k > >> >> ath 24576 3 ath9k_common,ath9k,ath9k_hw > >> >> mac80211 565248 1 ath9k > >> >> cfg80211 409600 4 ath,ath9k_common,ath9k,mac80211 > >> >> led_class 16384 2 ath9k,thinkpad_acpi > >> >> --8<---------------cut here---------------end--------------->8--- > >> > > >> > What I see here is that you use the ath9k kernel module/driver. > >> > We also see that it's a rather complex module: it actually depends on > >> > several other modules, namely ath, ath9k_hw, ath9k_common, mac80211, cfg80211 and > >> > led_class (see the "Used by" column). > >> > > >> > So... reloading it is a pain in the ass ^^ Because ath9k sits on top of > >> > the stack, it has to be removed first; modprobe then also tries to remove the modules > >> > it depended on once nothing else uses them: > >> > > >> > $ modprobe -r ath9k > >> > $ modprobe ath9k > >> > > >> > Loading ath9k pulls ath9k_common, ath9k_hw, ath, mac80211, cfg80211 and led_class > >> > back in as needed (led_class stays loaded anyway while thinkpad_acpi uses it). > >> > > >> > If the module is busy, or something like that, you can usually force the > >> > module/driver to unload with "-f" (this only works on kernels built with > >> > CONFIG_MODULE_FORCE_UNLOAD, and it can crash the system), for example: > >> > > >> > $ modprobe -rf ath9k > >> > > >> > It's important that you get the module unloaded and loaded again.
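The "Used by" column of lsmod is what decides the unload order: a module can only be removed once its use count drops to zero, so ath9k, which nothing uses, has to go first. A minimal sketch with two hypothetical helpers, users_of and can_unload, that read lsmod-style output on stdin:

```shell
#!/bin/sh
# Hypothetical helper: list the modules named in lsmod's "Used by" column
# for module $1. Input columns: Module, Size, use count, comma-separated users.
users_of() {
  awk -v m="$1" '
    $1 == m && NF >= 4 {
      n = split($4, users, ",")
      for (i = 1; i <= n; i++)
        if (users[i] != "") print users[i]   # older lsmod adds a trailing comma
    }'
}

# Hypothetical helper: succeed if module $1 is listed with a use count of 0.
can_unload() {
  awk -v m="$1" '$1 == m { found = 1; ok = ($3 == 0) }
                 END { exit !(found && ok) }'
}
```

On the machine in this thread, `lsmod | users_of ath9k_hw` would list ath9k_common and ath9k, and `lsmod | can_unload ath9k && echo ok` would succeed. can_unload only reads the count column, so treat it as a hint rather than a guarantee.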
> >> > Besides the benefit of not having to reboot when it crashes, you will also > >> > be able to pass "module parameters" to it, on the fly, when you > >> > reload it. > >> > > >> > Something like this: > >> > > >> > $ modprobe ath9k debug=1 > >> > > >> > We will find it useful later, keep at it: > >> > > >> >> --8<---------------cut here---------------start------------->8--- > >> >> $ dmesg | grep firmware > >> >> [ +0,424592] psmouse serio2: trackpoint: IBM TrackPoint firmware: > >> >> 0x0e, buttons: 3/3 > >> >> --8<---------------cut here---------------end--------------->8--- > >> > > >> > No weird/proprietary/bogus firmware being loaded for your Atheros, good > >> > news :) > >> > > >> > Let's see the available module parameters: > >> > > >> >> --8<---------------cut here---------------start------------->8--- > >> >> $ modinfo ath9k > >> >> filename: > >> >> /lib/modules/4.1.13-gnu-1-lts/kernel/drivers/net/wireless/ath/ath9k/ath9k.ko.gz > >> >> license: Dual BSD/GPL > >> >> description: Support for Atheros 802.11n wireless LAN cards. > >> >> author: Atheros Communications > >> >> alias: (...) > >> >> depends: ath9k_hw,mac80211,ath9k_common,led-class,cfg80211,ath > >> >> intree: Y > >> >> vermagic: 4.1.13-gnu-1-lts SMP mod_unload modversions 686 > >> >> parm: debug:Debugging mask (uint) > >> >> parm: nohwcrypt:Disable hardware encryption (int) > >> >> parm: blink:Enable LED blink on activity (int) > >> >> parm: btcoex_enable:Enable wifi-BT coexistence (int) > >> >> parm: bt_ant_diversity:Enable WLAN/BT RX antenna diversity > >> >> (int) > >> >> parm: ps_enable:Enable WLAN PowerSave (int) > >> >> --8<---------------cut here---------------end--------------->8--- > >> > > >> > These are the parameters we see here: > >> > > >> > - "debug" > >> > overkill; it probably dumps a lot of information to /var/log/syslog, but > >> > we probably won't understand a damn thing. You can give it a try, but don't > >> > leave it enabled, as it will consume a lot of resources.
Intended for > >> > debugging the module. > >> > > >> > - "nohwcrypt" > >> > disables hardware encryption, so it will be performed by the CPU. If the > >> > encryption part of the chip is the buggy one, that can solve our problem > >> > at a small CPU cost. > >> > > >> > - "blink" > >> > makes the wifi LED blink on activity (blink=0 turns the blinking off), yujuu! > >> > > >> > - "btcoex_enable" > >> > enables bluetooth coexistence. Disabled by default, so nothing to > >> > scratch here. > >> > > >> > - "bt_ant_diversity" > >> > that's fun to give a try. Wifi/bluetooth cards are shipped with 1, 2 or 3 > >> > antennas, so setups range from sharing one antenna for everything (bad idea) to using > >> > several antennas for one service (diversity, which sounds good for > >> > improving signal and general performance, if we don't use bluetooth). > >> > > >> > - "ps_enable" > >> > enables WLAN power saving. If your computer is set up to hibernate/suspend, > >> > it can be a parameter worth testing. > >> > > >> > > >> > > >> > There are two options here: you can reload the module (so you can test parameters > >> > on the fly), or not (so you have to reboot each time you want to change > >> > a parameter). > >> > > >> > If you manage to reload the module, pass the module parameters on the > >> > command line, after the module name, with modprobe: > >> > > >> > $ modprobe ath9k nohwcrypt=1 > >> > > >> > If you can't reload it, the place to set them so they are picked up at boot is > >> > the files in the "/etc/modprobe.d" directory. Just create a > >> > file there with content similar to this: > >> > > >> > options ath9k nohwcrypt=1 > >> > > >> > You can do it with one command line, like this: > >> > > >> > $ echo "options ath9k nohwcrypt=1" > /etc/modprobe.d/ath9k.conf > >> > > >> > ... and reboot to apply the changes.
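modinfo lists the parameters a module accepts, but not the values the loaded module is actually using. Parameters that a driver exports to sysfs can be read from /sys/module/<module>/parameters/, which is a quick way to confirm that a modprobe.d option or a command-line parameter took effect. A minimal sketch with a hypothetical show_params helper; parameters declared without sysfs read permission simply won't be listed:

```shell
#!/bin/sh
# Hypothetical helper: print name=value for each readable module parameter.
# $1: parameters directory (defaults to the loaded ath9k module's).
show_params() {
  dir=${1:-/sys/module/ath9k/parameters}
  if [ ! -d "$dir" ]; then
    echo "not found: $dir (is the module loaded?)" >&2
    return 1
  fi
  for p in "$dir"/*; do
    [ -r "$p" ] || continue        # skips the unexpanded glob of an empty dir
    printf '%s=%s\n' "${p##*/}" "$(cat "$p")"
  done
}
```

After `modprobe ath9k nohwcrypt=1`, or after rebooting with the /etc/modprobe.d/ath9k.conf file in place, `show_params` should include `nohwcrypt=1`.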
> >> > > >> > > >> > I recommend trying the "most conservative" parameters we've talked > >> > about: > >> > > >> > module parameters: > >> > nohwcrypt=1 > >> > btcoex_enable=0 > >> > bt_ant_diversity=1 > >> > ps_enable=0 > >> > > >> > With that, the load on the wifi hardware will be minimal: encryption > >> > will be performed by the CPU, bluetooth coexistence will be off, and its bluetooth > >> > antenna (if it has one) will be used for wifi. > >> > > >> >> --8<---------------cut here---------------start------------->8--- > >> >> $ egrep '(vmx|svm)' /proc/cpuinfo > >> >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > >> >> mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx > >> >> constant_tsc bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm > >> >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > >> >> mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx > >> >> constant_tsc bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm > >> >> --8<---------------cut here---------------end--------------->8--- > >> > > >> > Your CPU has hardware virtualization support (VMX). Virtualization features have caused > >> > problems with Atheros modules in the past. > >> > > >> > Try this kernel parameter on boot (strictly speaking, it turns off the Intel IOMMU, > >> > i.e. VT-d DMA remapping, rather than VMX itself): > >> > > >> > intel_iommu=off > >> > > >> > > >> > With all that applied, I'm out of ideas! Give it a try and let us know > >> > if it improves your system's stability :) > >> > > >> > I'm off to the coffee truck =) > >> > Regards, > >> > Dani > >> > > >> >> > >> >> > >> >> On Wednesday, 25 November 2015 at 08:23, Daniel Tarrero > >> >> wrote: > >> >> > >> >> > Hi again! > >> >> > > >> >> > Sorry to hear that :( we have to keep putting the stick in the hole > >> >> > until > >> >> > the bug comes out :) > >> >> > > >> >> > I have a couple of questions and a tweak worth trying: > >> >> > - Which module/firmware do you use? The kernel's ath9k module?
> >> >> > $ lsmod | grep ath > >> >> > $ dmesg | grep firmware > >> >> > > >> >> > - Which options does this module support? > >> >> > $ modinfo ath9k > >> >> > > >> >> > - Which processor do you have? Does it have virtualization support? > >> >> > $ egrep '(vmx|svm)' /proc/cpuinfo > >> >> > > >> >> > ---------- > >> >> > - Virtualization technologies have caused this kind of conflict in the > >> >> > past, so try this if you see output from the previous command: > >> >> > kernel boot parameter "intel_iommu=off" > >> >> > > >> >> > - Atheros driver tweaks: we will see which module options we can > >> >> > adjust > >> >> > from the modinfo command :) > >> >> > > >> >> > > >> >> > Good luck and regards! > >> >> > D > >> >> > > >> >> > > >> >> > On Sat, 21-11-2015 at 15:19 -0600, Jorge Araya Navarro wrote: > >> >> >> Well, today the issue showed its face again! :( You were right, the > >> >> >> kernel flag doesn't solve this > >> >> >> problem. However, after rebooting my laptop, the connection is > >> >> >> stable; I no longer go through the > >> >> >> reconnect-every-60-seconds phase.
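Before concluding that a boot flag such as intel_iommu=off does not help, it is worth confirming that it actually reached the running kernel, since a typo in the boot loader configuration is easy to miss. A minimal sketch with a hypothetical has_kparam helper that looks for an exact word on the kernel command line (/proc/cmdline):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the exact word $1 is on the kernel command
# line. $2 optionally names another file to read instead of /proc/cmdline.
has_kparam() {
  # One parameter per line, then a fixed-string, whole-line match.
  tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, `has_kparam intel_iommu=off && echo "intel_iommu=off is active"`; afterwards `dmesg | grep -i -e DMAR -e IOMMU` shows whether the DMA-remapping messages at boot changed.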
> >> >> >> > >> >> >> I don't remember pasting the exact error message I get when the > >> >> >> issue appears; in any case, here it > >> >> >> is: > >> >> >> > >> >> >> --8<---------------cut here---------------start------------->8--- > >> >> >> [ +0,116708] ath: phy0: Chip reset failed > >> >> >> [ +0,000007] ath: phy0: Unable to reset channel, reset status -22 > >> >> >> [ +0,080357] ath: phy0: DMA failed to stop in 10 ms > >> >> >> AR_CR=0xffffffff AR_DIAG_SW=0xffffffff DMADBG_7=0xffffffff > >> >> >> [ +0,000016] ath: phy0: Could not stop RX, we could be confusing > >> >> >> the DMA engine when we start RX up > >> >> >> --8<---------------cut here---------------end--------------->8--- > >> >> >> > >> >> >> I was unable to reload the `ath` module: something starts NetworkManager's > >> >> >> service again when I stop > >> >> >> it with `systemctl stop NetworkManager`, and `systemctl > >> >> >> list-dependencies NetworkManager` shows many > >> >> >> services, not all of which, I believe, depend on NetworkManager's > >> >> >> service. > >> >> >> > >> >> >> Typing `sudo iwconfig wlp2s0 power off` doesn't work because that > >> >> >> feature isn't supported by my wifi > >> >> >> card. The sound works well except for some glitches, but those > >> >> >> happen because systemd-journal uses a > >> >> >> lot of CPU logging the never-ending error message (the one > >> >> >> above).
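One way around NetworkManager coming back is not to stop it at all, but to have it release the radio while the driver is reloaded. A minimal sketch, assuming nmcli is available and the commands run as root; reload_wifi and the DRY_RUN switch are hypothetical names, and with DRY_RUN set the commands are only printed:

```shell
#!/bin/sh
# Hypothetical helper: run "$@", or just print it when DRY_RUN is set.
run() {
  if [ -n "$DRY_RUN" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: reload ath9k while NetworkManager keeps running.
# Any arguments are passed to modprobe as module parameters.
reload_wifi() {
  run nmcli radio wifi off || return 1   # ask NetworkManager to drop the wifi device
  run modprobe -r ath9k || return 1      # also removes now-unused dependencies
  run sleep 2
  run modprobe ath9k "$@" || return 1
  run nmcli radio wifi on                # hand the radio back to NetworkManager
}
```

For example, `DRY_RUN=1 reload_wifi nohwcrypt=1` only shows the sequence. As for what keeps restarting the service, `systemctl list-dependencies --reverse NetworkManager` lists the units that depend on NetworkManager; without `--reverse`, the command lists what NetworkManager itself depends on.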
> >> >> >> > >> >> >> Here is the information you requested; hope this sheds some light > >> >> >> on this problem: > >> >> >> > >> >> >> --8<---------------cut here---------------start------------->8--- > >> >> >> $ sudo journalctl -b -1 | grep DMA > >> >> >> nov 21 12:26:57 abril.charola kernel: DMA [mem > >> >> >> 0x0000000000001000-0x0000000000ffffff] > >> >> >> nov 21 12:26:57 abril.charola kernel: DMA zone: 40 pages used for > >> >> >> memmap > >> >> >> nov 21 12:26:57 abril.charola kernel: DMA zone: 0 pages reserved > >> >> >> nov 21 12:26:57 abril.charola kernel: DMA zone: 3999 pages, LIFO > >> >> >> batch:0 > >> >> >> # [...] > >> >> >> nov 21 14:29:06 abril.charola kernel: ath: phy0: Failed to stop TX > >> >> >> DMA, queues=0x008! > >> >> >> nov 21 14:29:06 abril.charola kernel: ath: phy0: DMA failed to stop > >> >> >> in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff DMADBG_7=0xffffffff > >> >> >> nov 21 14:29:06 abril.charola kernel: ath: phy0: Could not stop RX, > >> >> >> we could be confusing the DMA engine when we start RX up > >> >> >> --8<---------------cut here---------------end--------------->8--- > >> >> >> > >> >> >> --8<---------------cut here---------------start------------->8--- > >> >> >> $ lspci | grep -e Ethernet -e Network > >> >> >> 01:00.0 Ethernet controller: Intel Corporation 82573L Gigabit > >> >> >> Ethernet Controller > >> >> >> 02:00.0 Network controller: Qualcomm Atheros AR9285 Wireless Network > >> >> >> Adapter (PCI-Express) (rev 01) > >> >> >> --8<---------------cut here---------------end--------------->8--- > >> >> >> > >> >> >> --8<---------------cut here---------------start------------->8--- > >> >> >> $ uname -a > >> >> >> Linux abril.charola 4.1.13-gnu-1-lts #1 SMP Sat Nov 14 09:15:27 UYT > >> >> >> 2015 i686 GNU/Linux > >> >> >> --8<---------------cut here---------------end--------------->8--- > >> >> >> > >> >> >> On Monday, 16 November 2015 at 03:40, Daniel Tarrero > >> >> >> wrote: > >> >> >> > >> 
>> >> > Hi! > >> >> >> > > >> >> >> > These logs look to me like an interrupt conflict, a hardware failure, > >> >> >> > or > >> >> >> > an unrecoverable state. > >> >> >> > > >> >> >> > I think that the kernel boot option "intremap" won't help you. > >> >> >> > > >> >> >> > Usually, removing a module and loading it again re-establishes its > >> >> >> > functionality (when successfully performed). Of course, modules and the > >> >> >> > kernel have a tree-like structure, so you have to unload the modules that > >> >> >> > depend on a module before unloading it. > >> >> >> > > >> >> >> > ----- > >> >> >> > Things you can give a try: > >> >> >> > > >> >> >> > * Look for any other interesting messages during boot: > >> >> >> > > >> >> >> > $ dmesg | more > >> >> >> > > >> >> >> > ... and, more specifically, boot messages about DMA: > >> >> >> > > >> >> >> > $ dmesg | grep DMA | more > >> >> >> > > >> >> >> > * Disable the "suspend" mode of the card (maybe it enters > >> >> >> > suspension mode > >> >> >> > and never comes back: not all cards support suspension): > >> >> >> > > >> >> >> > $ sudo iwconfig wlan0 power off > >> >> >> > > >> >> >> > * I would also try to _disable the sound card_ in the BIOS, and see if that > >> >> >> > makes a difference to your wifi crashes. > >> >> >> > > >> >> >> > > >> >> >> > ---------- > >> >> >> > For more info: > >> >> >> > > >> >> >> > Which wifi card do you have? > >> >> >> > > >> >> >> > $ lspci > >> >> >> > $ lsusb > >> >> >> > > >> >> >> > Which kernel do you have? > >> >> >> > > >> >> >> > $ uname -a > >> >> >> > > >> >> >> > Is this the proper list for that? > >> >> >> > > >> >> >> > Probably not ^^ > >> >> >> > > >> >> >> > > >> >> >> > Good morning dudes! > >> >> >> > Dani > >> >> >> > > >> >> >> > > >> >> >> > On Fri, 13-11-2015 at 12:47 -0600, Jorge Araya Navarro wrote: > >> >> >> >> Yo! lol. > >> >> >> >> > >> >> >> >> When this thing happens, I don't have anything playing sound, so > >> >> >> >> I'm > >> >> >> >> not sure if the sound card gets affected.
I wonder if setting that > >> >> >> >> kernel flag will prevent this issue from happening. I also > >> >> >> >> wonder if > >> >> >> >> unloading and reloading the drivers will do something useful > >> >> >> >> regarding > >> >> >> >> my issue. > >> >> >> >> > >> >> >> >> I'm going to set the flag and come back here if something happens. > >> >> >> >> > >> >> >> >> On Friday, 13 November 2015 at 05:34, Daniel > >> >> >> >> Tarrero wrote: > >> >> >> >> > >> >> >> >> > What's up, Jorge!! > >> >> >> >> > > >> >> >> >> > The moment I talk about interrupts, somebody runs into > >> >> >> >> > problems using > >> >> >> >> > them!! maybe :) > >> >> >> >> > > >> >> >> >> > This seems to be a hardware communication problem. Did you read > >> >> >> >> > my last > >> >> >> >> > two mails? They may contain some information related to these > >> >> >> >> > problems. > >> >> >> >> > > >> >> >> >> >>> Did you see the DMAR mapping warning during boot too?? That > >> >> >> >> >>> may have > >> >> >> >> > something to do with this. The fact that a reboot usually solves it > >> >> >> >> > makes me > >> >> >> >> > think it could be an interrupt conflict. > >> >> >> >> > > >> >> >> >> > Your logs say: "the module/driver is sending commands to the hardware, > >> >> >> >> > and it > >> >> >> >> > didn't respond as we expected" > >> >> >> >> > > >> >> >> >> > What can cause this? A DMAR mess!!! And also hardware problems, > >> >> >> >> > like loss > >> >> >> >> > of power, hardware changes that lead to interrupt > >> >> >> >> > conflicts, like > >> >> >> >> > plugging in an eSATA device, or a faulty Atheros chip in the worst case. > >> >> >> >> > > >> >> >> >> > You can _force_ a module unload (and you can also > >> >> >> >> > _unload_dependent_modules_ first). Of course, you have to stop the > >> >> >> >> > software > >> >> >> >> > using this hardware too. Maybe something like this can make your day:
Maybe something like can make your day: > >> >> >> >> > > >> >> >> >> > $ sudo service network-manager stop (stop software) > >> >> >> >> > $ sudo ifconfig whatever down (unload network) > >> >> >> >> > $ modprobe -n ath (see dependent modules) > >> >> >> >> > $ sudo modprobe -f whatever (unload dependencies first) > >> >> >> >> > $ sudo modprobe -f ath (unload module) > >> >> >> >> > $ sudo modprobe ath (reload module) > >> >> >> >> > > >> >> >> >> > and test! > >> >> >> >> > You should give some time to the commands to complete, and keep > >> >> >> >> > an eye > >> >> >> >> > in syslog/dmesg to see resoults. > >> >> >> >> > > >> >> >> >> > Given that the problem flaps (come and go), i would also check > >> >> >> >> > power and > >> >> >> >> > heat (maybe replace charger with a travel one if you have, and > >> >> >> >> > place the > >> >> >> >> > laptop in a cold environment), and see if fault time changes. > >> >> >> >> > > >> >> >> >> > Also there is a previous warning with your sound card that can > >> >> >> >> > be > >> >> >> >> > related: > >> >> >> >> > snd_hda_intel 0000:00:1b.0: IRQ timing workaround is activated > >> >> >> >> > for card > >> >> >> >> > #0 > >> >> >> >> > > >> >> >> >> > Is your sound card working when this error happens? If not, we > >> >> >> >> > may have > >> >> >> >> > found the hardware interrupt conflict. They can be using the > >> >> >> >> > same > >> >> >> >> > interrupt, and when sound gets "tweaked" the wifi goes crazy > >> >> >> >> > about that > >> >> >> >> > delay in communications. > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > Good luck!! Im waiting for your experiences! :) > >> >> >> >> > > >> >> >> >> > Regards, > >> >> >> >> > D > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > El jue, 12-11-2015 a las 14:44 -0600, Jorge Araya Navarro > >> >> >> >> > escribió: > >> >> >> >> >> Hello! 
> >> >> >> >> >> > >> >> >> >> >> I bought my Libreboot T60 from Gluglug in December of last > >> >> >> >> >> year, and I'm very happy with a machine > >> >> >> >> >> that works with 100% Free Software! > >> >> >> >> >> > >> >> >> >> >> For a couple of months now, something strange has been happening to > >> >> >> >> >> my wifi card. I first thought the > >> >> >> >> >> issue was caused by a kernel update, but I was wrong. What > >> >> >> >> >> happens is that at random moments, every > >> >> >> >> >> few weeks or so, the wifi drops the connection and never > >> >> >> >> >> re-establishes it until I reboot, and > >> >> >> >> >> after that the issue sometimes continues, with the wifi card > >> >> >> >> >> dropping the connection once every 60 > >> >> >> >> >> seconds. > >> >> >> >> >> > >> >> >> >> >> Yesterday this thing happened again, so I decided to fire up > >> >> >> >> >> Emacs and take some notes and output with > >> >> >> >> >> org-mode. The first interesting thing is this from `dmesg`: > >> >> >> >> >> > >> >> >> >> >> --8<---------------cut > >> >> >> >> >> here---------------start------------->8--- > >> >> >> >> >> nov 12 12:43:43 abril.charola kernel: snd_hda_intel > >> >> >> >> >> 0000:00:1b.0: IRQ timing workaround is activated for card #0. > >> >> >> >> >> Suggest a bigger bdl_pos_adj. > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Failed to > >> >> >> >> >> stop TX DMA, queues=0x00a!
> >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: DMA failed to > >> >> >> >> >> stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff > >> >> >> >> >> DMADBG_7=0xffffffff > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Could not > >> >> >> >> >> stop RX, we could be confusing the DMA engine when we start RX > >> >> >> >> >> up > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Chip reset > >> >> >> >> >> failed > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Unable to > >> >> >> >> >> reset channel, reset status -22 > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: DMA failed to > >> >> >> >> >> stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff > >> >> >> >> >> DMADBG_7=0xffffffff > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Could not > >> >> >> >> >> stop RX, we could be confusing the DMA engine when we start RX > >> >> >> >> >> up > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Chip reset > >> >> >> >> >> failed > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Unable to > >> >> >> >> >> reset channel, reset status -22 > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: DMA failed to > >> >> >> >> >> stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff > >> >> >> >> >> DMADBG_7=0xffffffff > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Could not > >> >> >> >> >> stop RX, we could be confusing the DMA engine when we start RX > >> >> >> >> >> up > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Chip reset > >> >> >> >> >> failed > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Unable to > >> >> >> >> >> reset channel, reset status -22 > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: DMA failed to > >> >> >> >> >> stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff > >> >> >> >> >> DMADBG_7=0xffffffff > >> >> >> >> >> nov 12 12:43:44 
abril.charola kernel: ath: phy0: Could not > >> >> >> >> >> stop RX, we could be confusing the DMA engine when we start RX > >> >> >> >> >> up > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Chip reset > >> >> >> >> >> failed > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Unable to > >> >> >> >> >> reset channel, reset status -22 > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: DMA failed to > >> >> >> >> >> stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff > >> >> >> >> >> DMADBG_7=0xffffffff > >> >> >> >> >> nov 12 12:43:44 abril.charola kernel: ath: phy0: Could not > >> >> >> >> >> stop RX, we could be confusing the DMA engine when we start RX > >> >> >> >> >> up > >> >> >> >> >> nov 12 12:43:45 abril.charola kernel: ath: phy0: Chip reset > >> >> >> >> >> failed > >> >> >> >> >> nov 12 12:43:45 abril.charola kernel: ath: phy0: Unable to > >> >> >> >> >> reset channel, reset status -22 > >> >> >> >> >> nov 12 12:43:45 abril.charola kernel: ath: phy0: DMA failed to > >> >> >> >> >> stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff > >> >> >> >> >> DMADBG_7=0xffffffff > >> >> >> >> >> nov 12 12:43:45 abril.charola kernel: ath: phy0: Could not > >> >> >> >> >> stop RX, we could be confusing the DMA engine when we start RX > >> >> >> >> >> up > >> >> >> >> >> nov 12 12:43:45 abril.charola kernel: ath: phy0: Chip reset > >> >> >> >> >> failed > >> >> >> >> >> nov 12 12:43:45 abril.charola kernel: ath: phy0: Unable to > >> >> >> >> >> reset channel, reset status -22 > >> >> >> >> >> nov 12 12:43:45 abril.charola NetworkManager[445]: <warn> > >> >> >> >> >> Connection disconnected (reason -4) > >> >> >> >> >> nov 12 12:43:45 abril.charola NetworkManager[445]: <info> > >> >> >> >> >> (wlp2s0): supplicant interface state: completed -> disconnected > >> >> >> >> >> nov 12 12:43:45 abril.charola kernel: cfg80211: Exceeded CRDA > >> >> >> >> >> call max attempts. 
Not calling CRDA > >> >> >> >> >> nov 12 12:43:45 abril.charola kernel: ath: phy0: DMA failed to > >> >> >> >> >> stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff > >> >> >> >> >> DMADBG_7=0xffffffff > >> >> >> >> >> nov 12 12:43:45 abril.charola kernel: ath: phy0: Could not > >> >> >> >> >> stop RX, we could be confusing the DMA engine when we start RX > >> >> >> >> >> up > >> >> >> >> >> nov 12 12:43:45 abril.charola NetworkManager[445]: <info> > >> >> >> >> >> (wlp2s0): supplicant interface state: disconnected -> scanning > >> >> >> >> >> --8<---------------cut > >> >> >> >> >> here---------------end--------------->8--- > >> >> >> >> >> > >> >> >> >> >> As I don't understand these error messages at all, my > >> >> >> >> >> guess is that it is something > >> >> >> >> >> serious. After trying to unload the modules related to my wifi > >> >> >> >> >> driver (ath, which is impossible > >> >> >> >> >> because other modules requiring it are in use) and typing > >> >> >> >> >> `ifconfig wlp2s0 down` and whatnot, I > >> >> >> >> >> just gave up and restarted my laptop.
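When the driver floods the log like this, collapsing the repeated lines makes it much easier to see which errors occur and how often, and gives a short summary to paste into a bug report. A minimal sketch with a hypothetical summarize_ath helper; it keeps only messages of the form "ath: phyN: ...", whether they come from dmesg or journalctl:

```shell
#!/bin/sh
# Hypothetical helper: count each distinct "ath: phyN: ..." message read from
# stdin, most frequent first. The timestamp/host prefix is dropped, so lines
# that differ only in when they were logged collapse into one entry.
summarize_ath() {
  sed -n 's/^.*\(ath: phy[0-9]*: .*\)$/\1/p' | sort | uniq -c | sort -rn
}
```

For the previous boot, something like `journalctl -k -b -1 --no-pager | summarize_ath` would do.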
At some point journald > >> >> >> >> >> registered something interesting: > >> >> >> >> >> > >> >> >> >> >> --8<---------------cut > >> >> >> >> >> here---------------start------------->8--- > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: irq 17: nobody cared > >> >> >> >> >> (try booting with the "irqpoll" option) > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: CPU: 0 PID: 0 Comm: > >> >> >> >> >> swapper/0 Not tainted 4.1.11-gnu-1-lts #1 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: Hardware name: LENOVO > >> >> >> >> >> 1951F8G/1951F8G, BIOS CBET4000 79ETE7WW (2.27 ) 05/18/2015 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: c1609907 4a9301f9 > >> >> >> >> >> 00000000 f5035f54 c14a49ec f53d0e9c f5035f74 c10abbac > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: c1575cc0 00000011 > >> >> >> >> >> f5035f70 f85611db f53d0e40 00000000 f5035f98 c10abf22 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: c1329d4a 0003ab5e > >> >> >> >> >> 00000000 4a9301f9 f53d0e40 c1676e00 00000000 f5035fd4 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: Call Trace: > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c14a49ec>] > >> >> >> >> >> dump_stack+0x41/0x52 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c10abbac>] > >> >> >> >> >> __report_bad_irq+0x2c/0xd0 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<f85611db>] ? > >> >> >> >> >> ath9k_hw_intrpend+0x5b/0x70 [ath9k_hw] > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c10abf22>] > >> >> >> >> >> note_interrupt+0x212/0x250 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c1329d4a>] ? > >> >> >> >> >> add_interrupt_randomness+0x16a/0x1a0 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c10a99a2>] > >> >> >> >> >> handle_irq_event_percpu+0x122/0x190 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c10a99a2>] ?
> >> >> >> >> >> handle_irq_event_percpu+0x122/0x190 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c10a9a3a>] > >> >> >> >> >> handle_irq_event+0x2a/0x50 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c10ac520>] ? > >> >> >> >> >> handle_edge_irq+0xe0/0xe0 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c10ac589>] > >> >> >> >> >> handle_fasteoi_irq+0x69/0x100 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c1004906>] > >> >> >> >> >> handle_irq+0x56/0x90 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: <IRQ> [<c14aa60c>] > >> >> >> >> >> do_IRQ+0x3c/0xd0 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c14a9c33>] > >> >> >> >> >> common_interrupt+0x33/0x38 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c138a553>] ? > >> >> >> >> >> cpuidle_enter_state+0x83/0x240 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c138a744>] > >> >> >> >> >> cpuidle_enter+0x14/0x20 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c108fe89>] > >> >> >> >> >> cpu_startup_entry+0x299/0x3a0 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c14a1f67>] > >> >> >> >> >> rest_init+0x67/0x70 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c167eb51>] > >> >> >> >> >> start_kernel+0x3c9/0x3e2 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<c167e2e3>] > >> >> >> >> >> i386_start_kernel+0x91/0x95 > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: handlers: > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<f81083c0>] usb_hcd_irq > >> >> >> >> >> [usbcore] > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: [<f860a890>] ath_isr > >> >> >> >> >> [ath9k] > >> >> >> >> >> nov 12 12:44:00 abril.charola kernel: Disabling IRQ #17 > >> >> >> >> >> --8<---------------cut > >> >> >> >> >> here---------------end--------------->8--- > >> >> >> >> >> > >> >> >> >> >> Again, I don't know what it says, but it seems very serious.
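The report names two handlers for IRQ 17, usb_hcd_irq [usbcore] and ath_isr [ath9k], so the USB controller and the Atheros card share that line. "Nobody cared" means almost all recent interrupts on that line went unhandled, and the kernel ended up disabling IRQ #17. To check which devices share a line on the running system, the matching row of /proc/interrupts is enough. A minimal sketch with hypothetical irq_line and irq_is_shared helpers:

```shell
#!/bin/sh
# Hypothetical helper: print the /proc/interrupts row for IRQ $1.
# $2 optionally names another file to read instead of /proc/interrupts.
irq_line() {
  awk -v irq="$1:" '$1 == irq' "${2:-/proc/interrupts}"
}

# Hypothetical helper: succeed if that row lists more than one device.
# Shared handlers appear comma-separated at the end of the row.
irq_is_shared() {
  irq_line "$@" | grep -q ','
}
```

`irq_line 17` prints the row, and `irq_is_shared 17 && echo shared` tells whether more than one device is registered on it. The log's own suggestion, booting with the irqpoll option, is another thing to try.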
I'll > >> >> >> >> >> attach the full logs in case what I > >> >> >> >> >> provide is not enough. I hope someone can help me with this. > >> >> >> >> >> > >> >> >> >> >> P.S.: I haven't cleaned the dust out of my laptop since I bought it, > >> >> >> >> >> and it seems to have some inside; this > >> >> >> >> >> sporadic issue could be caused by the dust, too. > >> >> >> >> >> > >> >> >> >> > >> >> >> > >> >> > >> > >
