Re: [CentOS] NFS help
On Thu, Oct 27, 2016 at 5:16 PM, wrote:
> Matt Garman wrote:
>> On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell wrote:
>> On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell wrote:
>>> Well I spoke too soon. The importer (the one that was initially hanging that I came here to fix) hung up after running 20 hours. There were no NFS errors or messages on either the client or the server. When I restarted it, it hung after 1 minute. Restarted it again and it hung after 20 seconds. After that, when I restarted it, it hung immediately. Still no NFS errors or messages. I tried running the process on the server and it worked fine. So I have to believe this is related to nobarrier. Tomorrow I will try removing that setting, but I am no closer to solving this and I have to leave Japan Saturday :-(
>>>
>>> The bad disk still has not been replaced - that is supposed to happen tomorrow, but I won't have enough time after that to draw any conclusions.
>>
>> I've seen behavior like that with disks that are on their way out...
>
> I just had a truly unpleasant thought, speaking of disks. Years ago, we tried some WD Green drives in our servers, and that was a disaster. In somewhere between days and weeks, the drives would go offline. I finally found out what happened: consumer-grade drives are intended for desktops, and the TLER - how long the drive keeps trying to read or write to a sector before giving up, marking the sector bad, and going somewhere else - is two *minutes*. Our servers were expecting the TLER to be 7 *seconds* or under. Any chance the client cheaped out with any of the drives?

No, it's a fairly high end Lenovo X series server (X3650 I think).

___ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] NFS help
On Thu, Oct 27, 2016 at 4:23 PM, Matt Garman wrote:
> On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell wrote:
>> This site is locked down like no other I have ever seen. You cannot bring anything into the site - no computers, no media, no phone. You ...
>> This is my client's client, and even if I could circumvent their policy I would not do that. They have a zero tolerance policy and if ...
>
> OK, no internet for real. :) Sorry I kept pushing this. I made an unflattering assumption that maybe it just hadn't occurred to you how to get files in or out. Sometimes there are "soft" barriers to bringing files in or out: they don't want it to be trivial, but want it to be doable if necessary. But then there are times when they really mean it. I thought maybe the former applied to you, but clearly it's the latter. Apologies.
>
>> These are all good debugging techniques, and I have tried some of them, but I think the issue is load related. There are 50 external machines ftp-ing to the C7 server, 24/7, thousands of files a day. And on the C6 client the script that processes them is running continuously. It will sometimes run for 7 hours then hang, but it has run for as long as 3 days before hanging. I have never been able to reproduce the errors/hanging situation manually.
>
> If it truly is load related, I'd think you'd see something askew in the sar logs. But if the load tends to spike, rather than be continuous, the sar sampling rate may be too coarse to pick it up.
>
>> And again, this is only at this site. We have the same software deployed at 10 different sites all doing the same thing, and it all works fine at all of those.
>
> Flaky hardware can also cause weird intermittent issues. I know you mentioned before your hardware is fairly new/decent spec; but that doesn't make it immune to manufacturing defects. For example, imagine one voltage regulator that's ever-so-slightly out of spec. It happens.
> Bad memory is not uncommon and certainly causes all kinds of mysterious issues (though in my experience that tends to result in spontaneous reboots or hard lockups, but truly anything could happen).
>
> Ideally, you could take the system offline and run hardware diagnostics, but I suspect that's impossible given your restrictions on taking things in/out of the datacenter.
>
> On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell wrote:
>> Well I spoke too soon. The importer (the one that was initially hanging that I came here to fix) hung up after running 20 hours. There were no NFS errors or messages on either the client or the server. When I restarted it, it hung after 1 minute. Restarted it again and it hung after 20 seconds. After that, when I restarted it, it hung immediately. Still no NFS errors or messages. I tried running the process on the server and it worked fine. So I have to believe this is related to nobarrier. Tomorrow I will try removing that setting, but I am no closer to solving this and I have to leave Japan Saturday :-(
>>
>> The bad disk still has not been replaced - that is supposed to happen tomorrow, but I won't have enough time after that to draw any conclusions.
>
> I've seen behavior like that with disks that are on their way out... basically the system wants to read a block of data, and the disk doesn't read it successfully, so it keeps trying. The kind of disk, what kind of controller it's behind, RAID level, and various other settings can all impact this phenomenon, and also how much detail you can see about it. You already know you have one bad disk, so that's kind of an open wound that may or may not be contributing to your bigger, unsolved problem.

Just replaced the disk, but I am leaving tomorrow, so it was decided that we will run the process on the C7 server, at least for now. I will probably have to come back here early next year and revisit this.
We are thinking of building a new system back in NY and shipping it here and swapping them out.

> So that makes me think, you can also do some basic disk benchmarking. iozone and bonnie++ are nice, but I'm guessing they're not installed and you don't have a means to install them. But you can use "dd" to do some basic benchmarking, and that's all but guaranteed to be installed. Similar to network benchmarking, you can do something like:
>
> time dd if=/dev/zero of=/tmp/testfile.dat bs=1G count=256
>
> That will generate a 256 GB file. Adjust "bs" and "count" to whatever makes sense. General rule of thumb is you want the target file to be at least 2x the amount of RAM in the system to avoid cache effects from skewing your results. Bigger is even better if you have the space, as it increases the odds of hitting the "bad" part of the disk (if indeed that's the source of your problem).
>
> Do that on C6, C7, and if you can, a similar machine as a "control" box; it would be ideal. Again, we're looking for outliers, hang-ups, timeouts, etc.
Re: [CentOS] NFS help
On Thu, Oct 27, 2016 at 12:35 PM, Gordon Messmer wrote:
> On 10/26/2016 09:54 PM, Larry Martell wrote:
>> And on the C6 client there is a similar blocked message for the ftp job, blocked on nfs_flush, then the bad sequence number message I had seen before, and at that point the ftp job hung.
>
> Are any of these systems using jumbo frames? Check the MTU in the output of "ip link show" on every system, server and client. If any device doesn't match the MTU of all of the others, that might cause the problem you're describing. And if they all match, but they're larger than 1500, a switch that doesn't support jumbo frames would also cause the problem you're describing.

They all are 1500.
[CentOS] odd sendmail question
I've looked thru all the stuff at sendmail.org (or whatever its name is now, they seem to have gone corporate...) and don't see anything relating to this: I get a series of log entries in /var/log/maillog, once or twice a day at various times--not 12 or 24 hours apart. Not always the identical series either. Here's the latest one:

Oct 27 21:28:55 fcshome sendmail[7939]: starting daemon (8.14.7): SMTP+queueing@01:00:00
Oct 27 21:28:56 fcshome sendmail[7980]: starting daemon (8.14.7): SMTP+queueing@01:00:00
Oct 27 21:28:56 fcshome sm-msp-queue[7992]: starting daemon (8.14.7): queueing@01:00:00
Oct 27 21:29:19 fcshome sendmail[8054]: starting daemon (8.14.7): SMTP+queueing@01:00:00
Oct 27 21:29:19 fcshome sm-msp-queue[8093]: starting daemon (8.14.7): queueing@01:00:00
Oct 27 21:29:19 fcshome sendmail[8095]: starting daemon (8.14.7): SMTP+queueing@01:00:00

Anyone have a clue what this is all about? And why so many?

-- Fred Smith -- fre...@fcshome.stoneham.ma.us - Show me your ways, O LORD, teach me your paths; Guide me in your truth and teach me, for you are God my Savior, And my hope is in you all day long. -- Psalm 25:4-5 (NIV)
Re: [CentOS] NFS help
Matt Garman wrote:
> On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell wrote:
> On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell wrote:
>> Well I spoke too soon. The importer (the one that was initially hanging that I came here to fix) hung up after running 20 hours. There were no NFS errors or messages on either the client or the server. When I restarted it, it hung after 1 minute. Restarted it again and it hung after 20 seconds. After that, when I restarted it, it hung immediately. Still no NFS errors or messages. I tried running the process on the server and it worked fine. So I have to believe this is related to nobarrier. Tomorrow I will try removing that setting, but I am no closer to solving this and I have to leave Japan Saturday :-(
>>
>> The bad disk still has not been replaced - that is supposed to happen tomorrow, but I won't have enough time after that to draw any conclusions.
>
> I've seen behavior like that with disks that are on their way out...

I just had a truly unpleasant thought, speaking of disks. Years ago, we tried some WD Green drives in our servers, and that was a disaster. In somewhere between days and weeks, the drives would go offline. I finally found out what happened: consumer-grade drives are intended for desktops, and the TLER - how long the drive keeps trying to read or write to a sector before giving up, marking the sector bad, and going somewhere else - is two *minutes*. Our servers were expecting the TLER to be 7 *seconds* or under. Any chance the client cheaped out with any of the drives?

mark
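On drives and controllers that support it, the timeout mark describes as TLER is exposed through SCT Error Recovery Control, which smartctl can query and set (e.g. `smartctl -l scterc /dev/sda` to query, `smartctl -l scterc,70,70 /dev/sda` to set read and write to 70 deciseconds, i.e. 7 seconds). The helper below is just a sketch for parsing the query output; the device name and sample output are illustrative, not from any drive in this thread:

```shell
# Parse `smartctl -l scterc <dev>` output and print the *read* ERC timeout
# in deciseconds (or "Disabled" on drives that ship with ERC turned off).
erc_read_ds() {
    awk '/^[[:space:]]*Read:/ { print $2; exit }'
}

# Example: smartctl -l scterc /dev/sda | erc_read_ds
```

A consumer drive that reports "Disabled" (or a multi-minute timeout) behind a RAID controller is exactly the failure mode mark saw: the controller drops the disk long before the drive gives up on the sector.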
[CentOS] Fwd: CentOS on new Dell
On Mon, Oct 24, 2016 at 8:11 PM, Milos Blazevic wrote:
> I've seen the thread(s) you started on the CentOS mailing list about Dell and ThinkPad laptops and running CentOS on 'em.
>
> Not sure if you've seen my question, but I'm considering purchasing a laptop and running EL7 on it, and I'm weighing the Thinkpad against the Latitude, so:
>
> What was it that made you opt for the E7470 over, say, the Carbon X1? According to RedHat's hardware compatibility list, Carbon models are certified, while none of the Dells are.
>
> Also, have you given up on CentOS in favor of Fedora? I'd love to hear how CentOS 7's support for E7470 hardware is.

Hi Milos,

The Thinkpad T series and Latitude are *very* similar computers. They are both business "ultrabooks" with a 1600x1080 display option, nice keyboards (not "chiclet" style), a trackpoint and trackpad, and built-in RJ-45. I bought a Dell Latitude E7470 over the Lenovo for several reasons. One is this comment, which is worth mentioning again:

On Fri, Sep 30, 2016 at 11:58 PM, Gordon Messmer wrote:
> It's worth mentioning again that Dell is one of the companies doing the development for the bits that don't work, and that those drivers are often the ones that get Lenovo equipment going, too. Lenovo does not, to the best of my knowledge, do any Linux development.

Another reason is that I have heard about people having problems with Lenovo - not just with software but with hardware malfunctions. I spoke to someone on the phone who had hardware problems with their new Thinkpad (although I suspect some of the problems could have been misdiagnosis by the user). After I described how nice the E7470 is, they're thinking about dumping their 1-year-old X250 and getting a Dell.

As for the Carbon, that is a very different computer. The Carbon is an ultralight/thin Macbook-like machine with Windows, so I have no advice for you there.
I have not tried CentOS on the E7470, but I'm quite certain it would not work, because I have tried the latest Fedora Live - which is about 100 kernel revisions newer - and even that doesn't completely work. Specifically, if I plug in an external display it freezes. My feeling is I need a newer display driver (and thus a newer kernel). The only other issue I noticed was that wireless didn't work, but it seems more like a glue issue and not necessarily a driver problem. Otherwise, suspend and everything else worked as near as I can tell, which is actually pretty impressive for a brand new machine.

So, I am doing other things while this new E7470 ages like a fine wine. Or maybe I'll lose patience and just install Fedora and try a "vanilla" kernel package. Then maybe after a year or two CentOS 8 or whatever will run on it, and then I can just run steady for 4+ years without getting pummeled by the stupid updates and feature creep that you get with Fedora and Ubuntu or whatever the latest hot distro is.

The E7470 is obviously a laptop of choice for business people, and that is the type of machine developers use, so the chances of good compatibility are very high. You just have to give it time. I was watching Daredevil season 1 and they use Latitudes that look exactly like mine, and that was probably filmed in 2014. So the form factor at least has been around for a while, which is good. Unfortunately I can't say the same thing about the show.

Mike
Re: [CentOS] NFS help
On 27/10/16 21:23, Matt Garman wrote:
> If you have the ability to take these systems offline temporarily, you can also run "fsck" (file system check) on the C6 and C7 file systems. IIRC, ext4 can do a very basic kind of check on a mounted filesystem. But a deeper/more comprehensive scan requires the FS to be unmounted. Not sure what the rules are for xfs. But C6 uses ext4 by default so you could probably at least run the basic check on that without taking the system offline.

Don't bother with fsck on XFS filesystems. From the man page [fsck.xfs(8)]: "XFS is a journaling filesystem and performs recovery at mount(8) time if necessary, so fsck.xfs simply exits with a zero exit status". If you need a deeper examination use xfs_repair(8) and note that: "the filesystem to be repaired must be unmounted, otherwise, the resulting filesystem may be inconsistent or corrupt" (from the man page).
Re: [CentOS] NFS help
On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell wrote:
> This site is locked down like no other I have ever seen. You cannot bring anything into the site - no computers, no media, no phone. You ...
> This is my client's client, and even if I could circumvent their policy I would not do that. They have a zero tolerance policy and if ...

OK, no internet for real. :) Sorry I kept pushing this. I made an unflattering assumption that maybe it just hadn't occurred to you how to get files in or out. Sometimes there are "soft" barriers to bringing files in or out: they don't want it to be trivial, but want it to be doable if necessary. But then there are times when they really mean it. I thought maybe the former applied to you, but clearly it's the latter. Apologies.

> These are all good debugging techniques, and I have tried some of them, but I think the issue is load related. There are 50 external machines ftp-ing to the C7 server, 24/7, thousands of files a day. And on the C6 client the script that processes them is running continuously. It will sometimes run for 7 hours then hang, but it has run for as long as 3 days before hanging. I have never been able to reproduce the errors/hanging situation manually.

If it truly is load related, I'd think you'd see something askew in the sar logs. But if the load tends to spike, rather than be continuous, the sar sampling rate may be too coarse to pick it up.

> And again, this is only at this site. We have the same software deployed at 10 different sites all doing the same thing, and it all works fine at all of those.

Flaky hardware can also cause weird intermittent issues. I know you mentioned before your hardware is fairly new/decent spec; but that doesn't make it immune to manufacturing defects. For example, imagine one voltage regulator that's ever-so-slightly out of spec. It happens.
Bad memory is not uncommon and certainly causes all kinds of mysterious issues (though in my experience that tends to result in spontaneous reboots or hard lockups, but truly anything could happen).

Ideally, you could take the system offline and run hardware diagnostics, but I suspect that's impossible given your restrictions on taking things in/out of the datacenter.

On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell wrote:
> Well I spoke too soon. The importer (the one that was initially hanging that I came here to fix) hung up after running 20 hours. There were no NFS errors or messages on either the client or the server. When I restarted it, it hung after 1 minute. Restarted it again and it hung after 20 seconds. After that, when I restarted it, it hung immediately. Still no NFS errors or messages. I tried running the process on the server and it worked fine. So I have to believe this is related to nobarrier. Tomorrow I will try removing that setting, but I am no closer to solving this and I have to leave Japan Saturday :-(
>
> The bad disk still has not been replaced - that is supposed to happen tomorrow, but I won't have enough time after that to draw any conclusions.

I've seen behavior like that with disks that are on their way out... basically the system wants to read a block of data, and the disk doesn't read it successfully, so it keeps trying. The kind of disk, what kind of controller it's behind, RAID level, and various other settings can all impact this phenomenon, and also how much detail you can see about it. You already know you have one bad disk, so that's kind of an open wound that may or may not be contributing to your bigger, unsolved problem.

So that makes me think, you can also do some basic disk benchmarking. iozone and bonnie++ are nice, but I'm guessing they're not installed and you don't have a means to install them. But you can use "dd" to do some basic benchmarking, and that's all but guaranteed to be installed.
Similar to network benchmarking, you can do something like:

time dd if=/dev/zero of=/tmp/testfile.dat bs=1G count=256

That will generate a 256 GB file. Adjust "bs" and "count" to whatever makes sense. General rule of thumb is you want the target file to be at least 2x the amount of RAM in the system to avoid cache effects from skewing your results. Bigger is even better if you have the space, as it increases the odds of hitting the "bad" part of the disk (if indeed that's the source of your problem).

Do that on C6, C7, and if you can, a similar machine as a "control" box; it would be ideal. Again, we're looking for outliers, hang-ups, timeouts, etc.

+1 to Gordon's suggestion to sanity check MTU sizes.

Another random possibility... By somewhat funny coincidence, we have some servers in Japan as well, and were recently banging our heads against the wall with some weird networking issues. The remote hands we had helping us (none of our staff was on site) claimed one or more fiber cables were dusty, enough that it was affecting light levels. They cleaned the cables and the problem went away.
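One caveat with the plain dd run above: without a flush, dd can exit while most of the data is still sitting in the page cache, so the timing understates the disk. A variant with conv=fdatasync (a standard dd option) forces the write-out into the measurement. Sizes here are deliberately tiny for illustration; scale bs/count up to ~2x RAM for a meaningful run, per the rule of thumb above:

```shell
# conv=fdatasync makes dd fdatasync() the file before exiting, so the
# reported time includes the actual write-out to disk, not just dirty
# pages cached in RAM.
time dd if=/dev/zero of=/tmp/testfile.dat bs=1M count=64 conv=fdatasync
rm -f /tmp/testfile.dat
```

For the read side, re-reading the file with `dd if=/tmp/testfile.dat of=/dev/null bs=1M` after dropping caches (`echo 3 > /proc/sys/vm/drop_caches`, as root) gives a comparable number.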
Re: [CentOS] [OT] How to recover data from an IDE drive
On 10/27/2016 11:20 AM, Fred Smith wrote:
> I got one of those from, er, either amazon or newegg a few years ago, and while it works for a PATA drive, no matter what I did it wouldn't work with an optical drive, despite the customer support people insisting it does work. Following their configuration settings didn't help.

Likely because optical IDE drives use a completely different command set known as ATAPI, which is SCSI based. The adapter I linked is strictly for hard drives.

-- john r pierce, recycling bits in santa cruz
Re: [CentOS] [OT] How to recover data from an IDE drive
On Thu, Oct 27, 2016 at 09:19:23AM -0500, Leroy Tennison wrote:
> While IDE-to-USB is probably the easier option to use, I got an IDE-to-SATA adapter on eBay for almost nothing (of course, you have to wait for it to arrive directly from China). If you go this route, the thing I learned from the experience was to set the IDE drive to master (there won't be a slave unless you get a one-to-two converter - I didn't see one of the latter). Also, unless the converter goes both ways, pay attention to which is the controller side and which is the drive side.

I got one of those from, er, either amazon or newegg a few years ago, and while it works for a PATA drive, no matter what I did it wouldn't work with an optical drive, despite the customer support people insisting it does work. Following their configuration settings didn't help.

So, YMMV.

> - Original Message -
> From: "Digimer"
> To: "CentOS mailing list"
> Sent: Wednesday, October 26, 2016 8:10:05 PM
> Subject: Re: [CentOS] [OT] How to recover data from an IDE drive
>
> On 26/10/16 09:01 PM, TE Dukes wrote:
>> Hello,
>>
>> As some may recall, I suffered a hardware failure of a 10 yr old IBM Netvista back in January. I was backing up my personal data, 'My Documents', to my CentOS server, but I apparently didn't get my emails.
>>
>> It was a main board failure and I believe the data is still good on the hard drive. Only problem, it's an IDE drive and my server and new PC have SATA drives.
>>
>> Is it possible to install the old drive as a secondary drive in a newer PC with SATA drives? If so, how do I do this? I need to access the emails.
>>
>> This was a Windows XP machine using Outlook as the mail client.
>>
>> TIA!!
>
> There are plenty of IDE to USB adapters out there, so one of those is probably best.
> Here's what amazon has when searching for 'ide to usb':
>
> https://www.amazon.ca/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=ide+to+usb
>
> Most should work fine in Linux, but if you narrow down a specific make/model, a quick google search should confirm linux support.
>
> -- Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without access to education?

-- --- Under no circumstances will I ever purchase anything offered to me as the result of an unsolicited e-mail message. Nor will I forward chain letters, petitions, mass mailings, or virus warnings to large numbers of others. This is my contribution to the survival of the online community. --Roger Ebert, December, 1996 - The Boulder Pledge -
[CentOS] Re: Disk near failure
On Thu, 27 Oct 2016 11:25, Alessandro Baggi wrote:

On 24/10/2016 14:05, Leonard den Ottolander wrote:
> On Mon, 2016-10-24 at 12:07 +0200, Alessandro Baggi wrote:
>> === START OF READ SMART DATA SECTION ===
>> SMART Error Log not supported
>
> I reckon there's a line missing between those lines. The line right after the first should read something like:
>
> SMART overall-health self-assessment test result: PASSED
>
> or "FAILED" for that matter. If not, try running smartctl -t short /dev/sda, wait for the indicated time to expire, then check the output of smartctl -a (or -x) again.
>
> Regards, Leonard.

Hi Leonard,

after a SMART short test, the output of smartctl -a /dev/... is:

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     Corsair Force GT
Serial Number:    1229794815020A81
LU WWN Device Id: 0 00 0
Firmware Version: 5.02
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 27 11:22:22 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity was completed without error. Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (0) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time:      (  1) minutes.
Extended self-test routine recommended polling time:   ( 48) minutes.
Conveyance self-test routine recommended polling time: (  2) minutes.
SCT capabilities: (0x0021) SCT Status supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail Always       -       0/0
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   000   000   000    Old_age  Always       -       17394h+07m+56.840s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age  Always       -       1974
171 Program_Fail_Count      0x0032   000   000   000    Old_age  Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age  Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age  Offline      -       780
177 Wear_Range_Delta        0x       000   000   000    Old_age  Offline      -       3
181 Program_Fail_Count      0x0032   000   000   000    Old_age  Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age  Always       -       0
194 Temperature_Celsius     0x0022   029   042   000    Old_age  Always       -       29 (Min/Max 15/42)
195 ECC_Uncorr_Error_Count  0x001c   100   100   000    Old_age  Offline      -       0/0
196 Reallocated_Event_Ct    0x0033   100   100   003    Pre-fail Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   100   100   000    Old_age  Offline      -       0/0
204 Soft_ECC_Correct_Rate   0x001c   100   100   000    Old_age  Offline      -       0/0
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail Always       -       100
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail Always       -       0
233 SandForce_Internal      0x       000   000   000    Old_age  Offline      -       6599
234 SandForce_Internal      0x00
Re: [CentOS] NFS help
On 10/26/2016 09:54 PM, Larry Martell wrote:
> And on the C6 client there is a similar blocked message for the ftp job, blocked on nfs_flush, then the bad sequence number message I had seen before, and at that point the ftp job hung.

Are any of these systems using jumbo frames? Check the MTU in the output of "ip link show" on every system, server and client. If any device doesn't match the MTU of all of the others, that might cause the problem you're describing. And if they all match, but they're larger than 1500, a switch that doesn't support jumbo frames would also cause the problem you're describing.
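Gordon's check can be scripted so the same command is run on every client and server and the numbers compared by eye. This small helper (a sketch; "eth0" is a placeholder interface name) pulls the MTU for one interface out of `ip -o link show` output:

```shell
# Print the MTU of the named interface, given `ip -o link show` output on
# stdin. In the one-line ip output, field 2 is "ifname:" and the MTU value
# follows the literal token "mtu".
mtu_of() {
    awk -v ifc="$1:" '$2 == ifc { for (i = 1; i <= NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# Usage on each host: ip -o link show | mtu_of eth0
```

Any host whose number differs from the rest (or any value over 1500 on a switch without jumbo-frame support) is the suspect Gordon describes.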
Re: [CentOS] Disk near failure
On 10/27/2016 09:43 AM, Alessandro Baggi wrote:
> On 27/10/2016 13:58, Leonard den Ottolander wrote:
>> Hi,
>>
>> On Thu, 2016-10-27 at 11:25 +0200, Alessandro Baggi wrote:
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>
>> That's the line you are looking for. Since your disk apparently does not store an error log - not sure if that's something with SSDs in general or just with this particular disk - you will always have to invoke smartctl -t short /dev/sda and then, after the test has completed, check the output of smartctl -a /dev/sda for that particular line.
>>
>> Shouldn't be too hard to put in a cron job, just make sure the job waits long enough (more than 1 minute, make it 2 to be sure) before reading the output of smartctl -a after invoking smartctl -t short.
>>
>> Regards, Leonard.

You can also use the smartd service: edit the smartd.conf file and have it send you emails when a disk starts to fail.

> Thank you for the suggestion.
>
> Alessandro.

-- Stephen Clark *NetWolves Managed Services, LLC.* Director of Technology Phone: 813-579-3200 Fax: 813-882-0209 Email: steve.cl...@netwolves.com http://www.netwolves.com
Re: [CentOS] [OT] How to recover data from an IDE drive
While IDE-to-USB is probably the easier option to use, I got an IDE-to-SATA adapter on eBay for almost nothing (of course, you have to wait for it to arrive directly from China). If you go this route, the thing I learned from the experience was to set the IDE drive to master (there won't be a slave unless you get a one-to-two converter - I didn't see one of the latter). Also, unless the converter goes both ways, pay attention to which is the controller side and which is the drive side.

- Original Message -
From: "Digimer"
To: "CentOS mailing list"
Sent: Wednesday, October 26, 2016 8:10:05 PM
Subject: Re: [CentOS] [OT] How to recover data from an IDE drive

On 26/10/16 09:01 PM, TE Dukes wrote:
> Hello,
>
> As some may recall, I suffered a hardware failure of a 10 yr old IBM Netvista back in January. I was backing up my personal data, 'My Documents', to my CentOS server, but I apparently didn't get my emails.
>
> It was a main board failure and I believe the data is still good on the hard drive. Only problem, it's an IDE drive and my server and new PC have SATA drives.
>
> Is it possible to install the old drive as a secondary drive in a newer PC with SATA drives? If so, how do I do this? I need to access the emails.
>
> This was a Windows XP machine using Outlook as the mail client.
>
> TIA!!

There are plenty of IDE to USB adapters out there, so one of those is probably best. Here's what amazon has when searching for 'ide to usb':

https://www.amazon.ca/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=ide+to+usb

Most should work fine in Linux, but if you narrow down a specific make/model, a quick google search should confirm linux support.

-- Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [CentOS] Disk near failure
On 27/10/2016 13:58, Leonard den Ottolander wrote:
> Hi,
>
> On Thu, 2016-10-27 at 11:25 +0200, Alessandro Baggi wrote:
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>
> That's the line you are looking for. Since your disk apparently does not store an error log - not sure if that's something with SSDs in general or just with this particular disk - you will always have to invoke smartctl -t short /dev/sda and then, after the test has completed, check the output of smartctl -a /dev/sda for that particular line.
>
> Shouldn't be too hard to put in a cron job, just make sure the job waits long enough (more than 1 minute, make it 2 to be sure) before reading the output of smartctl -a after invoking smartctl -t short.
>
> Regards, Leonard.

Thank you for the suggestion.

Alessandro.
Re: [CentOS] python script from crontab - problems with proper execution
Hi Rafal,

You'll want to change the command to

/usr/bin/python /path/script_repo_scanner.py --bb_user bb_user --bb_pass bb_pass --bd_log_dir /path/logs >> /path/script_repo_scanner.py.log 2>&1

Notice that &> is changed to >> 2>&1. The &> form is a bash-only redirection; a strictly POSIX /bin/sh parses "cmd &> file" as "cmd &" (run in the background) followed by "> file" (truncate the file), so the output is silently lost. The portable way to capture both stdout and stderr is to redirect stdout with > (or >> to append) and then duplicate stderr onto it with 2>&1.

Take care,
Brian Bernard

On Thu, Oct 27, 2016 at 5:47 AM, Rafał Radecki wrote:
> Hi All.
>
> I currently have a problem with proper invocation of a python script with
> cron.
>
> non-root $ crontab -l
> #Ansible: script_repo_scanner
> 55 11 * * * /usr/bin/python /path/script_repo_scanner.py --bb_user bb_user
> --bb_pass bb_pass --bd_log_dir /path/logs &>
> /path/script_repo_scanner.py.log
>
> And in /var/log/cron I see that cron executed the script but there is no
> log output in /path/script_repo_scanner.py.log and the script did not
> perform its job. So it looks like it has not been run despite entries in
> /var/log/cron ;)
>
> When I execute the command
>
> non-root$ /usr/bin/python /path/script_repo_scanner.py --bb_user bb_user
> --bb_pass bb_pass --bd_log_dir /path/logs &>
> /path/script_repo_scanner.py.log
>
> I get standard output (script logs to stdout) and script does its job.
>
> Any clue what I could be missing?
>
> BR,
> Rafal.
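A quick way to see the portable form in action (the job script and paths below are placeholders, not the poster's actual command):

```shell
#!/bin/sh
# Minimal sketch: a job that writes to both stdout and stderr, captured
# the portable way. '&>' is a bash-only shortcut; '>> file 2>&1' works in
# any POSIX shell, including the /bin/sh that cron uses to run jobs.
tmp=$(mktemp -d)

cat > "$tmp/job.sh" <<'EOF'
echo "stdout line"
echo "stderr line" >&2
EOF

# Append both streams to the log, POSIX-portably.
sh "$tmp/job.sh" >> "$tmp/job.log" 2>&1

cat "$tmp/job.log"   # prints both "stdout line" and "stderr line"
```

The same `>> file 2>&1` suffix can be dropped straight into the crontab entry.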
Re: [CentOS] Disk near failure
Hi,

On Thu, 2016-10-27 at 11:25 +0200, Alessandro Baggi wrote:
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED

That's the line you are looking for. Since your disk apparently does not store an error log - not sure if that's something with SSDs in general or just with this particular disk - you will always have to invoke smartctl -t short /dev/sda and then, after the test has completed, check the output of smartctl -a /dev/sda for that particular line.

Shouldn't be too hard to put in a cron job, just make sure the job waits long enough (more than 1 minute, make it 2 to be sure) before reading the output of smartctl -a after invoking smartctl -t short.

Regards,
Leonard.

--
mount -t life -o ro /dev/dna /genetic/research
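As a sketch of that cron job (assumptions: the device is /dev/sda as in the thread, /etc/cron.d format with a user field, and local mail delivery to root works; the five-minute gap comfortably covers the "more than 1 minute, make it 2" wait Leonard recommends):

```
# /etc/cron.d/smart-selftest -- sketch only; adjust device and schedule.
# Kick off a short self-test at 03:00...
0 3 * * * root /usr/sbin/smartctl -t short /dev/sda >/dev/null 2>&1
# ...and five minutes later mail the health line to root.
5 3 * * * root /usr/sbin/smartctl -a /dev/sda | grep 'overall-health' | mail -s 'SMART /dev/sda' root
```

smartctl comes from the smartmontools package; a longer wait would be needed if the extended (-t long) test were used instead.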
[CentOS] python script from crontab - problems with proper execution
Hi All.

I currently have a problem with proper invocation of a python script with cron.

non-root $ crontab -l
#Ansible: script_repo_scanner
55 11 * * * /usr/bin/python /path/script_repo_scanner.py --bb_user bb_user --bb_pass bb_pass --bd_log_dir /path/logs &> /path/script_repo_scanner.py.log

And in /var/log/cron I see that cron executed the script, but there is no log output in /path/script_repo_scanner.py.log and the script did not perform its job. So it looks like it has not been run despite the entries in /var/log/cron ;)

When I execute the command

non-root$ /usr/bin/python /path/script_repo_scanner.py --bb_user bb_user --bb_pass bb_pass --bd_log_dir /path/logs &> /path/script_repo_scanner.py.log

I get standard output (the script logs to stdout) and the script does its job.

Any clue what I could be missing?

BR,
Rafal.
Re: [CentOS] Disk near failure
On 24/10/2016 14:05, Leonard den Ottolander wrote:
> Hi,
>
> On Mon, 2016-10-24 at 12:07 +0200, Alessandro Baggi wrote:
>> === START OF READ SMART DATA SECTION ===
>> SMART Error Log not supported
>
> I reckon there's a line missing between those lines. The line right after
> the first should read something like:
>
> SMART overall-health self-assessment test result: PASSED
>
> or "FAILED" for that matter. If not, try running smartctl -t short
> /dev/sda, wait for the indicated time to expire, then check the output of
> smartctl -a (or -x) again.
>
> Regards,
> Leonard.

Hi Leonard,
after a short SMART test, the output of smartctl -a /dev/... is

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     Corsair Force GT
Serial Number:    1229794815020A81
LU WWN Device Id: 0 00 0
Firmware Version: 5.02
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 27 11:22:22 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity was
                                        completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline data collection: (0) seconds.
Offline data collection capabilities:    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine recommended polling time:      (  1) minutes.
Extended self-test routine recommended polling time:   ( 48) minutes.
Conveyance self-test routine recommended polling time: (  2) minutes.
SCT capabilities:              (0x0021) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f 120   120   050    Pre-fail Always  -           0/0
  5 Retired_Block_Count     0x0033 100   100   003    Pre-fail Always  -           0
  9 Power_On_Hours_and_Msec 0x0032 000   000   000    Old_age  Always  -           17394h+07m+56.840s
 12 Power_Cycle_Count       0x0032 099   099   000    Old_age  Always  -           1974
171 Program_Fail_Count      0x0032 000   000   000    Old_age  Always  -           0
172 Erase_Fail_Count        0x0032 000   000   000    Old_age  Always  -           0
174 Unexpect_Power_Loss_Ct  0x0030 000   000   000    Old_age  Offline -           780
177 Wear_Range_Delta        0x     000   000   000    Old_age  Offline -           3
181 Program_Fail_Count      0x0032 000   000   000    Old_age  Always  -           0
182 Erase_Fail_Count        0x0032 000   000   000    Old_age  Always  -           0
187 Reported_Uncorrect      0x0032 100   100   000    Old_age  Always  -           0
194 Temperature_Celsius     0x0022 029   042   000    Old_age  Always  -           29 (Min/Max 15/42)
195 ECC_Uncorr_Error_Count  0x001c 100   100   000    Old_age  Offline -           0/0
196 Reallocated_Event_Count 0x0033 100   100   003    Pre-fail Always  -           0
201 Unc_Soft_Read_Err_Rate  0x001c 100   100   000    Old_age  Offline -           0/0
204 Soft_ECC_Correct_Rate   0x001c 100   100   000    Old_age  Offline -           0/0
230 Life_Curve_Status       0x0013 100   100   000    Pre-fail Always  -           100
231 SSD_Life_Left           0x0013 100
Re: [CentOS] NFS help
On Thu, Oct 27, 2016 at 1:03 AM, Larry Martell wrote:
> On Wed, Oct 26, 2016 at 9:35 AM, Matt Garman wrote:
>> On Tue, Oct 25, 2016 at 7:22 PM, Larry Martell wrote:
>>> Again, no machine on the internal network that my 2 CentOS hosts are
>>> on are connected to the internet. I have no way to download anything.
>>> There is an onerous and protracted process to get files into the
>>> internal network and I will see if I can get netperf in.
>>
>> Right, but do you have physical access to those machines? Do you have
>> physical access to the machine on which you use PuTTY to connect to
>> those machines? If yes to either question, then you can use another
>> system (that does have Internet access) to download the files you want,
>> put them on a USB drive (or burn to a CD, etc.), and bring the USB/CD
>> to the C6/C7/PuTTY machines.
>
> This site is locked down like no other I have ever seen. You cannot
> bring anything into the site - no computers, no media, no phone. You
> have to empty your pockets and go through an airport-type naked body
> scan.
>
>> There's almost always a technical way to get files on to (or out of) a
>> system. :) Now, your company might have *policies* that forbid
>> skirting around the technical measures that are in place.
>
> This is my client's client, and even if I could circumvent their
> policy I would not do that. They have a zero tolerance policy and if
> you are caught violating it you are banned for life from the company.
> And that would not make my client happy.
>
>> Here's another way you might be able to test network connectivity
>> between C6 and C7 without installing new tools: see if both machines
>> have "nc" (netcat) installed. I've seen this tool referred to as "the
>> swiss army knife of network testing tools", and that is indeed an apt
>> description. So if you have that installed, you can hit up the web
>> for various examples of its use. It's designed to be easily scripted,
>> so you can write your own tests, and in theory implement something
>> similar to netperf.
>>
>> OK, I just thought of another "poor man's" way to at least do some
>> sanity testing between C6 and C7: scp. First generate a huge file.
>> The general rule of thumb is at least 2x the amount of RAM in the C7
>> host. You could create a tarball of /usr, for example (e.g. "tar czvf
>> /tmp/bigfile.tar.gz /usr", assuming your /tmp partition is big enough
>> to hold this). Then, first do this: "time scp /tmp/bigfile.tar.gz
>> localhost:/tmp/bigfile_copy.tar.gz". This will literally make a copy
>> of that big file, but will route through most of the network stack.
>> Make a note of how long it took. And also be sure your /tmp partition
>> is big enough for two copies of that big file.
>>
>> Now, repeat that, but instead of copying to localhost, copy to the C6
>> box. Something like: "time scp /tmp/bigfile.tar.gz <c6 host>:/tmp/".
>> Does the time reported differ greatly from when you copied to
>> localhost? I would expect them to be reasonably close. (And this is
>> another reason why you want a fairly large file, so the transfer time
>> is dominated by actual file transfer, rather than the overhead.)
>>
>> Lastly, do the reverse test: log in to the C6 box, and copy the file
>> back to C7, e.g. "time scp /tmp/bigfile.tar.gz
>> <c7 host>:/tmp/bigfile_copy2.tar.gz". Again, the time should be
>> approximately the same for all three transfers. If either or both of
>> the latter two copies take dramatically longer than the first, then
>> there's a good chance something is askew with the network config
>> between C6 and C7.
>>
>> Oh... all this time I've been jumping to fancy tests. Have you tried
>> the simplest form of testing, that is, doing by hand what your scripts
>> do automatically? In other words, simply try copying files between C6
>> and C7 using the existing NFS config? Can you manually trigger the
>> errors/timeouts you initially posted? Is it when copying lots of
>> small files? Or when you copy a single huge file? Any kind of file
>> copying "profile" you can determine that consistently triggers the
>> error? That could be another clue.
>
> These are all good debugging techniques, and I have tried some of
> them, but I think the issue is load related. There are 50 external
> machines ftp-ing to the C7 server, 24/7, thousands of files a day. And
> on the C6 client the script that processes them is running
> continuously. It will sometimes run for 7 hours then hang, but it has
> run for as long as 3 days before hanging. I have never been able to
> reproduce the errors/hanging situation manually.
>
> And again, this is only at this site. We have the same software
> deployed at 10 different sites all doing the same thing, and it all
> works fine at all of those.

Well, I spoke too soon. The importer (the one that was initially hanging that I came here to fix) hung up after running 20 hours. There were no NFS errors or messages on either the client or the server. When
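Matt's scp timing procedure can be sketched as a short script. Everything here is a sketch under stated assumptions: the hostnames are placeholders to fill in, the remote legs are commented out because they need working ssh access between the boxes, and the file is deliberately tiny so the sketch runs quickly; a real test should use a file of at least 2x the C7 server's RAM, and the thread's baseline uses scp to localhost, which exercises more of the network stack than plain cp does.

```shell
#!/bin/sh
# "Poor man's" throughput comparison, per the thread: time a local copy
# as a baseline, then time the same copy to the C6 host and back.
# All three times should be roughly comparable on a healthy network.
BIG=/tmp/bigfile_demo.bin
dd if=/dev/zero of="$BIG" bs=1M count=8 2>/dev/null   # tiny stand-in file

# Baseline (the thread uses 'time scp ... localhost:...' here, which also
# routes through the network stack; cp stands in so this runs without sshd).
time cp "$BIG" "$BIG.copy"

# Remote legs -- uncomment and fill in the real hostnames:
# time scp "$BIG" c6-host:/tmp/
# ...then, logged in on the C6 box:
# time scp /tmp/bigfile_demo.bin c7-host:/tmp/bigfile_copy2.bin

echo "baseline copy complete"
```

If either remote leg takes dramatically longer than the baseline, that points at the network config between C6 and C7 rather than at NFS itself.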