Re: [CentOS] NFS help

2016-10-27 Thread Larry Martell
On Thu, Oct 27, 2016 at 5:16 PM,   wrote:
> Matt Garman wrote:
>> On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell 
>> wrote:
> 
>> On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell 
>> wrote:
>>> Well I spoke too soon. The importer (the one that was initially
>>> hanging that I came here to fix) hung up after running 20 hours. There
>>> were no NFS errors or messages on either the client or the server.
>>> When I restarted it, it hung after 1 minute. I restarted it again and
>>> it hung after 20 seconds. After that, whenever I restarted it, it hung
>>> immediately. Still no NFS errors or messages. I tried running the
>>> process on the server and it worked fine. So I have to believe this is
>>> related to nobarrier. Tomorrow I will try removing that setting, but I
>>> am no closer to solving this and I have to leave Japan Saturday :-(
>>>
>>> The bad disk still has not been replaced - that is supposed to happen
>>> tomorrow, but I won't have enough time after that to draw any
>>> conclusions.
>>
>> I've seen behavior like that with disks that are on their way out...
> 
> I just had a truly unpleasant thought, speaking of disks. Years ago, we
> tried some WD Green drives in our servers, and that was a disaster. In
> somewhere between days and weeks, the drives would go offline. I finally
> found out what happened: consumer-grade drives are intended for desktops,
> and the TLER - how long the drive keeps trying to read or write to a
> sector before giving up, marking the sector bad, and going somewhere else
> - is two *minutes*. Our servers were expecting the TLER to be 7 *seconds*
> or under. Any chance the client cheaped out with any of the drives?

No, it's a fairly high-end Lenovo System x series server (x3650, I think).


Re: [CentOS] NFS help

2016-10-27 Thread Larry Martell
On Thu, Oct 27, 2016 at 4:23 PM, Matt Garman  wrote:
> On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell  
> wrote:
>> This site is locked down like no other I have ever seen. You cannot
>> bring anything into the site - no computers, no media, no phone. You
>> ...
>> This is my client's client, and even if I could circumvent their
>> policy I would not do that. They have a zero tolerance policy and if
>> ...
>
> OK, no internet for real. :) Sorry I kept pushing this.  I made an
> unflattering assumption that maybe it just hadn't occurred to you how
> to get files in or out.  Sometimes there are "soft" barriers to
> bringing files in or out: they don't want it to be trivial, but want
> it to be doable if necessary.  But then there are times when they
> really mean it.  I thought maybe the former applied to you, but
> clearly it's the latter.  Apologies.
>
>> These are all good debugging techniques, and I have tried some of
>> them, but I think the issue is load related. There are 50 external
>> machines ftp-ing to the C7 server, 24/7, thousands of files a day. And
>> on the C6 client the script that processes them is running
>> continuously. It will sometimes run for 7 hours then hang, but it has
>> run for as long as 3 days before hanging. I have never been able to
>> reproduce the errors/hanging situation manually.
>
> If it truly is load related, I'd think you'd see something askew in
> the sar logs.  But if the load tends to spike, rather than be
> continuous, the sar sampling rate may be too coarse to pick it up.
>
>> And again, this is only at this site. We have the same software
>> deployed at 10 different sites all doing the same thing, and it all
>> works fine at all of those.
>
> Flaky hardware can also cause weird intermittent issues.  I know you
> mentioned before your hardware is fairly new/decent spec; but that
> doesn't make it immune to manufacturing defects.  For example, imagine
> one voltage regulator that's ever-so-slightly out of spec.  It
> happens.  Bad memory is not uncommon and certainly causes all kinds of
> mysterious issues (though in my experience that tends to result in
> spontaneous reboots or hard lockups, but truly anything could happen).
>
> Ideally, you could take the system offline and run hardware
> diagnostics, but I suspect that's impossible given your restrictions
> on taking things in/out of the datacenter.
>
> On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell  
> wrote:
>> Well I spoke too soon. The importer (the one that was initially
>> hanging that I came here to fix) hung up after running 20 hours. There
>> were no NFS errors or messages on either the client or the server.
>> When I restarted it, it hung after 1 minute. I restarted it again and
>> it hung after 20 seconds. After that, whenever I restarted it, it hung
>> immediately. Still no NFS errors or messages. I tried running the
>> process on the server and it worked fine. So I have to believe this is
>> related to nobarrier. Tomorrow I will try removing that setting, but I
>> am no closer to solving this and I have to leave Japan Saturday :-(
>>
>> The bad disk still has not been replaced - that is supposed to happen
>> tomorrow, but I won't have enough time after that to draw any
>> conclusions.
>
> I've seen behavior like that with disks that are on their way out...
> basically the system wants to read a block of data, and the disk
> doesn't read it successfully, so it keeps trying.  The kind of disk,
> what kind of controller it's behind, raid level, and various other
> settings can all impact this phenomenon, and also how much detail you
> can see about it.  You already know you have one bad disk, so that's
> kind of an open wound that may or may not be contributing to your
> bigger, unsolved problem.

Just replaced the disk but I am leaving tomorrow so it was decided
that we will run the process on the C7 server, at least for now. I
will probably have to come back here early next year and revisit this.
We are thinking of building a new system back in NY and shipping it
here and swapping them out.
>
> So that makes me think, you can also do some basic disk benchmarking.
> iozone and bonnie++ are nice, but I'm guessing they're not installed
> and you don't have a means to install them.  But you can use "dd" to
> do some basic benchmarking, and that's all but guaranteed to be
> installed.  Similar to network benchmarking, you can do something
> like:
> time dd if=/dev/zero of=/tmp/testfile.dat bs=1G count=256
>
> That will generate a 256 GB file.  Adjust "bs" and "count" to whatever
> makes sense.  General rule of thumb is you want the target file to be
> at least 2x the amount of RAM in the system to avoid cache effects
> from skewing your results.  Bigger is even better if you have the
> space, as it increases the odds of hitting the "bad" part of the disk
> (if indeed that's the source of your problem).
>
> Do that on C6, C7, and if you can a similar machine as a "control"
> box, it would be ideal.  Again,

Re: [CentOS] NFS help

2016-10-27 Thread Larry Martell
On Thu, Oct 27, 2016 at 12:35 PM, Gordon Messmer
 wrote:
> On 10/26/2016 09:54 PM, Larry Martell wrote:
>>
>> And on the C6 client there is a similar blocked message for the ftp
>> job, blocked on nfs_flush, then the bad sequence number message I had
>> seen before, and at that point the ftp_job hung.
>
>
>
> Are any of these systems using jumbo frames?  Check the MTU in the output of
> "ip link show" on every system, server and client. If any device doesn't
> match the MTU of all of the others, that might cause the problem you're
> describing.  And if they all match, but they're larger than 1500, a switch
> that doesn't support jumbo frames would also cause the problem you're
> describing.

They all are 1500.


[CentOS] odd sendmail question

2016-10-27 Thread Fred Smith
I've looked through all the stuff at sendmail.org (or whatever its name
is now, they seem to have gone corporate...) and don't see anything
relating to this:

I get a series of log entries in /var/log/maillog, once or twice a day
at various times--not 12 or 24 hours apart, and not always the identical
series either. Here's the latest one:

Oct 27 21:28:55 fcshome sendmail[7939]: starting daemon (8.14.7): 
SMTP+queueing@01:00:00
Oct 27 21:28:56 fcshome sendmail[7980]: starting daemon (8.14.7): 
SMTP+queueing@01:00:00
Oct 27 21:28:56 fcshome sm-msp-queue[7992]: starting daemon (8.14.7): 
queueing@01:00:00
Oct 27 21:29:19 fcshome sendmail[8054]: starting daemon (8.14.7): 
SMTP+queueing@01:00:00
Oct 27 21:29:19 fcshome sm-msp-queue[8093]: starting daemon (8.14.7): 
queueing@01:00:00
Oct 27 21:29:19 fcshome sendmail[8095]: starting daemon (8.14.7): 
SMTP+queueing@01:00:00

Anyone have a clue what this is all about? And why so many?

-- 
 Fred Smith -- fre...@fcshome.stoneham.ma.us -
   Show me your ways, O LORD, teach me your paths;
 Guide me in your truth and teach me,
 for you are God my Savior,
And my hope is in you all day long.
-- Psalm 25:4-5 (NIV) 


Re: [CentOS] NFS help

2016-10-27 Thread m . roth
Matt Garman wrote:
> On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell 
> wrote:

> On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell 
> wrote:
>> Well I spoke too soon. The importer (the one that was initially
>> hanging that I came here to fix) hung up after running 20 hours. There
>> were no NFS errors or messages on either the client or the server.
>> When I restarted it, it hung after 1 minute. I restarted it again and
>> it hung after 20 seconds. After that, whenever I restarted it, it hung
>> immediately. Still no NFS errors or messages. I tried running the
>> process on the server and it worked fine. So I have to believe this is
>> related to nobarrier. Tomorrow I will try removing that setting, but I
>> am no closer to solving this and I have to leave Japan Saturday :-(
>>
>> The bad disk still has not been replaced - that is supposed to happen
>> tomorrow, but I won't have enough time after that to draw any
>> conclusions.
>
> I've seen behavior like that with disks that are on their way out...

I just had a truly unpleasant thought, speaking of disks. Years ago, we
tried some WD Green drives in our servers, and that was a disaster. In
somewhere between days and weeks, the drives would go offline. I finally
found out what happened: consumer-grade drives are intended for desktops,
and the TLER - how long the drive keeps trying to read or write to a
sector before giving up, marking the sector bad, and going somewhere else
- is two *minutes*. Our servers were expecting the TLER to be 7 *seconds*
or under. Any chance the client cheaped out with any of the drives?
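
For reference, smartmontools can usually report that timeout, and set it
on drives that support SCT ERC - a quick check, assuming the drive shows
up as /dev/sda:

smartctl -l scterc /dev/sda        # report current read/write recovery timeouts
smartctl -l scterc,70,70 /dev/sda  # set both to 7.0 seconds (units are 100 ms)

Consumer drives often reject the set command, or forget the setting
across power cycles.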

  mark



[CentOS] Fwd: CentOS on new Dell

2016-10-27 Thread Michael B Allen
On Mon, Oct 24, 2016 at 8:11 PM, Milos Blazevic  wrote:
> I've seen the thread(s) you started on CentOS mailing list about Dell and 
> ThinkPad
> laptops and running Centos on 'em.
>
> Not sure if you've seen my question, but I'm considering purchasing a
> laptop to run EL7 on, and I'm weighing between the Thinkpad and the
> Latitude, so:
>
> What made you opt for the E7470 over, say, the Carbon X1? According to
> RedHat's hardware compatibility list the Carbon models are certified,
> while none of the Dells are.
>
> Also, have you given up on CentOS in favor of Fedora? I'd love to hear
> how CentOS 7's support for the E7470 hardware is.

Hi Milos,

The Thinkpad T series and Latitude are *very* similar computers. They
are both business "ultrabooks" with a 1600x1080 display option, nice
keyboards (not "chiclet" style), a trackpoint and trackpad, and a
built-in RJ-45 port.

I bought a Dell Latitude E7470 over the Lenovo for several reasons.
One is this comment which is worth mentioning again:

On Fri, Sep 30, 2016 at 11:58 PM, Gordon Messmer
 wrote:
> It's worth mentioning again that Dell is one of the companies doing the
> development for the bits that don't work, and that those drivers are often
> the ones that get Lenovo equipment going, too. Lenovo does not, to the best
> of my knowledge, do any Linux development.

Another reason is that I have heard about people having problems with
Lenovo, not just with software but with hardware malfunctions. I spoke
to someone on the phone who had hardware problems with their new
Thinkpad (although I suspect some of the problems could have been
misdiagnosis by the user). After I described how nice the E7470 is,
they're thinking about dumping their one-year-old X250 and getting a Dell.

As for the Carbon, that is a very different computer. The Carbon is an
ultralight / thin Macbook-like machine with Windows so I have no
advice for you there.

I have not tried CentOS on the E7470, but I'm quite certain it would
not work, because I have tried the latest Fedora Live, which is about
100 kernel revisions newer, and even that doesn't completely work.
Specifically, if I plug in an external display it freezes. My feeling
is I need a newer display driver (and thus a newer kernel). The only
other issue I noticed was that wireless didn't work, but that seems
more like a glue issue than necessarily a driver problem. Otherwise,
suspend and everything else worked, as near as I can tell, which is
actually pretty impressive for a brand new machine.

So, I am doing other things while this new E7470 ages like a fine
wine. Or maybe I'll lose patience and just install Fedora and try a
"vanilla" kernel package. Then maybe after a year or two CentOS 8 or
whatever will run on it and then I can just run steady for 4+ years
without getting pummeled by stupid updates and feature creep that you
get with Fedora and Ubuntu or whatever the latest hot distro is.

The E7470 is obviously a laptop of choice for business people. And
that is the type of machine developers use. So chances of good
compatibility are very high. You just have to give it time.

I was watching Daredevil season 1 and they use Latitudes that look
exactly like mine. And that was probably filmed in 2014. So the form
factor at least has been around for a while, which is good.
Unfortunately I can't say the same thing about the show.

Mike


Re: [CentOS] NFS help

2016-10-27 Thread J Martin Rushton
On 27/10/16 21:23, Matt Garman wrote:

> 
> If you have the ability to take these systems offline temporarily, you
> can also run "fsck" (file system check) on the C6 and C7 file systems.
> IIRC, ext4 can do a very basic kind of check on a mounted filesystem.
> But a deeper/more comprehensive scan requires the FS to be unmounted.
> Not sure what the rules are for xfs.  But C6 uses ext4 by default so
> you could probably at least run the basic check on that without taking
> the system offline.

Don't bother with fsck on XFS filesystems.  From the man page
[fsck.xfs(8)]: "XFS is a journaling filesystem and performs recovery at
mount(8)  time if necessary, so fsck.xfs simply exits with a zero exit
status".  If you need a deeper examination use xfs_repair(8) and note
that: "the filesystem to be repaired must be unmounted, otherwise, the
resulting filesystem may be inconsistent or corrupt" (from the man page).
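
A typical sequence, assuming the filesystem in question lives on a
hypothetical /dev/sdb1:

umount /dev/sdb1
xfs_repair -n /dev/sdb1   # -n = no-modify mode: report problems, change nothing
xfs_repair /dev/sdb1      # only needed if the dry run reported problems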





Re: [CentOS] NFS help

2016-10-27 Thread Matt Garman
On Thu, Oct 27, 2016 at 12:03 AM, Larry Martell  wrote:
> This site is locked down like no other I have ever seen. You cannot
> bring anything into the site - no computers, no media, no phone. You
> ...
> This is my client's client, and even if I could circumvent their
> policy I would not do that. They have a zero tolerance policy and if
> ...

OK, no internet for real. :) Sorry I kept pushing this.  I made an
unflattering assumption that maybe it just hadn't occurred to you how
to get files in or out.  Sometimes there are "soft" barriers to
bringing files in or out: they don't want it to be trivial, but want
it to be doable if necessary.  But then there are times when they
really mean it.  I thought maybe the former applied to you, but
clearly it's the latter.  Apologies.

> These are all good debugging techniques, and I have tried some of
> them, but I think the issue is load related. There are 50 external
> machines ftp-ing to the C7 server, 24/7, thousands of files a day. And
> on the C6 client the script that processes them is running
> continuously. It will sometimes run for 7 hours then hang, but it has
> run for as long as 3 days before hanging. I have never been able to
> reproduce the errors/hanging situation manually.

If it truly is load related, I'd think you'd see something askew in
the sar logs.  But if the load tends to spike, rather than be
continuous, the sar sampling rate may be too coarse to pick it up.
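
If you can edit /etc/cron.d/sysstat, you can tighten the sampling
interval - e.g. on a stock EL6/EL7 install (the sa1 path may vary),
change the default ten-minute entry to one minute:

# /etc/cron.d/sysstat (default interval is */10)
*/1 * * * * root /usr/lib64/sa/sa1 1 1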

> And again, this is only at this site. We have the same software
> deployed at 10 different sites all doing the same thing, and it all
> works fine at all of those.

Flaky hardware can also cause weird intermittent issues.  I know you
mentioned before your hardware is fairly new/decent spec; but that
doesn't make it immune to manufacturing defects.  For example, imagine
one voltage regulator that's ever-so-slightly out of spec.  It
happens.  Bad memory is not uncommon and certainly causes all kinds of
mysterious issues (though in my experience that tends to result in
spontaneous reboots or hard lockups, but truly anything could happen).

Ideally, you could take the system offline and run hardware
diagnostics, but I suspect that's impossible given your restrictions
on taking things in/out of the datacenter.

On Thu, Oct 27, 2016 at 3:05 AM, Larry Martell  wrote:
> Well I spoke too soon. The importer (the one that was initially
> hanging that I came here to fix) hung up after running 20 hours. There
> were no NFS errors or messages on either the client or the server.
> When I restarted it, it hung after 1 minute. I restarted it again and
> it hung after 20 seconds. After that, whenever I restarted it, it hung
> immediately. Still no NFS errors or messages. I tried running the
> process on the server and it worked fine. So I have to believe this is
> related to nobarrier. Tomorrow I will try removing that setting, but I
> am no closer to solving this and I have to leave Japan Saturday :-(
>
> The bad disk still has not been replaced - that is supposed to happen
> tomorrow, but I won't have enough time after that to draw any
> conclusions.

I've seen behavior like that with disks that are on their way out...
basically the system wants to read a block of data, and the disk
doesn't read it successfully, so it keeps trying.  The kind of disk,
what kind of controller it's behind, raid level, and various other
settings can all impact this phenomenon, and also how much detail you
can see about it.  You already know you have one bad disk, so that's
kind of an open wound that may or may not be contributing to your
bigger, unsolved problem.

So that makes me think, you can also do some basic disk benchmarking.
iozone and bonnie++ are nice, but I'm guessing they're not installed
and you don't have a means to install them.  But you can use "dd" to
do some basic benchmarking, and that's all but guaranteed to be
installed.  Similar to network benchmarking, you can do something
like:
time dd if=/dev/zero of=/tmp/testfile.dat bs=1G count=256

That will generate a 256 GB file.  Adjust "bs" and "count" to whatever
makes sense.  General rule of thumb is you want the target file to be
at least 2x the amount of RAM in the system to avoid cache effects
from skewing your results.  Bigger is even better if you have the
space, as it increases the odds of hitting the "bad" part of the disk
(if indeed that's the source of your problem).
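
A read-side pass over the same file can be just as telling - a rough
sketch, flushing the page cache first so the reads actually hit the
disk (needs root):

sync
echo 3 > /proc/sys/vm/drop_caches
time dd if=/tmp/testfile.dat of=/dev/null bs=1M

Watch for the read stalling, or taking wildly longer than the write did.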

Do that on C6, C7, and if you can a similar machine as a "control"
box, it would be ideal.  Again, we're looking for outliers, hang-ups,
timeouts, etc.

+1 to Gordon's suggestion to sanity check MTU sizes.

Another random possibility... By somewhat funny coincidence, we have
some servers in Japan as well, and were recently banging our heads
against the wall with some weird networking issues.  The remote hands
we had helping us (none of our staff was on site) claimed one or more
fiber cables were dusty, enough that it was affecting light levels.
They cleaned the cables and the problem went away.

Re: [CentOS] [OT] How to recover data from an IDE drive

2016-10-27 Thread John R Pierce

On 10/27/2016 11:20 AM, Fred Smith wrote:

I got one of those from, er, either Amazon or Newegg a few years ago,
and while it works for a PATA drive, no matter what I did it wouldn't
work with an optical drive, despite the customer support people
insisting it does work. Following their configuration settings didn't help.


Likely because optical IDE drives use a completely different command
set known as ATAPI, which is SCSI-based. The adapter I linked is
strictly for hard drives.



--
john r pierce, recycling bits in santa cruz



Re: [CentOS] [OT] How to recover data from an IDE drive

2016-10-27 Thread Fred Smith
On Thu, Oct 27, 2016 at 09:19:23AM -0500, Leroy Tennison wrote:
> While IDE-to-USB is probably the easier option to use, I got an IDE-to-SATA 
> adapter on eBay for almost nothing (of course, you have to wait for it to 
> arrive directly from China).  If you go this route, the thing I learned from 
> the experience was to set the IDE drive to master (there won't be a slave 
> unless you get a one-to-two converter - I didn't see one of the latter).  
> Also, unless the converter goes both ways, pay attention to which is the 
> controller side and which is the drive side.

I got one of those from, er, either Amazon or Newegg a few years ago,
and while it works for a PATA drive, no matter what I did it wouldn't
work with an optical drive, despite the customer support people
insisting it does work. Following their configuration settings didn't help.

So, YMMV.

> 
> - Original Message -
> From: "Digimer" 
> To: "CentOS mailing list" 
> Sent: Wednesday, October 26, 2016 8:10:05 PM
> Subject: Re: [CentOS] [OT] How to recover data from an IDE drive
> 
> On 26/10/16 09:01 PM, TE Dukes wrote:
> > Hello,
> > 
> > As some may recall, I suffered a hardware failure of a 10 yr old IBM
> > Netvista back in January. I was backing up my personal data, 'My Documents',
> > to my CentOS server but I apparently didn't get my emails.
> > 
> > It was a main board failure and I believe the data is still good on the hard
> > drive. Only problem, it's an IDE drive and my server and new PC have SATA
> > drives.
> > 
> > Is it possible to install the old drive as a secondary drive into a newer PC
> > with SATA drives? If so, how do I do this? I need to access the emails.
> > 
> > This was a Windows XP machine using Outlook as the mail client.
> > 
> > TIA!!
> 
> There are plenty of IDE to USB adapters out there, so one of those is
> probably best. Here's what amazon has when searching for 'ide to usb':
> 
> https://www.amazon.ca/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=ide+to+usb
> 
> Most should work fine in Linux, but if you narrow down a specific
> make/model, a quick Google search should confirm Linux support.
> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?

-- 
---
Under no circumstances will I ever purchase anything offered to me as
the result of an unsolicited e-mail message. Nor will I forward chain
letters, petitions, mass mailings, or virus warnings to large numbers
of others. This is my contribution to the survival of the online
community.
 --Roger Ebert, December, 1996
- The Boulder Pledge -


[CentOS] Re: Disk near failure

2016-10-27 Thread Yamaban

On Thu, 27 Oct 2016 11:25, Alessandro Baggi wrote:

On 24/10/2016 14:05, Leonard den Ottolander wrote:

 On Mon, 2016-10-24 at 12:07 +0200, Alessandro Baggi wrote:
>  === START OF READ SMART DATA SECTION ===
>  SMART Error Log not supported

 I reckon there's a line missing between those lines. The line right
 after the first should read something like:

 SMART overall-health self-assessment test result: PASSED

 or "FAILED" for that matter. If not try running

 smartctl -t short /dev/sda

 , wait for the indicated time to expire, then check the output of
 smartctl -a (or -x) again.

 Regards,
 Leonard.


Hi Leonard,
after a SMART short test, the output of smartctl -a /dev/... is

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     Corsair Force GT
Serial Number:    1229794815020A81
LU WWN Device Id: 0 00 0
Firmware Version: 5.02
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 27 11:22:22 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  48) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0021) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail Always       -       0/0
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   000   000   000    Old_age  Always       -       17394h+07m+56.840s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age  Always       -       1974
171 Program_Fail_Count      0x0032   000   000   000    Old_age  Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age  Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age  Offline      -       780
177 Wear_Range_Delta        0x0000   000   000   000    Old_age  Offline      -       3
181 Program_Fail_Count      0x0032   000   000   000    Old_age  Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age  Always       -       0
194 Temperature_Celsius     0x0022   029   042   000    Old_age  Always       -       29 (Min/Max 15/42)
195 ECC_Uncorr_Error_Count  0x001c   100   100   000    Old_age  Offline      -       0/0
196 Reallocated_Event_Ct    0x0033   100   100   003    Pre-fail Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   100   100   000    Old_age  Offline      -       0/0
204 Soft_ECC_Correct_Rate   0x001c   100   100   000    Old_age  Offline      -       0/0
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail Always       -       100
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail Always       -       0
233 SandForce_Internal      0x0000   000   000   000    Old_age  Offline      -       6599
234 SandForce_Internal      0x00

Re: [CentOS] NFS help

2016-10-27 Thread Gordon Messmer

On 10/26/2016 09:54 PM, Larry Martell wrote:

And on the C6 client there is a similar blocked message for the ftp
job, blocked on nfs_flush, then the bad sequence number message I had
seen before, and at that point the ftp_job hung.



Are any of these systems using jumbo frames?  Check the MTU in the 
output of "ip link show" on every system, server and client. If any 
device doesn't match the MTU of all of the others, that might cause the 
problem you're describing.  And if they all match, but they're larger 
than 1500, a switch that doesn't support jumbo frames would also cause 
the problem you're describing.
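
Something like this collects the MTUs in one shot, assuming you can ssh
to each box (hostnames and the interface name are placeholders):

for h in c6-client c7-server; do
    printf '%s: ' "$h"
    ssh "$h" ip link show eth0 | grep -o 'mtu [0-9]*'
done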




Re: [CentOS] Disk near failure

2016-10-27 Thread Steve Clark

On 10/27/2016 09:43 AM, Alessandro Baggi wrote:

On 27/10/2016 13:58, Leonard den Ottolander wrote:

Hi,

On Thu, 2016-10-27 at 11:25 +0200, Alessandro Baggi wrote:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

That's the line you are looking for. Since your disk apparently does not
store an error log - not sure if that's something with SSDs in general
or just with this particular disk - you will always have to invoke

smartctl -t short /dev/sda

and then after the test has completed check the output of

smartctl -a /dev/sda

for that particular line. Shouldn't be too hard to put in a cron job,
just make sure the job waits long enough (more than 1 minute, make it 2
to be sure) before reading the output of smartctl -a after invoking
smartctl -t short.

Regards,
Leonard.



You can also use the smartd service and edit the smartd.conf file to
have it send you emails when a disk starts to fail.
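
A minimal smartd.conf line along those lines (the address is a
placeholder; see smartd.conf(5) for the details):

/dev/sda -a -m admin@example.com -M test

-a monitors all attributes, -m mails the given address on failure, and
-M test sends a test message at startup so you know the mail path works.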



Thank you for the suggestion.

Alessandro.




--
Stephen Clark
*NetWolves Managed Services, LLC.*
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.cl...@netwolves.com
http://www.netwolves.com


Re: [CentOS] [OT] How to recover data from an IDE drive

2016-10-27 Thread Leroy Tennison
While IDE-to-USB is probably the easier option to use, I got an IDE-to-SATA 
adapter on eBay for almost nothing (of course, you have to wait for it to 
arrive directly from China).  If you go this route, the thing I learned from 
the experience was to set the IDE drive to master (there won't be a slave 
unless you get a one-to-two converter - I didn't see one of the latter).  Also, 
unless the converter goes both ways, pay attention to which is the controller 
side and which is the drive side.

- Original Message -
From: "Digimer" 
To: "CentOS mailing list" 
Sent: Wednesday, October 26, 2016 8:10:05 PM
Subject: Re: [CentOS] [OT] How to recover data from an IDE drive

On 26/10/16 09:01 PM, TE Dukes wrote:
> Hello,
> 
> As some may recall, I suffered a hardware failure of a 10 yr old IBM
> Netvista back in January. I was backing up my personal data, 'My Documents',
> to my CentOS server but I apparently didn't get my emails.
> 
> It was a main board failure and I believe the data is still good on the hard
> drive. Only problem, it's an IDE drive and my server and new PC have SATA
> drives.
> 
> Is it possible to install the old drive as a secondary drive into a newer PC
> with SATA drives? If so, how do I do this? I need to access the emails.
> 
> This was a Windows XP machine using Outlook as the mail client.
> 
> TIA!!

There are plenty of IDE to USB adapters out there, so one of those is
probably best. Here's what amazon has when searching for 'ide to usb':

https://www.amazon.ca/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=ide+to+usb

Most should work fine in Linux, but if you narrow down a specific
make/model, a quick Google search should confirm Linux support.
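
Once the adapter is attached, the old disk typically shows up as an
extra SCSI disk - a rough sketch of finding and mounting the XP
partition read-only (device names are guesses; check dmesg):

dmesg | tail                        # look for the new disk, e.g. sdb
fdisk -l /dev/sdb                   # find the NTFS partition
mkdir -p /mnt/olddisk
mount -o ro /dev/sdb1 /mnt/olddisk  # NTFS support needs the ntfs-3g package

The Outlook mail should then be in a .pst file somewhere under
Documents and Settings on that partition.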

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


Re: [CentOS] Disk near failure

2016-10-27 Thread Alessandro Baggi

On 27/10/2016 13:58, Leonard den Ottolander wrote:

Hi,

On Thu, 2016-10-27 at 11:25 +0200, Alessandro Baggi wrote:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


That's the line you are looking for. Since your disk apparently does not
store an error log - not sure if that's something with SSDs in general
or just with this particular disk - you will always have to invoke

smartctl -t short /dev/sda

and then after the test has completed check the output of

smartctl -a /dev/sda

for that particular line. Shouldn't be too hard to put in a cron job,
just make sure the job waits long enough (more than 1 minute, make it 2
to be sure) before reading the output of smartctl -a after invoking
smartctl -t short.

Regards,
Leonard.


Thank you for the suggestion.

Alessandro.


Re: [CentOS] python script from crontab - problems with proper execution

2016-10-27 Thread Brian Bernard
Hi Rafal,

You'll want to change the command to

/usr/bin/python /path/script_repo_scanner.py --bb_user bb_user
--bb_pass bb_pass --bd_log_dir /path/logs >>
/path/script_repo_scanner.py.log

Notice that &> is changed to >>
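
The likely culprit: &> is a bash extension, and a strictly POSIX shell
(cron commonly runs jobs with /bin/sh) can parse "cmd &> file" as
"cmd &" followed by "> file", backgrounding the script and truncating
the log. A portable form that also captures stderr, reusing the paths
from the original post:

55 11 * * * /usr/bin/python /path/script_repo_scanner.py --bb_user bb_user --bb_pass bb_pass --bd_log_dir /path/logs >> /path/script_repo_scanner.py.log 2>&1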

Take care,

Brian Bernard

On Thu, Oct 27, 2016 at 5:47 AM, Rafał Radecki 
wrote:

> Hi All.
>
> I currently have a problem with proper invocation of a python script with
> cron.
>
> non-root $ crontab -l
> #Ansible: script_repo_scanner
> 55 11 * * * /usr/bin/python /path/script_repo_scanner.py --bb_user bb_user
> --bb_pass bb_pass --bd_log_dir /path/logs &>
> /path/script_repo_scanner.py.log
>
> And in /var/log/cron I see that cron executed the script but there is no
> log output in /path/script_repo_scanner.py.log and the script did not
> perform its job. So it looks like it has not been run despite entries in
> /var/log/cron ;)
>
> When I execute the command
>
> non-root$ /usr/bin/python /path/script_repo_scanner.py --bb_user bb_user
> --bb_pass bb_pass --bd_log_dir /path/logs &>
> /path/script_repo_scanner.py.log
>
> I get standard output (the script logs to stdout) and the script does its job.
>
> Any clue what I could be missing?
>
> BR,
> Rafal.


Re: [CentOS] Disk near failure

2016-10-27 Thread Leonard den Ottolander
Hi,

On Thu, 2016-10-27 at 11:25 +0200, Alessandro Baggi wrote:
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED

That's the line you are looking for. Since your disk apparently does not
store an error log - not sure if that's something with SSDs in general
or just with this particular disk - you will always have to invoke

smartctl -t short /dev/sda

and then after the test has completed check the output of

smartctl -a /dev/sda

for that particular line. Shouldn't be too hard to put in a cron job,
just make sure the job waits long enough (more than 1 minute, make it 2
to be sure) before reading the output of smartctl -a after invoking
smartctl -t short.
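
A sketch of such a job (device, timing, and the alert address are
illustrative):

#!/bin/sh
# run a short self-test, give it two minutes, then check the verdict
smartctl -t short /dev/sda > /dev/null
sleep 120
smartctl -a /dev/sda | grep -q 'test result: PASSED' || \
    echo "SMART short test on /dev/sda did not pass" | mail -s "SMART alert" root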

Regards,
Leonard.

-- 
mount -t life -o ro /dev/dna /genetic/research




[CentOS] python script from crontab - problems with proper execution

2016-10-27 Thread Rafał Radecki
Hi All.

I currently have a problem with proper invocation of a python script with
cron.

non-root $ crontab -l
#Ansible: script_repo_scanner
55 11 * * * /usr/bin/python /path/script_repo_scanner.py --bb_user bb_user
--bb_pass bb_pass --bd_log_dir /path/logs &>
/path/script_repo_scanner.py.log

And in /var/log/cron I see that cron executed the script but there is no
log output in /path/script_repo_scanner.py.log and the script did not
perform its job. So it looks like it has not been run despite entries in
/var/log/cron ;)

When I execute the command

non-root$ /usr/bin/python /path/script_repo_scanner.py --bb_user bb_user
--bb_pass bb_pass --bd_log_dir /path/logs &>
/path/script_repo_scanner.py.log

I get standard output (the script logs to stdout) and the script does its job.

Any clue what I could be missing?

BR,
Rafal.


Re: [CentOS] Disk near failure

2016-10-27 Thread Alessandro Baggi

On 24/10/2016 14:05, Leonard den Ottolander wrote:

Hi,

On Mon, 2016-10-24 at 12:07 +0200, Alessandro Baggi wrote:

=== START OF READ SMART DATA SECTION ===
SMART Error Log not supported


I reckon there's a line missing between those lines. The line right
after the first should read something like:

SMART overall-health self-assessment test result: PASSED

or "FAILED" for that matter. If not try running

smartctl -t short /dev/sda

, wait for the indicated time to expire, then check the output of
smartctl -a (or -x) again.

Regards,
Leonard.


Hi Leonard,
after a SMART short test, the output of smartctl -a /dev/... is

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     Corsair Force GT
Serial Number:    1229794815020A81
LU WWN Device Id: 0 00 0
Firmware Version: 5.02
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 27 11:22:22 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  48) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0021) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail Always       -       0/0
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   000   000   000    Old_age  Always       -       17394h+07m+56.840s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age  Always       -       1974
171 Program_Fail_Count      0x0032   000   000   000    Old_age  Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age  Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age  Offline      -       780
177 Wear_Range_Delta        0x0000   000   000   000    Old_age  Offline      -       3
181 Program_Fail_Count      0x0032   000   000   000    Old_age  Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age  Always       -       0
194 Temperature_Celsius     0x0022   029   042   000    Old_age  Always       -       29 (Min/Max 15/42)
195 ECC_Uncorr_Error_Count  0x001c   100   100   000    Old_age  Offline      -       0/0
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   100   100   000    Old_age  Offline      -       0/0
204 Soft_ECC_Correct_Rate   0x001c   100   100   000    Old_age  Offline      -       0/0
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail Always       -       100
231 SSD_Life_Left           0x0013   100

Re: [CentOS] NFS help

2016-10-27 Thread Larry Martell
On Thu, Oct 27, 2016 at 1:03 AM, Larry Martell  wrote:
> On Wed, Oct 26, 2016 at 9:35 AM, Matt Garman  wrote:
>> On Tue, Oct 25, 2016 at 7:22 PM, Larry Martell  
>> wrote:
>>> Again, no machine on the internal network that my 2 CentOS hosts are
>>> on is connected to the internet. I have no way to download anything.
>>> There is an onerous and protracted process to get files into the
>>> internal network and I will see if I can get netperf in.
>>
>> Right, but do you have physical access to those machines?  Do you have
>> physical access to the machine on which you use PuTTY to connect
>> to those machines?  If yes to either question, then you can use
>> another system (that does have Internet access) to download the files
>> you want, put them on a USB drive (or burn to a CD, etc), and bring
>> the USB/CD to the C6/C7/PuTTY machines.
>
> This site is locked down like no other I have ever seen. You cannot
> bring anything into the site - no computers, no media, no phone. You
> have to empty your pockets and go through an airport type naked body
> scan.
>
>> There's almost always a technical way to get files on to (or out of) a
>> system.  :)  Now, your company might have *policies* that forbid
>> skirting around the technical measures that are in place.
>
> This is my client's client, and even if I could circumvent their
> policy I would not do that. They have a zero tolerance policy and if
> you are caught violating it you are banned for life from the company.
> And that would not make my client happy.
>
>> Here's another way you might be able to test network connectivity
>> between C6 and C7 without installing new tools: see if both machines
>> have "nc" (netcat) installed.  I've seen this tool referred to as "the
>> swiss army knife of network testing tools", and that is indeed an apt
>> description.  So if you have that installed, you can hit up the web
>> for various examples of its use.  It's designed to be easily scripted,
>> so you can write your own tests, and in theory implement something
>> similar to netperf.
>>
>> OK, I just thought of another "poor man's" way to at least do some
>> sanity testing between C6 and C7: scp.  First generate a huge file.
>> General rule of thumb is at least 2x the amount of RAM in the C7 host.
>> You could create a tarball of /usr, for example (e.g. "tar czvf
>> /tmp/bigfile.tar.gz /usr" assuming your /tmp partition is big enough
>> to hold this).  Then, first do this: "time scp /tmp/bigfile.tar.gz
>> localhost:/tmp/bigfile_copy.tar.gz".  This will literally make a copy
>> of that big file, but will route through most of of the network stack.
>> Make a note of how long it took.  And also be sure your /tmp partition
>> is big enough for two copies of that big file.
>>
>> Now, repeat that, but instead of copying to localhost, copy to the C6
>> box.  Something like: "time scp /tmp/bigfile.tar.gz <C6 host>:/tmp/".  Does the time reported differ greatly from when you
>> copied to localhost?  I would expect them to be reasonably close.
>> (And this is another reason why you want a fairly large file, so the
>> transfer time is dominated by actual file transfer, rather than the
>> overhead.)
>>
>> Lastly, do the reverse test: log in to the C6 box, and copy the file
>> back to C7, e.g. "time scp /tmp/bigfile.tar.gz <C7 host>:/tmp/bigfile_copy2.tar.gz".  Again, the time should be
>> approximately the same for all three transfers.  If either or both of
>> the latter two copies take dramatically longer than the first, then
>> there's a good chance something is askew with the network config
>> between C6 and C7.
>>
>> Oh... all this time I've been jumping to fancy tests.  Have you tried
>> the simplest form of testing, that is, doing by hand what your scripts
>> do automatically?  In other words, simply try copying files between C6
>> and C7 using the existing NFS config?  Can you manually trigger the
>> errors/timeouts you initially posted?  Is it when copying lots of
>> small files?  Or when you copy a single huge file?  Any kind of file
>> copying "profile" you can determine that consistently triggers the
>> error?  That could be another clue.
>
> These are all good debugging techniques, and I have tried some of
> them, but I think the issue is load related. There are 50 external
> machines ftp-ing to the C7 server, 24/7, thousands of files a day. And
> on the C6 client the script that processes them is running
> continuously. It will sometimes run for 7 hours then hang, but it has
> run for as long as 3 days before hanging. I have never been able to
> reproduce the errors/hanging situation manually.
>
> And again, this is only at this site. We have the same software
> deployed at 10 different sites all doing the same thing, and it all
> works fine at all of those.

Well I spoke too soon. The importer (the one that was initially
hanging that I came here to fix) hung up after running 20 hours. There
were no NFS errors or messages on either the client or the server.
When I restarted it, it hung after 1 minute. I restarted it again and
it hung after 20 seconds. After that, whenever I restarted it, it hung
immediately. Still no NFS errors or messages. I tried running the
process on the server and it worked fine. So I have to believe this is
related to nobarrier. Tomorrow I will try removing that setting, but I
am no closer to solving this and I have to leave Japan Saturday :-(