Re: [OmniOS-discuss] Anybody using a current SAMBA on omniosce?

2018-10-26 Thread Stephan Budach
Hi Andries,

- Ursprüngliche Mail - 

> Von: "Andries Annema" 
> An: "Stephan Budach" ,
> omnios-discuss@lists.omniti.com
> Gesendet: Montag, 22. Oktober 2018 19:32:57
> Betreff: Re: [OmniOS-discuss] Anybody using a current SAMBA on
> omniosce?

> Hi Stephan,

> Regarding your first question, not that I'm using Samba on OmniOSce
> myself, but did you give 'pkgsrc' (
> https://pkgsrc.joyent.com/install-on-illumos/ ) a try?
> If I hit it with this command ...:

> pkgin search samba

> ... it comes up with two matching "SMB/CIFS protocol server
> suite" packages:
> samba-4.6.8nb9
> samba-3.6.25nb13

Thanks for looking that up. I will give them a try.
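
For the archives, and assuming the pkgsrc bootstrap is already installed as per
the Joyent page above, I expect the install to boil down to something like this
(untested on my side; the exact package name/version is whatever pkgin search
reports):

pkgin update
pkgin search samba
pkgin -y install samba-4.6.8nb9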

> Your second question, I have no clue.

Thanks again. However, I have since learned that Samba has its own internal 
notification system, which notifies a Samba client of any file event it asks 
to listen for. Usually, a client registers such a request for the share it is 
logged into, so that it gets notified as soon as something happens on that share.

This facility is quite powerful, though, and can be leveraged to extract 
file events for all the shares this Samba instance hosts.
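
If I read the Samba docs correctly, one can even watch these change
notifications interactively from a stock smbclient 4.x - a sketch, with
server/share/user names made up:

smbclient //server/share -U someuser
smb: \> notify .

That should print one line per file event under the watched directory, which
is pretty much what I am after.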

> Cheers,
> Andries

Cheers,
Stephan

> On 2018-10-22 11:26, Stephan Budach wrote:

> > Hi,
> 

> > I was wondering, if anyone is using a current version of SAMBA on
> > their omniosce? If yes, I suppose that there's no package for
> > anything later than 4.4, or is there?
> 

> > Also, does anybody know, if Samba makes use of the File Event
> > notification API?
> 

> > Cheers,
> 
> > Stephan
> 

> > ___
> 
> > OmniOS-discuss mailing list OmniOS-discuss@lists.omniti.com
> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
>


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Anybody using a current SAMBA on omniosce?

2018-10-22 Thread Stephan Budach
Hi, 


I was wondering if anyone is using a current version of SAMBA on their 
omniosce? If so, I suppose there's no package for anything later than 
4.4, or is there? 


Also, does anybody know if Samba makes use of the File Event notification API? 


Cheers, 
Stephan 




smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] How to safely remove/replace NVMe SSDs

2018-01-31 Thread Stephan Budach
Hi, 


I have purchased two of those Supermicro NVMe servers: SSG-2028R-NR48N. Both of 
them are equipped with 24x Intel DC P4500 U.2 devices, which are obviously 
hot-pluggable - at least they seem to be. ;) 


At the moment, I am trying to familiarize myself with the handling of those 
devices, and I am having quite a hard time coming up with a method of safely 
removing/replacing such a device. I am able to detach an NVMe device using 
nvmeadm, but pulling it out of the system causes the kernel to retire the PCI 
device, and I have not yet found a way to bring the re-inserted device online 
again. 
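
For reference, this is roughly the sequence I have been trying (controller
name made up for illustration):

nvmeadm list
nvmeadm detach nvme13
# pull the drive, re-insert it, then try to bring it back
nvmeadm attach nvme13
cfgadm -al

The attach step is where I get stuck once the kernel has retired the PCI
device.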


Does anybody have experience with handling these NVMe devices? 


Thanks, 
Stephan 


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] NVMe under omniOS CE

2018-01-12 Thread Stephan Budach
Shoot - please forgive my ignorance… uncommenting strict-version in nvme.conf 
solved that. 
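
For the archives, the relevant change in /kernel/drv/nvme.conf looks roughly
like this (from memory - I rebooted afterwards):

# relax the NVMe specification version check
strict-version=0;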

Cheers, 
Stephan 

- Ursprüngliche Mail -

> Von: "Stephan Budach" <stephan.bud...@jvm.de>
> An: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> Gesendet: Freitag, 12. Januar 2018 11:17:09
> Betreff: [OmniOS-discuss] NVMe under omniOS CE

> Hi,

> I finally got the first of my two Supermicro 2028R-N48M NVME servers.
> I installed the latest omniOSce on it and, as it seems, it doesn't
> recognize the NVMe drives. This box is equipped with 24x Intel DC
> P4500, PCIe 3.1 NVMe.

> Does anybody know why those are not recognized?

> Thanks,
> Stephan

> --

> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] NVMe under omniOS CE

2018-01-12 Thread Stephan Budach
Hi, 


I finally got the first of my two Supermicro 2028R-N48M NVMe servers. I 
installed the latest omniOSce on it and, as it seems, it doesn't recognize the 
NVMe drives. This box is equipped with 24x Intel DC P4500, PCIe 3.1 NVMe. 


Does anybody know why those are not recognized? 


Thanks, 
Stephan 

-- 


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Investing into the Future of OmniOS Community Edition

2017-11-27 Thread Stephan Budach
+1

Cheers,
Stephan

- Ursprüngliche Mail -
> Von: "Guenther Alka" 
> An: "omnios-discuss" 
> Gesendet: Montag, 27. November 2017 10:15:47
> Betreff: Re: [OmniOS-discuss] Investing into the Future of OmniOS Community   
> Edition
> 
> hello Tobias
> 
> A pure donation model (pay for something that is available for free)
> is only okay for small amounts from private persons.
> 
> Can this be extended by
> - a function to request a quotation for a given amount together with a
> (pro forma) invoice online, which makes institutional/edu payments
> possible (to accept this quotation, pay the amount until...)
> 
> - some sort of extra offer on top of the free offerings, maybe a Pro
> subscription to beta/bloody releases, a knowledge base or a plus
> repository
> 
> Gea
> 
> 
> Am 26.11.2017 um 23:31 schrieb Tobias Oetiker:
> > Hi All
> >
> > tl;dr? head over to https://omniosce.org/patron and support your
> > favorite OS with a regular donation.
> >
> > Since the OmniOS Community Edition Association has taken over
> > maintenance and care of OmniOSce, things have been moving at a
> > brisk pace.
> >
> >* Upstream changes from Illumos and Joyent are integrated weekly
> >into our github repo.
> >
> >* Security fixes are normally available within hours.
> >
> >* We have committed to an elaborate long term release plan.
> >
> >* In early November we have released OmniOS r24 on schedule with
> >many enhancements.
> >
> >* Our new website has a lot of new content on feeding and care
> >of your OmniOS instance.
> >
> > Judging from the 566 systems that have been switched over to our
> > new repos, downloading regular updates, we think there is ample
> > interest in OmniOS. Not even taking into account all those who
> > have not yet upgraded.
> >
> > With the release of OmniOSce r24 we have created the OmniOS Patron
> > Page where you can set up regular contribution or a one time
> > donation to the project.
> >
> > Unfortunately only very few people have yet made the step of
> > actually pledging funds. According to the straw poll conducted in
> > spring, there should be at least 80k USD per year available to
> > those who maintain and release updates for  OmniOS.
> >
> > At the moment we are getting about 25 USD per month from 4
> > individuals. This is pretty bleak and does not even begin to cover
> > the cost of running the show and certainly does not allow us to
> > plan for the future.
> >
> > Head over to https://omniosce.org/patron and let us know how much
> > you value OmniOS and our work. If everybody spent just 20 dollar
> > per system and month things would be looking much brighter
> > already.
> >
> > To put this into perspective: a single O365 Enterprise license
> > costs 35 USD/month. Most individuals pay a similar amount every
> > month for the internet connection or mobile phone contract.
> >
> > Andy Fiddaman
> > Dominik Hassler
> > Tobi Oetiker
> >
> > --
> > OmniOS Community Edition Association
> > Aarweg 17, 4600 Olten, Switzerland
> > www.omniosce.org
> > i...@omniosce.org
> 
> --
> H  f   G
> Hochschule für Gestaltung
> university of design
> 
> Schwäbisch Gmünd
> Rektor-Klaus Str. 100
> 73525 Schwäbisch Gmünd
> 
> Guenther Alka, Dipl.-Ing. (FH)
> Leiter des Rechenzentrums
> head of computer center
> 
> Tel 07171 602 627
> Fax 07171 69259
> guenther.a...@hfg-gmuend.de
> http://rz.hfg-gmuend.de
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Best way to share files with Mac?

2017-11-26 Thread Stephan Budach
Hi,

- Ursprüngliche Mail -
> Von: "Geoff Nordli" 
> An: "Chris Ferebee" 
> CC: "omnios-discuss" 
> Gesendet: Sonntag, 26. November 2017 07:54:20
> Betreff: Re: [OmniOS-discuss] Best way to share files with Mac?
> 
> Hi Chris.
> 
> I wonder if now this is fixed with the vfs_fruit module in Samba.
> 
> http://plazko.io/apple-osx-finder-is-listing-files-very-very-slow-when-connected-to-smb-shared-hard-drive-connected-to-a-wifi-router/
> 
> thanks,
> 
> Geoff
> 

the vfs_fruit extension, afair, provides interlocking capabilities between 
Netatalk/AFP and Samba 3/4. It allows Macs and PCs to share the same data 
through both protocols.

Looking at its man page, it does some more things that may come in handy for 
macOS clients. However, this module will not work with the kernel/smb server, 
so you will have to go with a full-fledged Samba 4 installation.
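
For reference, enabling it on a Samba 4 share looks roughly like this, going
by the vfs_fruit man page (share name and path are made up):

[macdata]
    path = /tank/macdata
    vfs objects = catia fruit streams_xattr
    fruit:metadata = netatalk
    fruit:locking = netatalk

The fruit:locking = netatalk part is what provides the cross-protocol locking
with a Netatalk instance on the same data, if I understand the man page
correctly.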

I know Ralph Böhme, the author of vfs_fruit, and I have contacted SerNet (the 
company he works for); we will be having a conversation about the odds of them 
providing a package for omniOS as well.

Cheers,
Stephan
  
> 
> 
> On 2017-11-25 06:45 AM, Chris Ferebee wrote:
> > Hi Geoff,
> >
> > I love the way you put this:
> >
> >> They are reporting problems with the speed when using the
> >> "finder".
> > Oh, yes. I spent months of my life in 2014 fighting the Finder vs.
> > a major installation of netatalk, which I deployed on SmartOS.
> >
> > We have a large server (dual 6-core XEON, 128 GB RAM, 22 x 4 TB
> > SAS, STEC ZeusRAM) running as a fileserver for Macs. The client
> > does motion graphics and has a very large number of files, 100+
> > million on the largest share. (Not more than a few thousand files
> > per directory, though.)
> >
> > What we saw was that opening the Finder to a directory on the AFP
> > share would show a blank window for 20–180 seconds before
> > displaying the contents, even if there were maybe just 10–20
> > subdirectories to display.
> >
> > Then, as long as the Finder window was open, it would peg one CPU
> > core on the server at 100%, even though the display did not
> > change.
> >
> > It turned out that the Finder was looping continually through files
> > several subdirectories deep, querying extended attributes. Due to
> > the way netatalk is coded, this requires a call to getcwd() each
> > time, which is very expensive on Solaris. (getcwd() = get the path
> > name of the current working directory.)
> >
> > A SmartOS engineer was kind enough to advise me on this, and
> > basically told me: if you have getcwd() in your hot path, you are
> > screwed. This is probably a reason why the problem, though
> > actually a netatalk issue, is less pronounced on Linux, where
> > getcwd() is not as expensive as on Solaris.
> >
> > I never did get to the bottom of this. It appears that it doesn’t
> > happen when the Finder is talking to Apple’s own AFP server,
> > perhaps because there the Finder is able to use FSEvents to track
> > changes. Unfortunately, it appears that netatalk has never fully
> > addressed this, at least I haven’t seen anything indicating that
> > it has gained FSEvents supports in the past years. Here is what
> > netatalk dev Ralph Böhme had to say:
> >
> > 
> >
> > In the end, I switched the client to a (commercial, expensive) AFP
> > server, HELIOS EtherShare, and all problems went away, performance
> > as expected, and I didn’t investigate further.
> >
> > If you are dealing with Adobe applications, be aware that there are
> > a number of reasons why they do not work well on a Mac over SMB.
> > We didn’t get past a hard maximum path length of 254 characters,
> > but there are others.
> >
> > I’m happy to discuss this further, and would be quite interested to
> > find a less-expensive solution for large-scale file service for
> > Macs. We can take it off-list if it goes any further off topic.
> >
> > Good luck,
> > Chris
> >
> >
> >> Am 24.11.2017 um 03:56 schrieb Geoff Nordli :
> >>
> >> Hi.
> >>
> >> I have to support a few mac machines connecting to a few file
> >> shares running Omnios.
> >>
> >> They are reporting problems with the speed when using the
> >> "finder".
> >>
> >> I see there is a netatalk package I can download and compile off
> >> of sourceforge, which looks fairly current.
> >>
> >> Any suggestions on the best way forward to support Mac machines?
> >>
> >> thanks,
> >>
> >> Geoff
> >>
> >>
> 


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Best way to share files with Mac?

2017-11-24 Thread Stephan Budach


- Ursprüngliche Mail -
> Von: "David Ledger" <david.led...@ivdcs.co.uk>
> An: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> Gesendet: Freitag, 24. November 2017 19:26:44
> Betreff: Re: [OmniOS-discuss] Best way to share files with Mac?
> 
> On 24 Nov 2017, at 17:41, Stephan Budach wrote:
> 
> > - Ursprüngliche Mail -
> >> Von: "Geoff Nordli" <geo...@gnaa.net>
> >> An: "Manuel Oetiker" <man...@oetiker.ch>
> >> CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> >> Gesendet: Freitag, 24. November 2017 18:05:17
> >> Betreff: Re: [OmniOS-discuss] Best way to share files with Mac?
> >>
> >> What made you choose Samba vs the built-in SMB server?
> >>
> >> Active Directory domain controller?
> >>
> >
> > Yeah, what was it? I already received word that I can use
> > AD/Kerberos
> > to authenticate my Macs to Kernel/SMB.
> > ___
> > OmniOS-discuss mailing list
> > OmniOS-discuss@lists.omniti.com
> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 
> Maybe getting off topic, but what do people use to update Mac user
> passwords and keychain passwords when they are changed in AD from an
> app
> on OmniOS?
> 
> David

Tried NoMAD?

Cheers,
Stephan


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Best way to share files with Mac?

2017-11-24 Thread Stephan Budach


- Ursprüngliche Mail -
> Von: "Geoff Nordli" 
> An: "Manuel Oetiker" 
> CC: "omnios-discuss" 
> Gesendet: Freitag, 24. November 2017 18:05:17
> Betreff: Re: [OmniOS-discuss] Best way to share files with Mac?
> 
> What made you choose Samba vs the built-in SMB server?
> 
> Active Directory domain controller?
>
 
Yeah, what was it? I have already received word that I can use AD/Kerberos to 
authenticate my Macs to Kernel/SMB.


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Best way to share files with Mac?

2017-11-24 Thread Stephan Budach
Hi Geoff,

- Ursprüngliche Mail -
> Von: "Geoff Nordli" <geo...@gnaa.net>
> An: "Stephan Budach" <stephan.bud...@jvm.de>
> CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> Gesendet: Freitag, 24. November 2017 18:06:52
> Betreff: Re: [OmniOS-discuss] Best way to share files with Mac?
> 
> 
> 
> On 2017-11-23 10:57 PM, Stephan Budach wrote:
> > Hi Geoff,
> >
> > - Ursprüngliche Mail -
> >> Von: "Geoff Nordli" <geo...@gnaa.net>
> >> An: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> >> Gesendet: Freitag, 24. November 2017 03:56:01
> >> Betreff: [OmniOS-discuss] Best way to share files with Mac?
> >>
> >> Hi.
> >>
> >> I have to support a few mac machines connecting to a few file
> >> shares
> >> running Omnios.
> >>
> >> They are reporting problems with the speed when using the
> >> "finder".
> >>
> >> I see there is a netatalk package I can download and compile off
> >> of
> >> sourceforge, which looks fairly current.
> >>
> >> Any suggestions on the best way forward to support Mac machines?
> >>
> >> thanks,
> >>
> >> Geoff
> >>
> > well, the current Netatalk 3.1.x packages will of course get you
> > Mac connectivity, but you should probably go with SMB instead, as
> > no one actually knows, when Apple will pull AFP from macOS. I know
> > that AFP/Netatalk is very fast - we have been using it on
> > Solaris11 for a long time now, but it is in no way future-proof.
> >
> > I think it would make more sense to fix the speed issues, you're
> > seeing when using SMB. What speed issues are you experiencing?
> >
> > Cheers,
> > Stephan
> 
> Hello Stephan.
> 
> I am not 100% sure exactly what the issues are with the finder.  I
> wanted to make sure I was focusing on the right protocol first.
> 
> Manuel suggests going with Samba, which I have been looking at for a
> couple of years, but nothing really made me want to put the effort
> into it.
> 
> thanks!!
> Geoff

this is what I would suggest as well. Actually, we just started an internal 
project to finally move from AFP to SMB. Although I know that AFP beats the 
pants off SMB, AFP will go away, so it's better to switch asap - and in your 
case, to not get tainted by AFP's speed in the first place…

Better to focus on what bothers you when using SMB and get that sorted out.

Cheers,
Stephan


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Best way to share files with Mac?

2017-11-23 Thread Stephan Budach
Hi Geoff,

- Ursprüngliche Mail -
> Von: "Geoff Nordli" 
> An: "omnios-discuss" 
> Gesendet: Freitag, 24. November 2017 03:56:01
> Betreff: [OmniOS-discuss] Best way to share files with Mac?
> 
> Hi.
> 
> I have to support a few mac machines connecting to a few file shares
> running Omnios.
> 
> They are reporting problems with the speed when using the "finder".
> 
> I see there is a netatalk package I can download and compile off of
> sourceforge, which looks fairly current.
> 
> Any suggestions on the best way forward to support Mac machines?
> 
> thanks,
> 
> Geoff
> 

well, the current Netatalk 3.1.x packages will of course get you Mac 
connectivity, but you should probably go with SMB instead, as no one actually 
knows when Apple will pull AFP from macOS. I know that AFP/Netatalk is very 
fast - we have been using it on Solaris 11 for a long time now - but it is in 
no way future-proof.

I think it would make more sense to fix the speed issues you're seeing when 
using SMB. What speed issues are you experiencing?

Cheers,
Stephan


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] omniOS/Samba/Kerberos?

2017-11-22 Thread Stephan Budach
Hi, 


has anybody ever tried to authenticate an SMB client, e.g. a Mac, to a Samba 
server running on omniOS via Kerberos? My plan is to have our Macs get a ticket 
from our AD server and then log in to our file servers using a valid Kerberos 
ticket. Would this only work with a full-fledged Samba4 server, or would it 
also work with the kernel/smb? 
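
For context, the client-side flow I have in mind is roughly this (untested;
realm, server and share names are made up):

kinit someuser@EXAMPLE.COM      # or rely on the ticket from the AD login
klist                           # verify there is a valid TGT
mount_smbfs //someuser@filer01.example.com/projects /Volumes/projects

i.e. the Mac should never be asked for a password as long as the ticket is
valid.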


Cheers, 
Stephan 




smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] 2TB vs 4TB NVMe drives?

2017-11-14 Thread Stephan Budach
Hi Bob,

- Ursprüngliche Mail -
> Von: "Bob Friesenhahn" <bfrie...@simple.dallas.tx.us>
> An: "Stephan Budach" <stephan.bud...@jvm.de>
> CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> Gesendet: Dienstag, 14. November 2017 15:44:33
> Betreff: Re: [OmniOS-discuss] 2TB vs 4TB NVMe drives?
> 
> On Tue, 14 Nov 2017, Stephan Budach wrote:
> 
> > we are planning on purchasing a Supermicro NVMe server with 48 .U2
> > slots. I intended to initially load it with 24 x 2TB DC P4500,
> > leaving 24 slots empty.
> >
> >
> > Now… I've been also offered an Intel chassis with only 24 slots and
> > thus the offer also included the 4TB P4500s. Just without thinking
> > very long, I instinctively wayed towards the 2TB drives, mainly for
> > the reason, that should a drive really fail, I'd have of course a
> > longer resilver at hand, usind 4TB NVMe drives.
> >
> >
> > Which ones would you choose?
> 
> Assuming that OmniOS works with these devices at all, from a power
> consumption, heat, complexity, and reliability standpoint, the larger
> devices appear to be a win (1/2 the power and 2X the MTBF for the
> same
> storage capacity).  Resilver time is important but NVMe drives do not
> have the rotational latency and seek time issues of rotating media so
> resilver time should not be such an issue and there should only be a
> factor of 2 difference in resilver time.
> 
> A consideration is what zfs pool configuration you would be putting
> on
> these drives.  For throughput, more devices and more vdevs is better.
> It sounds like you would initially (and perhaps forever) have the
> same
> number of devices.
> 
> Are you planning to use zfs mirrors, or raidz2/raidz3?  What about
> dedicated zfs intent log devices?  If synchronous writes are
> important
> to you, dedicated zfs intent log devices should still help with pool
> performance and long-term health by deferring writes to the vdevs so
> writes can be larger and more sequential.
> 

AFAIK, all the hardware is on the illumos HCL, so this config should run fine 
under omniOS.
This setup is intended to replace my current ZFS-HA storage pools and will be 
configured with zfs mirrors only, where the mirror vdevs are built on iSCSI 
LUNs backed by "raw" devices - as raw as devices served by COMSTAR can get.

So, these boxes will serve each NVMe device as a LUN to the RSF-1 nodes, which 
will then host 2 zpools of 6 mirror vdevs each. From those zpools, the RSF-1 
nodes will serve NFS to our Oracle VM cluster servers. 
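
For illustration, each of those pools would then be created more or less like
this on the active RSF-1 node (a sketch - the device names are made up and
shortened; they are simply how the iSCSI LUNs show up in format):

zpool create nvmetank01 \
    mirror c0t600144F0AAAA0001d0 c0t600144F0BBBB0001d0 \
    mirror c0t600144F0AAAA0002d0 c0t600144F0BBBB0002d0 \
    mirror c0t600144F0AAAA0003d0 c0t600144F0BBBB0003d0
# (three more mirror pairs omitted), with one LUN of each pair
# coming from each of the two target boxes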


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] 2TB vs 4TB NVMe drives?

2017-11-14 Thread Stephan Budach

Hi, 


we are planning on purchasing a Supermicro NVMe server with 48 U.2 slots. I 
intend to initially load it with 24 x 2TB DC P4500, leaving 24 slots empty. 


Now… I've also been offered an Intel chassis with only 24 slots, so that 
offer included the 4TB P4500s instead. Without thinking about it very long, I 
instinctively leaned towards the 2TB drives, mainly for the reason that, should 
a drive really fail, I'd of course have a longer resilver at hand using 4TB 
NVMe drives. 


Which ones would you choose? 





Thanks, 
Stephan

smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS based redundant NFS

2017-09-30 Thread Stephan Budach
Hi Sergey,

- Ursprüngliche Mail -
> Von: "sergey ivanov" <serge...@gmail.com>
> An: "Stephan Budach" <stephan.bud...@jvm.de>
> CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> Gesendet: Freitag, 29. September 2017 22:30:31
> Betreff: Re: [OmniOS-discuss] OmniOS based redundant NFS
> 
> Thanks, Stephan.
> I did a simple test with creating lu over physical disks for use as
> ISSI targets, and it worked well. I am going to directly connect 2
> servers and export their disks as separate I SCSI targets. Or maybe
> different LUNs in a target. And then on the active server start
> initiator to get these targets, and combine them into a pool of 2-way
> mirrors so that it stays degraded but working if one of the servers
> dies.

Well, different targets mean that you will be able to service single disks on 
one node without having to degrade the whole zpool - only the affected vdevs. 
On the other hand, there is more complexity, since you will of course have 
quite a big number of iSCSI targets to log in to. This may be okay if the 
number doesn't get too high, but going with hundreds of disks, I chose to use 
fewer targets with more LUNs.

One thing to keep in mind is that stmfadm allows you to create the GUID to 
your liking; that is, you can freely choose the trailing bytes to be anything 
you want. I used that to ASCII-encode the node name and slot into the GUID, so 
that it shows up on my NFS heads when running format. This helps a lot in 
mapping the LUNs to drives.
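
To illustrate (a sketch - the hex and the device path are made up): if a LUN
should carry, say, 'N07S12' for node 07, slot 12, I hex-encode that and splice
it into the GUID passed to stmfadm:

printf 'N07S12' | od -An -tx1
#  4e 30 37 53 31 32
stmfadm create-lu -p guid=600144F04E303753313200000000AB01 \
    /dev/rdsk/c1t5000000000000001d0p1

format on the NFS head then shows the encoded name as part of the device
identifier, which is enough to tell which box and slot a LUN lives in.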

> So, manual failover for this configuration will be the following. If
> the server to be disabled is still active, stop NFS, export zpool on
> it, stop iscsiadm, release shared IP. On the other server: import
> zpool and start NFS, activate shared IP.

I am using the sharenfs properties of ZFS, but you will likely have to run 
zpool export -f  if you want to fail over the service, since the zpool 
is still busy. Also, you'd better set the zpool failmode to panic instead of 
wait, so that an issue triggers a reboot rather than leaving your NFS head 
waiting.
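
In practice, the manual failover sequence then boils down to something like
this (a sketch - pool, interface and address are made up, and RSF-1 does the
equivalent for us automatically):

# on the head giving up the service
zpool export -f tank01
ipadm delete-addr ixgbe0/nfsvip

# on the head taking over
ipadm create-addr -T static -a 192.0.2.10/24 ixgbe0/nfsvip
zpool import tank01
zpool set failmode=panic tank01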

> I read once there are some tricks which make clients do not recognize
> NFS server is changed underneath all mounts, but I never tried it.

The only issue I came across was when I deliberately failed the NFS service 
back and forth within too short a period, which caused the NFSd on the former 
primary node to re-use the TCP packet numbers and insist on reusing its old 
NFS connections to the clients. I solved that by resetting the NFSd each time 
a service starts on any NFS head. The currently connected NFS clients are not 
affected by that, and this solved this particular issue for me.
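
Concretely, that reset is nothing more than running this at service start on
the head taking over (assuming the stock service FMRI):

svcadm restart svc:/network/nfs/server:default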

Cheers,
Stephan

> --
>   Regards,
>   Sergey Ivanov.
> 
> Regards,
> Sergey Ivanov
> 
> 
> On Thu, Sep 28, 2017 at 12:49 AM, Stephan Budach
> <stephan.bud...@jvm.de> wrote:
> > Hi Sergey,
> >
> > - Ursprüngliche Mail -
> >> Von: "sergey ivanov" <serge...@gmail.com>
> >> An: "Stephan Budach" <stephan.bud...@jvm.de>
> >> CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> >> Gesendet: Mittwoch, 27. September 2017 23:15:49
> >> Betreff: Re: [OmniOS-discuss] OmniOS based redundant NFS
> >>
> >> Thanks, Stephan!
> >>
> >> Please, explain "The reason to use two x two separate servers is,
> >> that
> >> the mirrored zpool's vdevs look the same on each NFS head".
> >>
> >> I understand that, if I want to have the same zpool based on iscsi
> >> devices, I should not mix local disks with iscsi target disks.
> >>
> >> But I think I can have 2 computers, each exporting a set of local
> >> disks as iscsi targets. And to have iscsi initiators on the same
> >> computers importing these targets to build zpools.
> >>
> >> Also, looking at sbdadm, I think I can 'create lu
> >> /dev/rdsk/c0t0d3s2'.
> >>
> >> Ok, I think I would better try it and report how it goes.
> >
> > Actually, things can become quite complex, I'd like to reduce the
> > "mental" involvement to the absolute minimum, mainly because we
> > often faced a situation where something would suddenly break,
> > which had been running for a long time without problems. This is
> > when peeple start… well maybe not panicking, but having to recap
> > what the current setup was like and what they had to do to tackle
> > this.
> >
> > So, uniformity is a great deal of help on such systems - at least
> > for us. Technically, there is no issue with mixing local and
> > remote i

Re: [OmniOS-discuss] write amplification zvol

2017-09-28 Thread Stephan Budach
- Ursprüngliche Mail - 

> Von: "anthony omnios" 
> An: "Richard Elling" 
> CC: omnios-discuss@lists.omniti.com
> Gesendet: Donnerstag, 28. September 2017 09:56:42
> Betreff: Re: [OmniOS-discuss] write amplification zvol

> Thanks Richard for your help.

> My problem is that I have network iSCSI traffic of 2 MB/s, so every 5
> seconds I need to write 10 MB of network traffic to disk, but on
> pool filervm2 I am writing much more than that, approximately 60 MB
> every 5 seconds. Each SSD of filervm2 is writing 15 MB every 5
> seconds. When I check with smartmontools, every SSD is writing
> approximately 250 GB of data each day.

> How can I reduce the amount of data written to each SSD? I have tried
> to reduce the block size of the zvol, but it changes nothing.

> Anthony

> 2017-09-28 1:29 GMT+02:00 Richard Elling <
> richard.ell...@richardelling.com > :

> > Comment below...
> 

> > > On Sep 27, 2017, at 12:57 AM, anthony omnios <
> > > icoomn...@gmail.com
> > > > wrote:
> 
> > >
> 
> > > Hi,
> 
> > >
> 
> > > i have a problem, i used many ISCSI zvol (for each vm), network
> > > traffic is 2MB/s between kvm host and filer but i write on disks
> > > many more than that. I used a pool with separated mirror zil
> > > (intel s3710) and 8 ssd samsung 850 evo 1To
> 
> > >
> 
> > > zpool status
> 
> > > pool: filervm2
> 
> > > state: ONLINE
> 
> > > scan: resilvered 406G in 0h22m with 0 errors on Wed Sep 20
> > > 15:45:48
> > > 2017
> 
> > > config:
> 
> > >
> 
> > > NAME STATE READ WRITE CKSUM
> 
> > > filervm2 ONLINE 0 0 0
> 
> > > mirror-0 ONLINE 0 0 0
> 
> > > c7t5002538D41657AAFd0 ONLINE 0 0 0
> 
> > > c7t5002538D41F85C0Dd0 ONLINE 0 0 0
> 
> > > mirror-2 ONLINE 0 0 0
> 
> > > c7t5002538D41CC7105d0 ONLINE 0 0 0
> 
> > > c7t5002538D41CC7127d0 ONLINE 0 0 0
> 
> > > mirror-3 ONLINE 0 0 0
> 
> > > c7t5002538D41CD7F7Ed0 ONLINE 0 0 0
> 
> > > c7t5002538D41CD83FDd0 ONLINE 0 0 0
> 
> > > mirror-4 ONLINE 0 0 0
> 
> > > c7t5002538D41CD7F7Ad0 ONLINE 0 0 0
> 
> > > c7t5002538D41CD7F7Dd0 ONLINE 0 0 0
> 
> > > logs
> 
> > > mirror-1 ONLINE 0 0 0
> 
> > > c4t2d0 ONLINE 0 0 0
> 
> > > c4t4d0 ONLINE 0 0 0
> 
> > >
> 
> > > i used correct ashift of 13 for samsung 850 evo
> 
> > > zdb|grep ashift :
> 
> > >
> 
> > > ashift: 13
> 
> > > ashift: 13
> 
> > > ashift: 13
> 
> > > ashift: 13
> 
> > > ashift: 13
> 
> > >
> 
> > > But i write a lot on ssd every 5 seconds (many more than the
> > > network traffic of 2 MB/s)
> 
> > >
> 
> > > iostat -xn -d 1 :
> 
> > >
> 
> > > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
> 
> > > 11.0 3067.5 288.3 153457.4 6.8 0.5 2.2 0.2 5 14 filervm2
> 

> > filervm2 is seeing 3067 writes per second. This is the interface to
> > the upper layers.
> 
> > These writes are small.
> 

> > > 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 rpool
> 
> > > 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t0d0
> 
> > > 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t1d0
> 
> > > 0.0 552.6 0.0 17284.0 0.0 0.1 0.0 0.2 0 8 c4t2d0
> 
> > > 0.0 552.6 0.0 17284.0 0.0 0.1 0.0 0.2 0 8 c4t4d0
> 

> > The log devices are seeing 552 writes per second and since
> > sync=standard that
> 
> > means that the upper layers are requesting syncs.
> 

> > > 1.0 233.3 48.1 10051.6 0.0 0.0 0.0 0.1 0 3 c7t5002538D41657AAFd0
> 
> > > 5.0 250.3 144.2 13207.3 0.0 0.0 0.0 0.1 0 3 c7t5002538D41CC7127d0
> 
> > > 2.0 254.3 24.0 13207.3 0.0 0.0 0.0 0.1 0 4 c7t5002538D41CC7105d0
> 
> > > 3.0 235.3 72.1 10051.6 0.0 0.0 0.0 0.1 0 3 c7t5002538D41F85C0Dd0
> 
> > > 0.0 228.3 0.0 16178.7 0.0 0.0 0.0 0.2 0 4 c7t5002538D41CD83FDd0
> 
> > > 0.0 225.3 0.0 16210.7 0.0 0.0 0.0 0.2 0 4 c7t5002538D41CD7F7Ed0
> 
> > > 0.0 282.3 0.0 19991.1 0.0 0.0 0.0 0.2 0 5 c7t5002538D41CD7F7Dd0
> 
> > > 0.0 280.3 0.0 19871.0 0.0 0.0 0.0 0.2 0 5 c7t5002538D41CD7F7Ad0
> 

> > The pool disks see 1989 writes per second total or 994 writes per
> > second logically.
> 

> > It seems to me that reducing 3067 requested writes to 994 logical
> > writes is the opposite
> 
> > of amplification. What do you expect?
> 
> > -- richard
> 

> > >
> 
> > > I used zvol of 64k, i try with 8k and problem is the same.
> 
> > >
> 
> > > zfs get all filervm2/hdd-110022a :
> 
> > >
> 
> > > NAME PROPERTY VALUE SOURCE
> 
> > > filervm2/hdd-110022a type volume -
> 
> > > filervm2/hdd-110022a creation Tue May 16 10:24 2017 -
> 
> > > filervm2/hdd-110022a used 5.26G -
> 
> > > filervm2/hdd-110022a available 2.90T -
> 
> > > filervm2/hdd-110022a referenced 5.24G -
> 
> > > filervm2/hdd-110022a compressratio 3.99x -
> 
> > > filervm2/hdd-110022a reservation none default
> 
> > > filervm2/hdd-110022a volsize 25G local
> 
> > > filervm2/hdd-110022a volblocksize 64K -
> 
> > > filervm2/hdd-110022a checksum on default
> 
> > > filervm2/hdd-110022a compression lz4 local
> 
> > > filervm2/hdd-110022a readonly off default
> 
> > > filervm2/hdd-110022a copies 1 default
> 
> > > filervm2/hdd-110022a refreservation none default
> 
> > > filervm2/hdd-110022a primarycache all default
> 

Re: [OmniOS-discuss] OmniOS based redundant NFS

2017-09-27 Thread Stephan Budach
Hi Sergey,

- Ursprüngliche Mail -
> Von: "sergey ivanov" <serge...@gmail.com>
> An: "Stephan Budach" <stephan.bud...@jvm.de>
> CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> Gesendet: Mittwoch, 27. September 2017 23:15:49
> Betreff: Re: [OmniOS-discuss] OmniOS based redundant NFS
> 
> Thanks, Stephan!
> 
> Please, explain "The reason to use two x two separate servers is,
> that
> the mirrored zpool's vdevs look the same on each NFS head".
> 
> I understand that, if I want to have the same zpool based on iscsi
> devices, I should not mix local disks with iscsi target disks.
> 
> But I think I can have 2 computers, each exporting a set of local
> disks as iscsi targets. And to have iscsi initiators on the same
> computers importing these targets to build zpools.
> 
> Also, looking at sbdadm, I think I can 'create lu
> /dev/rdsk/c0t0d3s2'.
> 
> Ok, I think I would better try it and report how it goes.

Actually, things can become quite complex, so I'd like to reduce the "mental" 
involvement to the absolute minimum, mainly because we have often faced a 
situation where something that had been running for a long time without 
problems would suddenly break. This is when people start… well, maybe not 
panicking, but having to recap what the current setup is like and what they 
have to do to tackle it.

So, uniformity is a great deal of help on such systems - at least for us. 
Technically, there is no issue with mixing local and remote iSCSI targets on 
the same node, which then serves as both an iSCSI target and an NFS head.

Also, if one of the nodes really goes down, you will be losing your failover 
NFS head as well - maybe not a big deal, and depending on your requirements 
that can be okay. I do have such a setup as well, although only for an archive 
zpool, where I can tolerate the reduced redundancy for the benefit of a more 
lightweight setup.

Cheers,
Stephan


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS based redundant NFS

2017-09-27 Thread Stephan Budach
Hi Sergey,

- Ursprüngliche Mail -
> Von: "sergey ivanov" 
> An: "omnios-discuss" 
> Gesendet: Mittwoch, 27. September 2017 21:31:05
> Betreff: [OmniOS-discuss] OmniOS based redundant NFS
> 
> Hi,
> as end-of-life of r151014 approaches, we are planning upgrade for our
> NFS servers.
> I'm thinking about 2 servers providing ISCSI targets, and 2 another
> OmniOS servers using these ISCSI block devices in mirrored ZPOOL
> setup. IP address for NFS service can be a floating IP between those
> 2
> servers.

This is pretty much the same setup as we have. I am running two omniOS hosts as 
iSCSI targets and RSF-1 on two other omniOS hosts as NFS heads, between which 
the NFS VIPs fail over. This setup has been very stable for quite some time, 
and failovers have occurred on a couple of occasions.

> I have the following questions:
> 1. Are there any advantages to have separate ISCSI target servers and
> NFS servers or I should better combine one ISCSI target and NFS
> server on each of 2 hosts?

The reason to use two x two separate servers is that the mirrored zpool's 
vdevs look the same on each NFS head.
This makes for a very straightforward setup, but on the other hand it brings 
some interesting decisions to the table when it comes to the design of the 
iSCSI targets.

We all know that ZFS works best when presented with raw devices, and the 
closest to that would be iSCSI to raw devices/partitions. However, this leaves 
you to decide how you want to arrange your targets. If you choose to have fewer 
targets with more LUNs, you will face some interesting challenges when it comes 
to device failures, which will force you to offline the whole affected target, 
leaving you running with a degraded zpool on your NFS head. I just went through 
that, and I can tell you that you will need to prepare for such a case - the 
more drives you are using, the more challenging it becomes.
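
Concretely, servicing one disk behind such a shared target means something
along these lines on the NFS head (a sketch - pool and device names are made
up):

zpool offline tank01 c0t600144F0AAAA0005d0   # repeat for every LUN on that target
# replace the disk on the target node and re-create the LU there, then
zpool online tank01 c0t600144F0AAAA0005d0
zpool status tank01                          # wait for the resilver to finish

With one LUN per disk and one target per disk, you would only ever offline the
vdev member that actually failed.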


> 2. I do not want snapshots, checksums, and other ZFS features for
> block devices at the level where they are exported as ISCSI targets,
> -
> I would prefer these features at the level where these block devices
> are combined into mirror Zpools. Maybe it's better to have these
> ISCSI
> target servers running some not so advanced OS and have 2 Linux
> boxes?

Me neither, but I chose omniOS as my iSCSI targets nevertheless. I do make 
heavy use of ZFS' features like snapshots and clones on my NFS heads and I am 
very comfortable with that. No bells and whistles on my target nodes.

> 3. But if I have SSD for intent log and for cache, - maybe they can
> improve performance for ZVOLs used as block devices for ISCSI
> targets?
> 

It will depend on your workload. I do also have some S3700s for ZIL on some of 
my iSCSI-based zpools.

> Does anybody have experience setting up such redundant NFS servers?

Well… yes, and AFAIK there are also some other people around this list who are 
using similar setups - short of the iSCSI target nodes providing non-ZFS based 
LUNs; that part seems to be more exotic… ;)

> --
> Regards,
> Sergey Ivanov

Cheers,
Stephan


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] [SOLVED] Re: Can't destroy ZFS

2017-09-22 Thread Stephan Budach
Shoot… ;) Right after hitting the send button, I issued this: 

root@omnios:~# zdb -d vmpool 
Dataset mos [META], ID 0, cr_txg 4, 20.1M, 309 objects 
Dataset vmpool/nfsZimbraData [ZPL], ID 82, cr_txg 29169, 41.0K, 16 objects 
Dataset vmpool/esxi [ZPL], ID 49, cr_txg 20, 40.7G, 119 objects 
Dataset vmpool/iSCSI-Targets/EyeTV/iSCSI-Targets [ZPL], ID 268, cr_txg 296266, 
23.0K, 7 objects 
Dataset vmpool/iSCSI-Targets/EyeTV [ZVOL], ID 262, cr_txg 296222, 238G, 2 
objects 
Dataset vmpool/iSCSI-Targets [ZPL], ID 256, cr_txg 296164, 23.0K, 9 objects 
Dataset vmpool/nfsCloudData [ZPL], ID 94, cr_txg 30110, 70.3G, 280046 objects 
Dataset vmpool [ZPL], ID 21, cr_txg 1, 23.0K, 10 objects 
Verified large_blocks feature refcount of 0 is correct 
Verified sha512 feature refcount of 0 is correct 
Verified skein feature refcount of 0 is correct 
Verified edonr feature refcount of 0 is correct 

After destroying this "invisible" dataset, 
vmpool/iSCSI-Targets/EyeTV/iSCSI-Targets, I was able to destroy its 
parent datasets. 
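
For the archives, the cleanup was essentially this (from memory), working 
bottom-up through the hierarchy: 

zfs destroy vmpool/iSCSI-Targets/EyeTV/iSCSI-Targets 
zfs destroy vmpool/iSCSI-Targets/EyeTV 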

Sorry for the noise. 

Cheers, 
stephan 


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Can't destroy ZFS

2017-09-22 Thread Stephan Budach
Hi, 


after having received a zvol from my old S11 box via zfs send/recv and being 
unable to re-import the LUN via stmfadm, I decided to remove that zvol/ZFS and 
start over. However, I cannot remove that particular ZFS, and trying to do so 
yields this strange error: 


root@omnios:~# zfs destroy vmpool/iSCSI-Targets/EyeTV 
cannot destroy 'vmpool/iSCSI-Targets/EyeTV': dataset already exists 


The zpool currently looks like this… (by the way, no snaps whatsoever on it): 


root@omnios:~# zfs list -r vmpool 
NAME                         USED  AVAIL  REFER  MOUNTPOINT 
vmpool                       348G  3.17T    23K  /vmpool 
vmpool/esxi                 39.4G  1.96T  39.4G  /vmpool/esxi 
vmpool/iSCSI-Targets         238G  3.17T    23K  /tank/iSCSI-Targets 
vmpool/iSCSI-Targets/EyeTV   238G  3.17T   238G  - 
vmpool/nfsCloudData         70.3G   186G  70.3G  /vmpool/nfsCloudData 
vmpool/nfsZimbraData          41K   100G    41K  /vmpool/nfsZimbraData 




I have searched the web a bit for this particular error, but all the reports 
seem to relate to either snapshots or clones. 


Any idea is greatly appreciated. 


Cheers, 
stephan

smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Ang: COMSTAR and blocksizes

2017-09-11 Thread Stephan Budach
> > - Ursprüngliche Mail -
> > > Von: "Johan Kragsterman" <johan.kragster...@capvert.se>
> > > An: "Stephan Budach" <stephan.bud...@jvm.de>
> > > CC: omnios-discuss@lists.omniti.com
> > > Gesendet: Donnerstag, 7. September 2017 12:08:45
> > > Betreff: Ang: [OmniOS-discuss] COMSTAR and blocksizes
> > > 
> > > 
> > > Hi!
> > > 
> > > -"OmniOS-discuss" <omnios-discuss-boun...@lists.omniti.com>
> > > skrev: -
> > > Till: omnios-discuss@lists.omniti.com
> > > Från: Stephan Budach
> > > Sänt av: "OmniOS-discuss"
> > > Datum: 2017-09-07 10:38
> > > Ärende: [OmniOS-discuss] COMSTAR and blocksizes
> > > 
> > > Hi,
> > > 
> > > I am having trouble getting an issue sorted out, where omniOS
> > > 151020
> > > complaints about mismatched blocksizes on some COMSTAR iSCSI
> > > LUNS,
> > > like this:
> > > 
> > > Sep  7 08:52:07 zfsha02gh79 104 I/O requests are not aligned
> > > with
> > > 8192 disk sector size in 10 seconds. They are handled through
> > > Read
> > > Modify Write but the performance is very low!
> > > Sep  7 08:52:07 zfsha02gh79 scsi: [ID 107833 kern.warning]
> > > WARNING:
> > > /scsi_vhci/disk@g600144f0564d504f4f4c3037534c3033 (sd94):
> > > Sep  7 08:52:07 zfsha02gh79 79 I/O requests are not aligned
> > > with
> > > 8192 disk sector size in 10 seconds. They are handled through
> > > Read
> > > Modify Write but the performance is very low!
> > > Sep  7 08:52:16 zfsha02gh79 scsi: [ID 107833 kern.warning]
> > > WARNING:
> > > /scsi_vhci/disk@g600144f0564d504f4f4c3037534c3132 (sd88):
> > > Sep  7 08:52:16 zfsha02gh79 20 I/O requests are not aligned
> > > with
> > > 8192 disk sector size in 10 seconds. They are handled through
> > > Read
> > > Modify Write but the performance is very low!
> > > Sep  7 08:52:17 zfsha02gh79 scsi: [ID 107833 kern.warning]
> > > WARNING:
> > > /scsi_vhci/disk@g600144f0564d504f4f4c3038534c3033 (sd110):
> > > Sep  7 08:52:17 zfsha02gh79 1 I/O requests are not aligned
> > > with
> > > 8192 disk sector size in 10 seconds. They are handled through
> > > Read
> > > Modify Write but the performance is very low!
> > > Sep  7 08:52:17 zfsha02gh79 scsi: [ID 107833 kern.warning]
> > > WARNING:
> > > /scsi_vhci/disk@g600144f0564d504f4f4c3037534c3033 (sd94):
> > > Sep  7 08:52:17 zfsha02gh79 24 I/O requests are not aligned
> > > with
> > > 8192 disk sector size in 10 seconds. They are handled through
> > > Read
> > > Modify Write but the performance is very low!
> > > 
> > > These COMSTAR LUNs are configured to export a blocksize 8k like
> > > this:
> > > 
> > > LU Name: 600144F0564D504F4F4C3037534C3132
> > > Operational Status: Online
> > > Provider Name : sbd
> > > Alias : nfsvmpool07Slot12
> > > View Entry Count  : 1
> > > Data File : /dev/rdsk/c3t50015178F364A264d0p1
> > > Meta File : not set
> > > Size  : 200042414080
> > > Block Size: 8192
> > > Management URL: not set
> > > Vendor ID : SUN
> > > Product ID: COMSTAR
> > > Serial Num: not set
> > > Write Protect : Disabled
> > > Writeback Cache   : Enabled
> > > Access State  : Active
> > > 
> > > Now, the system seems to recognize the 8k, but for whatever
> > > reason,
> > > doesn't adjust the block size accordingly. It does that for the
> > > 4k
> > > LUNs, however, so I am unsure, on how to tackle this? Any tipp,
> > > anyone could share?
> > > 
> > > Thanks,
> > > Stephan
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > What's in the bottom here?
> > > 
> > > Data File : /dev/rdsk/c3t50015178F364A264d0p1
> > > 
> > > It dosesn't look like a zvol?
> > > 
> > > /Johan
> > 
> > It isn't one - it's a raw partition, an Intel SSD in this case.
> > 
> > Cheers,
> > Stephan
> > 
> 
> Hi all,
> 
> I just exported this zpool from my omniOS 020 and imported it on a
> recent oi box and the issue with the 8k writes d

Re: [OmniOS-discuss] Ang: COMSTAR and blocksizes

2017-09-08 Thread Stephan Budach


- Ursprüngliche Mail -
> Von: "Stephan Budach" <stephan.bud...@jvm.de>
> An: "Johan Kragsterman" <johan.kragster...@capvert.se>
> CC: omnios-discuss@lists.omniti.com
> Gesendet: Donnerstag, 7. September 2017 12:22:06
> Betreff: Re: [OmniOS-discuss] Ang:  COMSTAR and blocksizes
> 
> 
> 
> - Ursprüngliche Mail -
> > Von: "Johan Kragsterman" <johan.kragster...@capvert.se>
> > An: "Stephan Budach" <stephan.bud...@jvm.de>
> > CC: omnios-discuss@lists.omniti.com
> > Gesendet: Donnerstag, 7. September 2017 12:08:45
> > Betreff: Ang: [OmniOS-discuss] COMSTAR and blocksizes
> > 
> > 
> > Hi!
> > 
> > -"OmniOS-discuss" <omnios-discuss-boun...@lists.omniti.com>
> > skrev: -
> > Till: omnios-discuss@lists.omniti.com
> > Från: Stephan Budach
> > Sänt av: "OmniOS-discuss"
> > Datum: 2017-09-07 10:38
> > Ärende: [OmniOS-discuss] COMSTAR and blocksizes
> > 
> > Hi,
> > 
> > I am having trouble getting an issue sorted out, where omniOS
> > 151020
> > complaints about mismatched blocksizes on some COMSTAR iSCSI LUNS,
> > like this:
> > 
> > Sep  7 08:52:07 zfsha02gh79 104 I/O requests are not aligned
> > with
> > 8192 disk sector size in 10 seconds. They are handled through Read
> > Modify Write but the performance is very low!
> > Sep  7 08:52:07 zfsha02gh79 scsi: [ID 107833 kern.warning] WARNING:
> > /scsi_vhci/disk@g600144f0564d504f4f4c3037534c3033 (sd94):
> > Sep  7 08:52:07 zfsha02gh79 79 I/O requests are not aligned
> > with
> > 8192 disk sector size in 10 seconds. They are handled through Read
> > Modify Write but the performance is very low!
> > Sep  7 08:52:16 zfsha02gh79 scsi: [ID 107833 kern.warning] WARNING:
> > /scsi_vhci/disk@g600144f0564d504f4f4c3037534c3132 (sd88):
> > Sep  7 08:52:16 zfsha02gh79 20 I/O requests are not aligned
> > with
> > 8192 disk sector size in 10 seconds. They are handled through Read
> > Modify Write but the performance is very low!
> > Sep  7 08:52:17 zfsha02gh79 scsi: [ID 107833 kern.warning] WARNING:
> > /scsi_vhci/disk@g600144f0564d504f4f4c3038534c3033 (sd110):
> > Sep  7 08:52:17 zfsha02gh79 1 I/O requests are not aligned with
> > 8192 disk sector size in 10 seconds. They are handled through Read
> > Modify Write but the performance is very low!
> > Sep  7 08:52:17 zfsha02gh79 scsi: [ID 107833 kern.warning] WARNING:
> > /scsi_vhci/disk@g600144f0564d504f4f4c3037534c3033 (sd94):
> > Sep  7 08:52:17 zfsha02gh79 24 I/O requests are not aligned
> > with
> > 8192 disk sector size in 10 seconds. They are handled through Read
> > Modify Write but the performance is very low!
> > 
> > These COMSTAR LUNs are configured to export a blocksize 8k like
> > this:
> > 
> > LU Name: 600144F0564D504F4F4C3037534C3132
> > Operational Status: Online
> > Provider Name : sbd
> > Alias : nfsvmpool07Slot12
> > View Entry Count  : 1
> > Data File : /dev/rdsk/c3t50015178F364A264d0p1
> > Meta File : not set
> > Size  : 200042414080
> > Block Size: 8192
> > Management URL: not set
> > Vendor ID : SUN
> > Product ID: COMSTAR
> > Serial Num: not set
> > Write Protect : Disabled
> > Writeback Cache   : Enabled
> > Access State  : Active
> > 
> > Now, the system seems to recognize the 8k, but for whatever reason,
> > doesn't adjust the block size accordingly. It does that for the 4k
> > LUNs, however, so I am unsure, on how to tackle this? Any tipp,
> > anyone could share?
> > 
> > Thanks,
> > Stephan
> > 
> > 
> > 
> > 
> > 
> > 
> > What's in the bottom here?
> > 
> > Data File : /dev/rdsk/c3t50015178F364A264d0p1
> > 
> > It dosesn't look like a zvol?
> > 
> > /Johan
> 
> It isn't one - it's a raw partition, an Intel SSD in this case.
> 
> Cheers,
> Stephan
> 

Hi all,

I just exported this zpool from my omniOS 020 and imported it on a recent oi 
box and the issue with the 8k writes doesn't seem to be present on oi. Any 
thoughts on this?

Thanks,
Stephan


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Ang: COMSTAR and blocksizes

2017-09-07 Thread Stephan Budach


- Ursprüngliche Mail -
> Von: "Johan Kragsterman" <johan.kragster...@capvert.se>
> An: "Stephan Budach" <stephan.bud...@jvm.de>
> CC: omnios-discuss@lists.omniti.com
> Gesendet: Donnerstag, 7. September 2017 12:08:45
> Betreff: Ang: [OmniOS-discuss] COMSTAR and blocksizes
> 
> 
> Hi!
> 
> -"OmniOS-discuss" <omnios-discuss-boun...@lists.omniti.com>
> skrev: -
> Till: omnios-discuss@lists.omniti.com
> Från: Stephan Budach
> Sänt av: "OmniOS-discuss"
> Datum: 2017-09-07 10:38
> Ärende: [OmniOS-discuss] COMSTAR and blocksizes
> 
> Hi,
> 
> I am having trouble getting an issue sorted out, where omniOS 151020
> complaints about mismatched blocksizes on some COMSTAR iSCSI LUNS,
> like this:
> 
> Sep  7 08:52:07 zfsha02gh79 104 I/O requests are not aligned with
> 8192 disk sector size in 10 seconds. They are handled through Read
> Modify Write but the performance is very low!
> Sep  7 08:52:07 zfsha02gh79 scsi: [ID 107833 kern.warning] WARNING:
> /scsi_vhci/disk@g600144f0564d504f4f4c3037534c3033 (sd94):
> Sep  7 08:52:07 zfsha02gh79 79 I/O requests are not aligned with
> 8192 disk sector size in 10 seconds. They are handled through Read
> Modify Write but the performance is very low!
> Sep  7 08:52:16 zfsha02gh79 scsi: [ID 107833 kern.warning] WARNING:
> /scsi_vhci/disk@g600144f0564d504f4f4c3037534c3132 (sd88):
> Sep  7 08:52:16 zfsha02gh79 20 I/O requests are not aligned with
> 8192 disk sector size in 10 seconds. They are handled through Read
> Modify Write but the performance is very low!
> Sep  7 08:52:17 zfsha02gh79 scsi: [ID 107833 kern.warning] WARNING:
> /scsi_vhci/disk@g600144f0564d504f4f4c3038534c3033 (sd110):
> Sep  7 08:52:17 zfsha02gh79 1 I/O requests are not aligned with
> 8192 disk sector size in 10 seconds. They are handled through Read
> Modify Write but the performance is very low!
> Sep  7 08:52:17 zfsha02gh79 scsi: [ID 107833 kern.warning] WARNING:
> /scsi_vhci/disk@g600144f0564d504f4f4c3037534c3033 (sd94):
> Sep  7 08:52:17 zfsha02gh79 24 I/O requests are not aligned with
> 8192 disk sector size in 10 seconds. They are handled through Read
> Modify Write but the performance is very low!
> 
> These COMSTAR LUNs are configured to export a blocksize 8k like this:
> 
> LU Name: 600144F0564D504F4F4C3037534C3132
> Operational Status: Online
> Provider Name : sbd
> Alias : nfsvmpool07Slot12
> View Entry Count  : 1
> Data File : /dev/rdsk/c3t50015178F364A264d0p1
> Meta File : not set
> Size  : 200042414080
> Block Size: 8192
> Management URL: not set
> Vendor ID : SUN
> Product ID: COMSTAR
> Serial Num: not set
> Write Protect : Disabled
> Writeback Cache   : Enabled
> Access State  : Active
> 
> Now, the system seems to recognize the 8k, but for whatever reason
> doesn't adjust the block size accordingly. It does that for the 4k
> LUNs, however, so I am unsure how to tackle this. Any tip anyone
> could share?
> 
> Thanks,
> Stephan
> 
> 
> 
> 
> 
> 
> What's in the bottom here?
> 
> Data File : /dev/rdsk/c3t50015178F364A264d0p1
> 
> It doesn't look like a zvol?
> 
> /Johan

It isn't one - it's a raw partition, an Intel SSD in this case.

Cheers,
Stephan


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Loosing NFS shares

2017-06-22 Thread Stephan Budach
Hi Oliver, 


Von: "Oliver Weinmann"  
An: "Tobias Oetiker"  
CC: "omnios-discuss"  
Gesendet: Donnerstag, 22. Juni 2017 09:13:27 
Betreff: Re: [OmniOS-discuss] Loosing NFS shares 



Hi, 



Don’t think so: 



svcs -vx rcapd 



shows nothing. 







Oliver Weinmann 
Senior Unix VMWare, Storage Engineer 

Telespazio VEGA Deutschland GmbH 
Europaplatz 5 - 64293 Darmstadt - Germany 
Ph: + 49 (0)6151 8257 744 | Fax: +49 (0)6151 8257 799 
oliver.weinm...@telespazio-vega.de 
http://www.telespazio-vega.de 


Registered office/Sitz: Darmstadt, Register court/Registergericht: Darmstadt, 
HRB 89231; Managing Director/Geschäftsführer: Sigmar Keller 


From: Tobias Oetiker [mailto:t...@oetiker.ch] 
Sent: Donnerstag, 22. Juni 2017 09:11 
To: Oliver Weinmann  
Cc: omnios-discuss  
Subject: Re: [OmniOS-discuss] Loosing NFS shares 





Oliver, 





are you running rcapd ? we found that (at least of the box) this thing wrecks 
havoc to both 


nfs and iscsi sharing ... 





cheers 


tobi 





- On Jun 22, 2017, at 8:45 AM, Oliver Weinmann 
<oliver.weinm...@telespazio-vega.de> wrote: 





Hi, 



we are using OmniOS for a few months now and have big trouble with stability. 
We mainly use it for VMware NFS datastores. The last 3 nights we lost all NFS 
datastores and VMs stopped running. I noticed that even though zfs get sharenfs 
shows folders as shared they become inaccessible. Setting sharenfs to off and 
sharing again solves the issue. I have no clue where to start. I’m fairly new 
to OmniOS. 



Any help would be highly appreciated. 



Thanks and Best Regards, 

Oliver 





Oliver Weinmann 
Senior Unix VMWare, Storage Engineer 

Telespazio VEGA Deutschland GmbH 
Europaplatz 5 - 64293 Darmstadt - Germany 
Ph: + 49 (0)6151 8257 744 | Fax: +49 (0)6151 8257 799 
oliver.weinm...@telespazio-vega.de 
http://www.telespazio-vega.de 


Registered office/Sitz: Darmstadt, Register court/Registergericht: Darmstadt, 
HRB 89231; Managing Director/Geschäftsführer: Sigmar Keller 



What is the output from fmdump / fmdump -v? Also, it would be good to have a 
better understanding of your setup. We have been using NFS shares from OmniOS 
since r006 on OVM and also on VMware, and at least the NFS part has always been 
very solid for us. So, how did you set up your storage, and how many NFS 
clients do you have? 
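
Something along these lines would already tell us a lot (the fs name is just 
an example): 

fmdump 
fmdump -eV | tail -40 
zfs get sharenfs yourpool/yourfs 
svcs -xv svc:/network/nfs/server:default 

i.e. whether FMA has logged anything around the time the shares disappear, and 
whether the NFS service itself went into maintenance. 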

Cheers, 
Stephan 


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Loosing NFS shares

2017-06-22 Thread Stephan Budach
Hi Oliver, 

- Ursprüngliche Mail -

> Von: "Oliver Weinmann" 
> An: omnios-discuss@lists.omniti.com
> Gesendet: Donnerstag, 22. Juni 2017 08:45:14
> Betreff: [OmniOS-discuss] Loosing NFS shares

> Hi,

> we are using OmniOS for a few months now and have big trouble with
> stability. We mainly use it for VMware NFS datastores. The last 3
> nights we lost all NFS datastores and VMs stopped running. I noticed
> that even though zfs get sharenfs shows folders as shared they
> become inaccessible. Setting sharenfs to off and sharing again
> solves the issue. I have no clue where to start. I’m fairly new to
> OmniOS.

> Any help would be highly appreciated.

> Thanks and Best Regards,
> Oliver


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] NFS & SMB connection freezes 30-40 second sometimes.

2017-05-25 Thread Stephan Budach
- Ursprüngliche Mail - 
Hi Özkan,

> Von: "Özkan Göksu" 
> An: "Dan McDonald" 
> CC: "omnios-discuss" 
> Gesendet: Donnerstag, 25. Mai 2017 16:09:18
> Betreff: Re: [OmniOS-discuss] NFS & SMB connection freezes 30-40
> second sometimes.

> I will update ofc, but I can't shut down a storage unit right now..
> The problem occurred only 2 times in 5 months, so I cannot track it
> down. I should be ready to find out what it is next time.
> I need to solve the issue completely, and to solve the problem I need
> to understand WHY it's happening. Right now I have no idea...

> BTW: while the problem occurred I tried to ping my server and guess
> what? My network stayed alive.

I don't seem to understand. Did your host not respond to the pings you sent to 
it?
I run a couple of r020 boxes which are part of an RSF-1 cluster and which are 
serving NFS to a bunch of clients: OracleVM and VMware. I also have all of my 
storage/client connections set up as "active" LACP aggregations and I don't have any issues 
with those.
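
If in doubt, the aggregation state is easy to check from the OmniOS side, e.g.:

dladm show-aggr -L         # LACP activity/timer and per-port sync state
dladm show-aggr -x         # per-port speed, duplex and attach state
dladm show-link            # link state of the underlying physical ports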

If you are experiencing network issues, then the rest of your issue might well 
be caused by that, of course.

> BTW 2: I'm using active-active LACP. Could this be the reason?

> > Özkan GÖKSU | Tekn. Geliştirme | ozkan.go...@usishi.com
> 
> > C : +90 555 449 88 71 | T : +90 (216) 442 7070 |
> 
> > http://www.usishi.com
> 

> 2017-05-25 16:46 GMT+03:00 Dan McDonald < dan...@kebe.com > :

> > r151022 is out now. Please upgrade to that and see if the problem
> > manifests.
> 

> > Dan
> 

Cheers,
Stephan




Re: [OmniOS-discuss] Quick Test of r151021 ISO Install using Xen HVM and an "xdf" Device

2017-03-24 Thread Stephan Budach
Hi,

I started out by simply copying the Xen disk image over to an omniOS host, 
where I attached the disk image using lofiadm. I then immediately tried zpool 
import -d /dev/lofi, but that didn't output anything. I then ran prtvtoc on the 
lofi device:

root@tr1207410:/tr1207410data01# prtvtoc /dev/lofi/1
* /dev/lofi/1 (volume "lofi") partition map
*
* Dimensions:
* 512 bytes/sector
*1449 sectors/track
*   1 tracks/cylinder
*1449 sectors/cylinder
*   46312 cylinders
*   46312 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*  First SectorLast
* Partition  Tag  FlagsSector CountSector  Mount Directory
   0  001  0  67106088  67106087
This is what fdisk says about this volume:

root@tr1207410:/tr1207410data01# fdisk -R /dev/rlofi/1
 Total disk size is 46312 cylinders
 Cylinder size is 1449 (512 byte) blocks

   Cylinders
  Partition   StatusType  Start   End   Length%
  =   ==  =   ===   ==   ===
  1 EFI   0  4631346314100


So… there is something there, but how to mount that sucker?
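
What I will probably try next (a sketch only; it assumes the lofiadm on that 
box already supports labeled devices via -l, and the image path below is made up):

lofiadm -d /dev/lofi/1                      # drop the flat mapping again
lofiadm -l -a /tr1207410data01/vdisk.img    # re-attach as a labeled device, exposing the EFI partitions
# the slices should then show up as regular /dev/dsk nodes, so zpool can find the label in slice 0:
zpool import -d /dev/dsk
zpool import -d /dev/dsk -f -R /mnt <poolname>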

Cheers,
Stephan




Re: [OmniOS-discuss] Quick Test of r151021 ISO Install using Xen HVM and an "xdf" Device

2017-03-23 Thread Stephan Budach


- Ursprüngliche Mail -
> Von: "Dan McDonald" <dan...@omniti.com>
> An: "Stephan Budach" <stephan.bud...@jvm.de>, "Dan McDonald" 
> <dan...@omniti.com>
> CC: "Prakash Surya" <prakash.su...@delphix.com>, "omnios-discuss" 
> <omnios-discuss@lists.omniti.com>
> Gesendet: Donnerstag, 23. März 2017 18:45:01
> Betreff: Re: [OmniOS-discuss] Quick Test of r151021 ISO Install using Xen HVM 
> and an "xdf" Device
> 
> 
> > On Mar 23, 2017, at 1:43 PM, Stephan Budach <stephan.bud...@jvm.de>
> > wrote:
> > 
> > Ha ha… yeah… I will try. Howerver, without any network
> > connectivity, I will be having quite a hard time getting those
> > dumps from the guest anywhere.
> 
> Can you maybe use the virtual-disks and attach them to a system with
> working network connectivity?
> 
> Just a thought,
> Dan
> 
> 
Yes, this is what I thought as well, but the guest would need to be able to read 
the rpool. Will Solaris 11 be able to mount the rpool, or is the underlying 
zpool version too far off? Solaris 11 runs as a guest on Oracle VM…

Other than that, I could try to transfer the vdisk itself over to an omniOS box 
and just try to mount it there. I vaguely remember there being something about 
an offset when doing this with vdisk images… ;)

Stephan





Re: [OmniOS-discuss] Quick Test of r151021 ISO Install using Xen HVM and an "xdf" Device

2017-03-23 Thread Stephan Budach
Ha ha… yeah… I will try. However, without any network connectivity, I will 
have quite a hard time getting those dumps off the guest to anywhere. 
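
For the record, the panic stack can at least be read out locally even without 
network (a sketch, assuming savecore is enabled and the dump device was big enough):

dumpadm                            # shows the dump device and the savecore directory
savecore -v                        # extracts unix.N / vmcore.N into that directory
mdb -k unix.0 vmcore.0
> ::status                         # panic string
> ::stack                          # panic stack trace
> $q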


Cheers, 
Stephan 

- Ursprüngliche Mail -


Von: "Dan McDonald" <dan...@omniti.com> 
An: "Stephan Budach" <stephan.bud...@jvm.de>, "Dan McDonald" 
<dan...@omniti.com> 
CC: "Prakash Surya" <prakash.su...@delphix.com>, "omnios-discuss" 
<omnios-discuss@lists.omniti.com> 
Gesendet: Donnerstag, 23. März 2017 18:08:11 
Betreff: Re: [OmniOS-discuss] Quick Test of r151021 ISO Install using Xen HVM 
and an "xdf" Device 


Share kernel core dumps please so we can debug them please. 


Dan 

Sent from my iPhone (typos, autocorrect, and all) 

On Mar 23, 2017, at 1:05 PM, Stephan Budach < stephan.bud...@jvm.de > wrote: 






Hi, 


following-up on this thread, I managed to get the current r151021 ISO booting 
and installing on my Oracle VM Xen host - yay!! 


I have installed and tweaked /etc/system such as that the system boots up 
sucessfully. However, I am unable to configure the network card, since as soon 
as I am trying ipadm create-addr on it, omniOS crashes, dumps a core and 
reboots. 


Another interesting issue may be the fact that format shows two disks c1d0 
 and c2t0d0 , when there is only one disk in 
the system anyway and rpool has been installed on c3t0d0. 


Also, there're some occasional log messages to the console about a PCI device 
from which no SOF interrupts have been received and which is an unusable USB 
UHCI host controller at this point. 


Cheers, 
Stephan 











Re: [OmniOS-discuss] Quick Test of r151021 ISO Install using Xen HVM and an "xdf" Device

2017-03-23 Thread Stephan Budach
Hi, 

Following up on this thread, I managed to get the current r151021 ISO booting 
and installing on my Oracle VM Xen host - yay!! 

I have installed it and tweaked /etc/system so that the system boots up 
successfully. However, I am unable to configure the network card: as soon 
as I try ipadm create-addr on it, omniOS crashes, dumps a core and 
reboots. 

Another interesting issue may be the fact that format shows two disks, c1d0 
and c2t0d0, when there is only one disk in the system, and rpool has been 
installed on c3t0d0. 

Also, there're some occasional log messages to the console about a PCI device 
from which no SOF interrupts have been received and which is an unusable USB 
UHCI host controller at this point. 

Cheers, 
Stephan 



[OmniOS-discuss] dladm shows -NaN für LACP bond

2017-03-20 Thread Stephan Budach
Hi, 


I was just checking the state of my LACP bonds on my omniOS boxes, due to some 
scheduled maintenance work on our Nexus switches, when I came across one box 
that works normally, but where dladm shows no stats for one of the two LACP 
bonds when running dladm show-aggr -s: 



root@zfsha01colt:/root# dladm show-aggr -s 
LINK     PORT     IPACKETS     RBYTES            OPACKETS     OBYTES             IPKTDIST  OPKTDIST
iscsi0   --       61867781898  73344540353111    70243236328  129909713865234    --        --
--       ixgbe0   11144772596  3672486214983     24068168071  64207901139258     18,0      34,3
--       ixgbe2   50723009302  69672054138128    46175068257  65701812725976     82,0      65,7
nfs0     --       0            0                 0            0                  --        --
--       ixgbe1   0            0                 0            0                  -NaN      -NaN
--       ixgbe3   0            0                 0            0                  -NaN      -NaN


However, when querying nfs0 directly, dladm just reports as expected: 



root@zfsha01colt:/root# dladm show-aggr -s nfs0 
LINK     PORT     IPACKETS     RBYTES            OPACKETS     OBYTES             IPKTDIST  OPKTDIST
nfs0     --       10996605142  23230932089631    12027822487  62128732526817     --        --
--       ixgbe1   598367842    23690118857       2779810032   13926608816419     5,4       23,1
--       ixgbe3   10398237300  23207241970774    9248012455   48202123710398     94,6      76,9


This box is a r020 one: 



root@zfsha01colt:/root# uname -a 
SunOS zfsha01colt 5.11 omnios-bed3013 i86pc i386 i86pc 




Any idea why this could happen? The only thing I can think of is that one 
of the ports on one Nexus was flapping, which is also the reason for the 
scheduled maintenance on that switch, but it hasn't flapped for hours… 
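
For completeness, this is what I would look at next to see whether the 
aggregate itself or just dladm's statistics are off (a sketch):

dladm show-aggr -x nfs0              # per-port attach state, speed and duplex
dladm show-aggr -L nfs0              # LACP sync/collect/distribute flags per port
kstat -p link:0:nfs0                 # raw link kstats, independent of dladm's accounting
kstat -p link:0:ixgbe1 link:0:ixgbe3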


Cheers, 
Stephan



Re: [OmniOS-discuss] Bloody update on Repo, plus Kayak for ISO is almost beta

2017-03-09 Thread Stephan Budach


- Ursprüngliche Mail -
> Von: "Stephan Budach" <stephan.bud...@jvm.de>
> An: "Jens Bauernfeind" <bauernfe...@ipk-gatersleben.de>
> CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> Gesendet: Donnerstag, 9. März 2017 16:41:04
> Betreff: Re: [OmniOS-discuss] Bloody update on Repo, plus Kayak for ISO is 
> almost beta [signed OK]
> 
> Bummer…
> 
> - Ursprüngliche Mail -
> > Von: "Jens Bauernfeind" <bauernfe...@ipk-gatersleben.de>
> > An: "Stephan Budach" <stephan.bud...@jvm.de>
> > CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> > Gesendet: Donnerstag, 9. März 2017 11:56:43
> > Betreff: RE: [OmniOS-discuss] Bloody update on Repo, plus Kayak for
> > ISO is almost beta [signed OK]
> > 
> > Hi Stephan,
> > 
> > thats correct, it is an old version, but the current version on the
> > Oracle Database Appliance we are running here, engineered system,
> > yeah :-(
> > 
> > I think a snippet of the vm.cfg is enough?
> > 8<---
> > memory = 4096
> > kernel = '/usr/lib/xen/boot/hvmloader'
> > cpu_cap = 0
> > vif = ['type=netfront,bridge=net1']
> > device_model = '/usr/lib64/xen/bin/qemu-dm'
> > builder = 'hvm'
> > vnclisten = '0.0.0.0'
> > boot = 'c'
> > cpus =
> > '0,1,2,3,4,5,6,7,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47'
> > passwd = ''
> > vcpus = 4
> > apic = 1
> > sdl = 0
> > maxvcpus = 4
> > serial = 'pty'
> > disk =
> > [u'file:/OVS/Repositories/sh_repo/.ACFS/snaps/omnios-test/VirtualMachines/omnios-test/omnios.img,hda,w']
> > vnc = 1
> > acpi = 1
> > maxmem = 4096
> > 8<---
> > 
> > Jens
> 
> seems, like I can't get that to work on OVM 3.4.2. OVM 3.4.2 is Xen
> 4.3.something and it totally stalls after having tweaked the apix
> settings and continue booting. It won't probably work on anything
> newer than OVM 3.3.x.
> 
> Cheers,
> Stephan

Sorry, I closed without providing any useful(?) information on the boot 
process, so here it comes and basically goes like this:

Welcome to kmdb
kmdb: dmod krtld failed to load: Error 2
[0] apix_enable/X
apix_enable:
apix_enable: 1
[0] apix_enable/W0
apix_enable: 0x1 = 0x0
[0] apix_enable/X
apix_enable:
apix_enable: 0
[0] :c
SunOS Release 5.11 Version omnios-master-650595c 64-bit
Copyright (c) 1983,2010, Oracle and/or its affiliates. All rights reserved.
WARNING: /pci@0,pci1af4,1100@1,2 (uhci0); No SOF interrupts have been received, this USB UHCI host controller is unusable
NOTICE: Kernel debugger present: disabling console power management.

aaand… that's it. From there on it just sits there doing nothing, well, 
seemingly at least.

Cheers,
Stephan




Re: [OmniOS-discuss] Bloody update on Repo, plus Kayak for ISO is almost beta

2017-03-09 Thread Stephan Budach
Bummer…

- Ursprüngliche Mail -
> Von: "Jens Bauernfeind" <bauernfe...@ipk-gatersleben.de>
> An: "Stephan Budach" <stephan.bud...@jvm.de>
> CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>
> Gesendet: Donnerstag, 9. März 2017 11:56:43
> Betreff: RE: [OmniOS-discuss] Bloody update on Repo, plus Kayak for ISO is 
> almost beta [signed OK]
> 
> Hi Stephan,
> 
> thats correct, it is an old version, but the current version on the
> Oracle Database Appliance we are running here, engineered system,
> yeah :-(
> 
> I think a snippet of the vm.cfg is enough?
> 8<---
> memory = 4096
> kernel = '/usr/lib/xen/boot/hvmloader'
> cpu_cap = 0
> vif = ['type=netfront,bridge=net1']
> device_model = '/usr/lib64/xen/bin/qemu-dm'
> builder = 'hvm'
> vnclisten = '0.0.0.0'
> boot = 'c'
> cpus =
> '0,1,2,3,4,5,6,7,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47'
> passwd = ''
> vcpus = 4
> apic = 1
> sdl = 0
> maxvcpus = 4
> serial = 'pty'
> disk =
> [u'file:/OVS/Repositories/sh_repo/.ACFS/snaps/omnios-test/VirtualMachines/omnios-test/omnios.img,hda,w']
> vnc = 1
> acpi = 1
> maxmem = 4096
> 8<---
> 
> Jens

Seems like I can't get that to work on OVM 3.4.2. OVM 3.4.2 is Xen 
4.3.something and it totally stalls after I have tweaked the apix settings and 
continued booting. It probably won't work on anything newer than OVM 3.3.x.

Cheers,
Stephan




Re: [OmniOS-discuss] Bloody update on Repo, plus Kayak for ISO is almost beta

2017-03-09 Thread Stephan Budach
Hi Jens,

- Ursprüngliche Mail -
> Von: "Jens Bauernfeind" 
> An: "omnios-discuss" 
> Gesendet: Donnerstag, 9. März 2017 11:23:32
> Betreff: Re: [OmniOS-discuss] Bloody update on Repo, plus Kayak for ISO is 
> almost beta
> 
> Hello again,
> 
> I installed successfully the ISO on our oracle vm server (Oracle VM
> 3.2.9)
> -> xen-4.1.3-25.el5.223.26
> I just need to disable the apix stuff and add " set apix_enable=0"
> after the
> installation.
> The installer found 2 disks:
> 8<---
> bash-4.4# diskinfo
> TYPEDISKVID  PID  SIZE
>  RMV
> SSD
> ATA c2d0--  20.00 GiB
>   no
> no
> -   c4t768d0--  20.00 GiB
>   no
> no
> 8<---
> During the installation I used the c2d0 device.
> 
> A message about a failed pv driver pops up twice, but i don't know
> which
> device that is
> 8<---
> WARNING: pv driver failed to connect: /xpvd/xdf@5632
> WARNING: PV access to device disabled:
> /pci@0,0/pci-ide@1,1/ide@1/sd@0,0
> 8<---
> 
> Jens
> 

I tested the prior ISO yesterday on our OVM 3.4.2 and I couldn't get it to 
work. OVM 3.2.9 is somewhat "outdated", but could you provide the settings 
that you chose for the guest?

Thanks,
Stephan




Re: [OmniOS-discuss] Quick Test of r151021 ISO Install using Xen HVM and an "xdf" Device

2017-03-07 Thread Stephan Budach
Hi Prakash, 

- Ursprüngliche Mail -

> Von: "Prakash Surya" 
> An: omnios-discuss@lists.omniti.com
> Gesendet: Mittwoch, 8. März 2017 01:55:19
> Betreff: [OmniOS-discuss] Quick Test of r151021 ISO Install using Xen
> HVM and an "xdf" Device

> Hey All,

> I just wanted to post and say that I've tested the new r151021 ISO
> installer using a Xen VM and was able to successfully perform the
> installation. Here's a copy of the Xen VM configuration that I used:

> # cat ami-template.cfg
> builder='hvm'
> name='ami-template'
> vcpus=4
> memory=4096
> vif=['bridge=xenbr0, type=ioemu']
> #disk=[ 'file:/root/psurya/omni-kayak/r151021-kayak.iso,hdb:cdrom,r',
> # 'file:/root/psurya/omni-kayak/ami-template.img,xvda,w' ]
> disk=[ 'file:/root/psurya/omni-kayak/ami-template.img,xvda,w' ]
> #boot='d'
> boot='c'
> vnc=1
> vnclisten='0.0.0.0'
> vncconsole=1
> on_crash='preserve'
> xen_platform_pci=1
> serial='pty'
> on_reboot='destroy'

> The only "catch", is I have to set "apix_enable" to 0. I do that
> using KMDB during the installer (e.g. edit the loader so I drop to
> KMDB first), and then edit "/etc/system" after the install is
> completed but prior to the first boot (so I don't have to keep using
> KMDB for every boot).

> If I don't set that tuning, the system will "hang" during boot up
> (this is a problem with illumos and Xen HVM, and not specific to
> OmniOS or the new ISO).

> Additionally, I've attached a PNG image with some "zpool" and
> "format" output from within the Xen VM running after the ISO install
> (hopefully the mailing list doesn't strip the image from this post).

> Cheers,
> Prakash

Can you elaborate a little more on what exactly you did to the loader? I am 
guessing that you modified the kernel boot arguments, but unfortunately I am 
unfamiliar with the new loader. 
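
Just so I understand it correctly, I assume it boils down to something like 
this (a sketch based on your description; the exact loader menu wording may differ):

# boot the install media with the kernel flags -kd, so it drops into kmdb before startup
# (edit the kernel/boot line in the loader and append "-kd"), then at the kmdb prompt:
[0]> apix_enable/W 0      # clear apix_enable before the kernel brings up interrupts
[0]> :c                   # continue booting into the installer
# after the install, make it permanent so kmdb is not needed on every boot:
echo "set apix_enable=0" >> /etc/system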

Cheers, 
Stephan 




Re: [OmniOS-discuss] new supermicro server

2017-03-07 Thread Stephan Budach


- Ursprüngliche Mail -
> Von: "Dan McDonald" 
> An: "Geoff Nordli" 
> CC: "omnios-discuss" 
> Gesendet: Mittwoch, 8. März 2017 07:44:30
> Betreff: Re: [OmniOS-discuss] new supermicro server
> 
> You didn't mention anything in the original note about NVMe, and
> neither did the spec sheet.
> 
> NVMe 1.0 and 1.1 should work on 020 and later OmniOS. Hang out on the
> illumos developers list to see the latest on that front.
> 
> Dan
> 
> Sent from my iPhone (typos, autocorrect, and all)
> 

Regarding NVMe, I am eyeing this particular Supermicro and I intend to get 
my hands on two of those in Q2:

SuperMicro SuperServer SSG-2028R-NR48N

I think these will bring a lot of fun to the table. ;)

Cheers,
Stephan




Re: [OmniOS-discuss] new supermicro server

2017-03-07 Thread Stephan Budach


- Ursprüngliche Mail -
> Von: "Dan McDonald" 
> An: "Geoff Nordli" 
> CC: "omnios-discuss" 
> Gesendet: Mittwoch, 8. März 2017 02:21:20
> Betreff: Re: [OmniOS-discuss] new supermicro server
> 
> 
> > On Mar 7, 2017, at 8:16 PM, Geoff Nordli  wrote:
> > 
> > Hi.
> > 
> > I am looking at ordering a new 3U supermicro server. as an all-in
> > one.   I have been using these recently:
> > 
> > https://www.supermicro.com/products/system/3U/6038/SSG-6038R-E1CR16L.cfm
> > 
> > It has the LSI 3008 HBA in IT mode.
> > 
> > Any other suggestions out there?
> 
> The one you mention is pretty good, especially if you want 2 + 16
> drives (2x2.5", 16x3.5") online.
> 
> Unless you want something smaller, I can think of much worse ways to
> spend your money.
> 
> Dan
> 

Yeah - I do have a couple of those, even with the current X10 board, and I am 
really satisfied with them. I just threw in another Intel 10GbE DP and hooked 
them up to our Nexus network. Really solid boxes!

Cheers,
Stephan




Re: [OmniOS-discuss] CALL FOR VOLUNTEERS - Kayak for ISO alpha

2017-03-02 Thread Stephan Budach


- Ursprüngliche Mail -
> Von: "Dan McDonald" <dan...@omniti.com>
> An: "Stephan Budach" <stephan.bud...@jvm.de>
> CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>, "Dan McDonald" 
> <dan...@omniti.com>
> Gesendet: Donnerstag, 2. März 2017 15:18:14
> Betreff: Re: [OmniOS-discuss] CALL FOR VOLUNTEERS - Kayak for ISO alpha
> 
> 
> > On Mar 2, 2017, at 2:29 AM, Stephan Budach <stephan.bud...@jvm.de>
> > wrote:
> > 
> >>> 
> >>> Next is some kind of stack trace and finally
> >>> 
> >>> BTX halted
> >> 
> >> Is that Xen-based?
> >> 
> >> Dan
> >> 
> >> 
> > 
> > Yes, it is.
> 
> Doug Hughes just showed me the same thing, and I'm pretty sure he's
> Xen too.
> 
> I wonder if I need to add BE items to the ISO.  I think instead of
> /kernel/i86pc/kernel/amd64/unix, you Xen folks may need to boot
> /platform/i86{hvm,xpv}/kernel/amd64/unix instead.  Worst case is I
> have to build a distinct xpv or hvm ISO.  :-P
> 
> Like I said, today's going to be a reading day, but I'll add that to
> my work on build_iso.sh, as that's where it belongs.
> 
> Thanks,
> Dan
> 
> 

Great - that would be a blast!

Happy reading,
Stephan




Re: [OmniOS-discuss] CALL FOR VOLUNTEERS - Kayak for ISO alpha

2017-03-01 Thread Stephan Budach


- Ursprüngliche Mail -
> Von: "Dan McDonald" <dan...@omniti.com>
> An: "Stephan Budach" <stephan.bud...@jvm.de>
> CC: "omnios-discuss" <omnios-discuss@lists.omniti.com>, "Dan McDonald" 
> <dan...@omniti.com>
> Gesendet: Mittwoch, 1. März 2017 19:56:42
> Betreff: Re: [OmniOS-discuss] CALL FOR VOLUNTEERS - Kayak for ISO alpha
> 
> 
> > On Mar 1, 2017, at 1:02 PM, Stephan Budach <stephan.bud...@jvm.de>
> > wrote:
> > 
> > Hi Dan
> > 
> > Well, I just got it a shot on my Oracle VM cluster and after the
> > iso loaded, it threw this on the screen:
> > 
> > BTX loader 1.00 Starting in protected mode (base mem=9d400)
> > .
> > .
> > .
> > BIOS CD is cd0
> > 
> > Next is some kind of stack trace and finally
> > 
> > BTX halted
> 
> Is that Xen-based?
> 
> Dan
> 
> 

Yes, it is.

Stephan




Re: [OmniOS-discuss] CALL FOR VOLUNTEERS - Kayak for ISO alpha

2017-03-01 Thread Stephan Budach
Hi Dan

- Ursprüngliche Mail -

> It MIGHT, and you're the sort of person I need to confirm/deny it.
> 
> Watch for a kebe.com link later today.
> 
> Dan
> 
> 

Well, I just gave it a shot on my Oracle VM cluster, and after the ISO loaded, it 
threw this on the screen:

BTX loader 1.00 Starting in protected mode (base mem=9d400)
.
.
.
BIOS CD is cd0

Next is some kind of stack trace and finally

BTX halted

Cheers,
Stephan




Re: [OmniOS-discuss] CALL FOR VOLUNTEERS - Kayak for ISO alpha

2017-02-28 Thread Stephan Budach
Hi Dan,

would that allow me to run OmniOS on a Xen-based Oracle VM? Frankly, I didn't 
manage to get that one up due to constrained resources on my end - and it 
hasn't been pressing enough since… ;)

Cheers,
Stephan





Re: [OmniOS-discuss] SMB and Netatalk

2017-02-16 Thread Stephan Budach


- Ursprüngliche Mail -
Von: "Adam Feigin" 
An: omnios-discuss@lists.omniti.com
Gesendet: Donnerstag, 16. Februar 2017 09:47:50
Betreff: [OmniOS-discuss]  SMB and Netatalk



On 15/02/17 18:16, omnios-discuss-requ...@lists.omniti.com wrote:
> From: F?bio Rabelo 
> To: omnios-discuss 
> Subject: [OmniOS-discuss] SMB and Netatalk
> Message-ID:
>   
> Content-Type: text/plain; charset=UTF-8
> 
> Hi to all
> 
> There are someone with experience in running SMB and/or Netatalk over OmniOS ?
> 
> Works OK ?
> 
> Some caveats to avoid ?
> 
> The possible scenario would be a server to hold Audio and Video files
> in a Video/Audio editing facility, with 10 GB network in/out, and 12 8
> TB hard disks in Raid Z2, 2 256GB SSD to ZIL, no ARC, 128 GB RAM .
> 

netatalk works like a charm, as does cifs. I'll just assume that you're
wanting to serve Macs; despite all that Apple is telling the world, my
experience is that AFP "works" better. I've experienced no end of
bizzare and differing problems using SMB/CIFS with OSX (not just on
OmniOS!). Each OSX version has various quirks with SMB, whereas AFP
being the "native" OSX file protocol just works plug an play, no fooling
around, across differing OSX versions.

You can either pull it in from the uulm.mawi omnios package repository,
or build it yourself (but it does have some dependencies, so you're
probably better off installing the from the repository).

/AWF

I guess anyone who looks into SMB for Macs should also look at vfs_fruit for 
Samba. I haven't had the chance to set this up myself, but from what I have 
read, this is the way to go…
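
For anyone who wants to try it, the usual starting point seems to be something 
along these lines in smb.conf (untested by me, so just a sketch; the share name 
and path are made up):

[global]
    # Apple compatibility stack; the module order matters
    vfs objects = catia fruit streams_xattr
    fruit:metadata = stream
    fruit:model = MacSamba

[macshare]
    path = /tank/macshare
    read only = no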

…on the other hand, I am still running our big servers using Netatalk 2.3.x and 
this hasn't failed me for years. ;)

Cheers,
Stephan




Re: [OmniOS-discuss] Moving The Root Pool

2017-02-14 Thread Stephan Budach
Hi Andre, 

well, I wouldn't call it dead… there is a way to accomplish what you want, and 
I have performed such an action before. However, I cannot recall all the 
steps necessary. Looking at the link Dan provided, the steps laid out in the 
document are still valid and I don't think there is an easier way to do 
it. 

Regarding the other issue, about general advice against using an SSD as an 
rpool… there is none I know of. Depending on the manufacturer you may want to 
leave enough free space on the SSD, as I don't know if illumos/OmniOS have 
come up to speed regarding TRIM, but you can always compensate for that by 
leaving enough space for the SSD's GC. 
Whether an SSD is suited as an rpool under illumos may be hard to judge. All I 
know is that you should stay away from Samsung EVOs… unless you're operating 
them in a Windows environment. 
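
One crude way to keep part of the SSD permanently unused, if you don't want to 
bother with partitioning it smaller (a sketch; dataset name and size are arbitrary):

# an empty, never-mounted dataset with a reservation keeps that space free for the SSD's GC
zfs create -o mountpoint=none -o reservation=20G rpool/gc_headroom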

Cheers, 
Stephan 




Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-30 Thread Stephan Budach

Am 31.01.17 um 00:15 schrieb Richard Elling:

On Jan 29, 2017, at 3:10 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi,

just to wrap this up… I decided to go with 15 additional LUNs on each storage 
zpool, to avoid zfs complainign about replication mismatches. I know, I cluld 
have done otherwise, but it somehow felt better this way.

After all three underlying zpools were "pimped", I was able to mount the 
problematic zpool in my S11.1 host without any issue. It just took a coulpe of seconds 
and zfs reported approx 2.53MB resilvered…

Now, there's a scrub running on that zpool tnat is just happily humming away on 
the data.

Thanks for all the input, everyone.

may all your scrubs complete cleanly :-)
  — richard


Stephan
I'm on it! ;) So far it has been running smoothly, only giving a couple 
of read errors for a ZFS that is encrypted and for which I didn't have the 
keys at hand, but otherwise it's running fine. It will take another 9 
days to finish, though, running at 3x100MB/s…
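
For the curious, the progress can be watched with something like:

zpool status -v vsmPool10          # scan progress, rate, ETA and any affected files
zpool iostat -v vsmPool10 5        # per-vdev throughput while the scrub runs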


Thanks,
Stephan




Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-29 Thread Stephan Budach

Hi,

just to wrap this up… I decided to go with 15 additional LUNs on each 
storage zpool, to avoid zfs complaining about replication mismatches. I 
know I could have done otherwise, but it somehow felt better this way.


After all three underlying zpools were "pimped", I was able to mount the 
problematic zpool on my S11.1 host without any issue. It just took a 
couple of seconds and zfs reported approx. 2.53MB resilvered…


Now, there's a scrub running on that zpool that is just happily humming 
away on the data.


Thanks for all the input, everyone.

Stephan





Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-28 Thread Stephan Budach

Hi Richard,

Am 26.01.17 um 20:18 schrieb Richard Elling:


On Jan 26, 2017, at 12:20 AM, Stephan Budach <stephan.bud...@jvm.de 
<mailto:stephan.bud...@jvm.de>> wrote:


Hi Richard,

gotcha… read on, below…


"thin provisioning" bit you. For "thick provisioning" you’ll have a 
refreservation and/or reservation.

 — richard
yes, it was… Now, yesterday I was able to shove in three new Supermicro 
storage servers. Since I wanted to re-do that whole zpool anyway, I 
thought it would be enough just to provide a big enough LUN (22TB) to 
the currently exhausted zpool. This 22TB LUN is provided from the new 
systems via iSCSI.


When I tried to add that LUN as a new vdev to the pool, zpool naturally 
complained about the replication mismatch. Is it safe to do that anyway? 
I mean, the backing zpool of that LUN is also made up of 2 raidz-1 vdevs, 
and I wanted to avoid actual triple-double redundancy… I will have to 
overhaul the whole setup anyway…
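
Just to be explicit about what I mean (pool and device names below are only 
placeholders): zpool refuses to add a plain single-LUN vdev to a pool whose 
existing top-level vdevs are raidz, unless the check is overridden:

zpool add backuppool c0tXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXd0     # refused: mismatched replication level
zpool add -f backuppool c0tXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXd0  # -f overrides it; the new vdev then only has the redundancy of its backing store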


Thanks,
Stephan




Re: [OmniOS-discuss] Fwd: Install on Supermicro DOM=low space left

2017-01-26 Thread Stephan Budach

Hi Fábio,

Am 26.01.17 um 12:22 schrieb Fábio Rabelo:

sorry, I forgot to change address to all list before send ...

-- Forwarded message --
From: Fábio Rabelo 
Date: 2017-01-26 9:21 GMT-02:00
Subject: Re: [OmniOS-discuss] Install on Supermicro DOM=low space left
To: "Volker A. Brandt" 


2017-01-26 9:06 GMT-02:00 Volker A. Brandt :

Hi Fábio!



I've just installed OmniOS on a Supermicro Motherboard with a DOM
device for boot .

It is working fine, no issues ...

But, the 64GB DOM has just 9GB of space left

Can I delete something ( temp files, compacted installed packages, etc
) to free some space ?

You might have oversized swap and/or dump volumes.  Do a

   zfs list -t volume

What volume sizes are shown

NAME USED  AVAIL  REFER  MOUNTPOINT
rpool/dump  41.5G  9.15G  41.5G  -
rpool/swap  4.13G  13.0G   276M  -

I did not changed anything during instalation proccess, I've just
accepted all defaults



If you still want to change the size of the dump volume:

zfs set volsize=16g rpool/dump

The size depends of course on the estimated size of a core dump, but 16G 
should be way more than enough.
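
Spelled out, roughly (a sketch):

dumpadm                            # confirm what the dump device is and what gets dumped
zfs set volsize=16g rpool/dump     # shrink the dump zvol
zfs list -t volume                 # verify the space has been returned to rpool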


Cheers,
Stephan




Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-26 Thread Stephan Budach

Just for sanity… these are a couple of errors fmdump outputs using -eV

root@solaris11atest2:~# fmdump -eV
TIME   CLASS
Jan 25 2017 10:10:45.011761190 ereport.io.pciex.rc.tmp
nvlist version: 0
class = ereport.io.pciex.rc.tmp
ena = 0xff37bc9a861
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /intel-iommu@0,fbffe000
(end detector)

epkt_ver = 0x1
desc = 0x21152014
size = 0x0
addr = 0xca000
hdr1 = 0x60d7
hdr2 = 0x328000
reserved = 0x1
count = 0x1
total = 0x1
event_name = The Write field in a page-table entry is Clear 
when DMA write

VID = 0x8086
DID = 0x0
RID = 0x0
SID = 0x0
SVID = 0x0
reg_ver = 0x1
platform-specific = (embedded nvlist)
nvlist version: 0
VER_REG = 0x10
CAP_REG = 0x106f0462
ECAP_REG = 0xf020fe
GCMD_REG = 0x8680
GSTS_REG = 0xc780
FSTS_REG = 0x100
FECTL_REG = 0x0
FEDATA_REG = 0xf2
FEADDR_REG = 0xfee0
FEUADDR_REG = 0x0
FRCD_REG_LOW = 0xca000
FRCD_REG_HIGH = 0x800500d7
PMEN_REG = 0x64
PLMBASE_REG = 0x68
PLMLIMIT_REG = 0x6c
PHMBASE_REG = 0x70
PHMLIMIT_REG = 0x78
(end platform-specific)

__ttl = 0x1
__tod = 0x58886b95 0xb37626

Jan 25 2017 12:28:55.712580014 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
class = ereport.io.scsi.cmd.disk.dev.rqs.derr
ena = 0x88985751a4a02c01
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
cna_dev = 0x579a001f
device-path = 
/iscsi/d...@iqn.2016-01.de.jvm.tr1206900:vsmpool12,0

(end detector)

devid = unknown
driver-assessment = info
op-code = 0x15
cdb = 0x15 0x10 0x0 0x0 0x18 0x0
pkt-reason = 0x0
pkt-state = 0x3f
pkt-stats = 0x0
stat-code = 0x2
key = 0x5
asc = 0x1a
ascq = 0x0
sense-data = 0x70 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 
0x1a 0x0 0x0 0x0 0x0 0x0

__ttl = 0x1
__tod = 0x5bf7 0x2a791bae

Jan 25 2017 12:32:35.072413593 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
class = ereport.io.scsi.cmd.disk.dev.rqs.derr
ena = 0x8bc98528b5c00801
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
cna_dev = 0x579a0024
device-path = 
/iscsi/d...@iqn.2016-01.de.jvm.tr1206901:vsmpool12,0

(end detector)

devid = unknown
driver-assessment = info
op-code = 0x15
cdb = 0x15 0x10 0x0 0x0 0x18 0x0
pkt-reason = 0x0
pkt-state = 0x3f
pkt-stats = 0x0
stat-code = 0x2
key = 0x5
asc = 0x1a
ascq = 0x0
sense-data = 0x70 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 
0x1a 0x0 0x0 0x0 0x0 0x0

__ttl = 0x1
__tod = 0x5cd3 0x450f199

Jan 25 2017 12:32:52.661439798 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
class = ereport.io.scsi.cmd.disk.dev.rqs.derr
ena = 0x8c0b0b5c71e00401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
cna_dev = 0x579a0029
device-path = 
/iscsi/d...@iqn.2016-01.de.jvm.tr1206902:vsmpool12,0

(end detector)

devid = unknown
driver-assessment = info
op-code = 0x15
cdb = 0x15 0x10 0x0 0x0 0x18 0x0
pkt-reason = 0x0
pkt-state = 0x3f
pkt-stats = 0x0
stat-code = 0x2
key = 0x5
asc = 0x1a
ascq = 0x0
sense-data = 0x70 0x0 0x5 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 
0x1a 0x0 0x0 0x0 0x0 0x0

__ttl = 0x1
__tod = 0x5ce4 0x276cc536

Jan 25 2017 12:35:48.187562523 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
class = ereport.io.scsi.cmd.disk.dev.uderr
ena = 0x8e98ee1dd5c00401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
cna_dev = 0x579a002e
device-path = 
/iscsi/d...@iqn.2016-01.de.jvm.tr1206902:vsmpool12,0

devid = id1,sd@n600144f07a35001a5693a2810001
(end detector)

devid = id1,sd@n600144f07a35001a5693a2810001
driver-assessment = retry
op-code = 0x8a

Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-26 Thread Stephan Budach

Hi Richard,

gotcha… read on, below…

Am 26.01.17 um 00:43 schrieb Richard Elling:

more below…

On Jan 25, 2017, at 3:01 PM, Stephan Budach <stephan.bud...@jvm.de 
<mailto:stephan.bud...@jvm.de>> wrote:


Ooops… should have waited with sending that message after I rebootet 
the S11.1 host…



Am 25.01.17 um 23:41 schrieb Stephan Budach:

Hi Richard,

Am 25.01.17 um 20:27 schrieb Richard Elling:

Hi Stephan,

On Jan 25, 2017, at 5:54 AM, Stephan Budach <stephan.bud...@jvm.de 
<mailto:stephan.bud...@jvm.de>> wrote:


Hi guys,

I have been trying to import a zpool, based on a 3way-mirror 
provided by three omniOS boxes via iSCSI. This zpool had been 
working flawlessly until some random reboot of the S11.1 host. 
Since then, S11.1 has been importing this zpool without success.


This zpool consists of three 108TB LUNs, based on a raidz-2 zvols… 
yeah I know, we shouldn't have done that in the first place, but 
performance was not the primary goal for that, as this one is a 
backup/archive pool.


When issueing a zpool import, it says this:

root@solaris11atest2:~# zpool import
  pool: vsmPool10
id: 12653649504720395171
 state: DEGRADED
status: The pool was last accessed by another system.
action: The pool can be imported despite missing or damaged 
devices.  The

fault tolerance of the pool may be compromised if imported.
   see: http://support.oracle.com/msg/ZFS-8000-EY
config:

vsmPool10 DEGRADED
mirror-0 DEGRADED
c0t600144F07A350658569398F60001d0 DEGRADED  corrupted data
c0t600144F07A35066C5693A0D90001d0 DEGRADED  corrupted data
c0t600144F07A35001A5693A2810001d0 DEGRADED  corrupted data

device details:

c0t600144F07A350658569398F60001d0 DEGRADED 
scrub/resilver needed

status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c0t600144F07A35066C5693A0D90001d0 DEGRADED 
scrub/resilver needed

status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c0t600144F07A35001A5693A2810001d0 DEGRADED 
scrub/resilver needed

status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

However, when  actually running zpool import -f vsmPool10, the 
system starts to perform a lot of writes on the LUNs and iostat 
report an alarming increase in h/w errors:


root@solaris11atest2:~# iostat -xeM 5
extended device statistics  errors ---
devicer/sw/s Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot

sd0   0.00.0 0.00.0  0.0  0.00.0   0   0 0   0   0   0
sd1   0.00.0 0.00.0  0.0  0.00.0   0   0 0   0   0   0
sd2   0.00.0 0.00.0  0.0  0.00.0   0   0   0 71   
0  71

sd3   0.00.0 0.00.0  0.0  0.00.0   0   0 0   0   0   0
sd4   0.00.0 0.00.0  0.0  0.00.0   0   0 0   0   0   0
sd5   0.00.0 0.00.0  0.0  0.00.0   0   0 0   0   0   0
extended device statistics  errors ---
devicer/sw/s Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot

sd0  14.2  147.3 0.70.4  0.2  0.12.0   6   9 0   0   0   0
sd1  14.28.4 0.40.0  0.0  0.00.3   0   0 0   0   0   0
sd2   0.04.2 0.00.0  0.0  0.00.0   0   0   0 92   
0  92
sd3 157.3   46.2 2.10.2  0.0  0.73.7   0  14   0 30   
0  30
sd4 123.9   29.4 1.60.1  0.0  1.7   10.9   0  36   0 40   
0  40
sd5 142.5   43.0 2.00.1  0.0  1.9   10.2   0  45   0 88   
0  88

extended device statistics  errors ---
devicer/sw/s Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot

sd0   0.0  234.5 0.00.6  0.2  0.11.4   6  10 0   0   0   0
sd1   0.00.0 0.00.0  0.0  0.00.0   0   0 0   0   0   0
sd2   0.00.0 0.00.0  0.0  0.00.0   0   0   0 92   
0  92
sd3   3.6   64.0 0.00.5  0.0  4.3   63.2   0  63   0 235   
0 235
sd4   3.0   67.0 0.00.6  0.0  4.2   60.5   0  68   0 298   
0 298
sd5   4.2   59.6 0.00.4  0.0  5.2   81.0   0  72   0 406   
0 406

extended device statistics  errors ---
devicer/sw/s Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot

sd0   0.0  234.8 0.00.7  0.4  0.12.2  11  10 0   0   0   0
sd1   0.00.0 0.00.0  0.0  0.00.0   0   0 0   0   0   0
sd2   0.00.0 0.00.0  0.0  0.00.0   0   0   0 92   
0  92
sd3   5.4   54.4 0.00.3  0.0  2.9   48.5   0  67   0 384   
0 384
sd4   6.0   53.4 0.00.3  0.0  4.6   77.7   0  87   0 519   
0 519
sd5   6.0   60.8 0.00.3  0.0  4.8   72.5   0  87   0 727   
0 727


h/w errors are a classification of other errors. The full error 
list is available from "iostat -E" and will

be important to tracking this down.

A better, more detailed analysis can be gleaned from the "fmdump 
-e" ereports t

Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-25 Thread Stephan Budach
Ooops… should have waited with sending that message after I rebootet the 
S11.1 host…



Am 25.01.17 um 23:41 schrieb Stephan Budach:

Hi Richard,

Am 25.01.17 um 20:27 schrieb Richard Elling:

Hi Stephan,

On Jan 25, 2017, at 5:54 AM, Stephan Budach <stephan.bud...@jvm.de 
<mailto:stephan.bud...@jvm.de>> wrote:


Hi guys,

I have been trying to import a zpool, based on a 3way-mirror 
provided by three omniOS boxes via iSCSI. This zpool had been 
working flawlessly until some random reboot of the S11.1 host. Since 
then, S11.1 has been importing this zpool without success.


This zpool consists of three 108TB LUNs, based on a raidz-2 zvols… 
yeah I know, we shouldn't have done that in the first place, but 
performance was not the primary goal for that, as this one is a 
backup/archive pool.


When issueing a zpool import, it says this:

root@solaris11atest2:~# zpool import
  pool: vsmPool10
id: 12653649504720395171
 state: DEGRADED
status: The pool was last accessed by another system.
action: The pool can be imported despite missing or damaged 
devices.  The

fault tolerance of the pool may be compromised if imported.
   see: http://support.oracle.com/msg/ZFS-8000-EY
config:

vsmPool10  DEGRADED
mirror-0 DEGRADED
c0t600144F07A350658569398F60001d0  DEGRADED corrupted data
c0t600144F07A35066C5693A0D90001d0  DEGRADED corrupted data
c0t600144F07A35001A5693A2810001d0  DEGRADED corrupted data

device details:

c0t600144F07A350658569398F60001d0 DEGRADED 
scrub/resilver needed

status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c0t600144F07A35066C5693A0D90001d0 DEGRADED 
scrub/resilver needed

status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c0t600144F07A35001A5693A2810001d0 DEGRADED 
scrub/resilver needed

status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

However, when  actually running zpool import -f vsmPool10, the 
system starts to perform a lot of writes on the LUNs and iostat 
report an alarming increase in h/w errors:


root@solaris11atest2:~# iostat -xeM 5
 extended device statistics  
errors ---
devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot
sd0   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd1   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd2   0.00.00.00.0 0.0  0.00.0   0   0   0  71   
0  71
sd3   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd4   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd5   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
 extended device statistics  
errors ---
devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot
sd0  14.2  147.30.70.4 0.2  0.12.0   6   9   0   0   
0   0
sd1  14.28.40.40.0 0.0  0.00.3   0   0   0   0   
0   0
sd2   0.04.20.00.0 0.0  0.00.0   0   0   0  92   
0  92
sd3 157.3   46.22.10.2 0.0  0.73.7   0  14   0  30   
0  30
sd4 123.9   29.41.60.1 0.0  1.7   10.9   0  36   0  40   
0  40
sd5 142.5   43.02.00.1 0.0  1.9   10.2   0  45   0  88   
0  88
 extended device statistics  
errors ---
devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot
sd0   0.0  234.50.00.6 0.2  0.11.4   6  10   0   0   
0   0
sd1   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd2   0.00.00.00.0 0.0  0.00.0   0   0   0  92   
0  92
sd3   3.6   64.00.00.5 0.0  4.3   63.2   0  63   0 235   
0 235
sd4   3.0   67.00.00.6 0.0  4.2   60.5   0  68   0 298   
0 298
sd5   4.2   59.60.00.4 0.0  5.2   81.0   0  72   0 406   
0 406
 extended device statistics  
errors ---
devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot
sd0   0.0  234.80.00.7 0.4  0.12.2  11  10   0   0   
0   0
sd1   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd2   0.00.00.00.0 0.0  0.00.0   0   0   0  92   
0  92
sd3   5.4   54.40.00.3 0.0  2.9   48.5   0  67   0 384   
0 384
sd4   6.0   53.40.00.3 0.0  4.6   77.7   0  87   0 519   
0 519
sd5   6.0   60.80.00.3 0.0  4.8   72.5   0  87   0 727   
0 727


h/w errors are a classification of other errors. The full error list 
is available from "iostat -E" and will

be important to tracking this down.

A better, more detailed analysis can be gleaned from the

Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-25 Thread Stephan Budach

Hi Richard,

Am 25.01.17 um 20:27 schrieb Richard Elling:

Hi Stephan,

On Jan 25, 2017, at 5:54 AM, Stephan Budach <stephan.bud...@jvm.de 
<mailto:stephan.bud...@jvm.de>> wrote:


Hi guys,

I have been trying to import a zpool, based on a 3way-mirror provided 
by three omniOS boxes via iSCSI. This zpool had been working 
flawlessly until some random reboot of the S11.1 host. Since then, 
S11.1 has been importing this zpool without success.


This zpool consists of three 108TB LUNs, based on a raidz-2 zvols… 
yeah I know, we shouldn't have done that in the first place, but 
performance was not the primary goal for that, as this one is a 
backup/archive pool.


When issueing a zpool import, it says this:

root@solaris11atest2:~# zpool import
  pool: vsmPool10
id: 12653649504720395171
 state: DEGRADED
status: The pool was last accessed by another system.
action: The pool can be imported despite missing or damaged devices.  The
fault tolerance of the pool may be compromised if imported.
   see: http://support.oracle.com/msg/ZFS-8000-EY
config:

vsmPool10  DEGRADED
mirror-0 DEGRADED
c0t600144F07A350658569398F60001d0  DEGRADED corrupted data
c0t600144F07A35066C5693A0D90001d0  DEGRADED corrupted data
c0t600144F07A35001A5693A2810001d0  DEGRADED corrupted data

device details:

c0t600144F07A350658569398F60001d0 DEGRADED scrub/resilver 
needed

status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c0t600144F07A35066C5693A0D90001d0 DEGRADED scrub/resilver 
needed

status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c0t600144F07A35001A5693A2810001d0 DEGRADED scrub/resilver 
needed

status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

However, when  actually running zpool import -f vsmPool10, the system 
starts to perform a lot of writes on the LUNs and iostat report an 
alarming increase in h/w errors:


root@solaris11atest2:~# iostat -xeM 5
 extended device statistics  
errors ---
devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot
sd0   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd1   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd2   0.00.00.00.0 0.0  0.00.0   0   0   0  71   
0  71
sd3   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd4   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd5   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
 extended device statistics  
errors ---
devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot
sd0  14.2  147.30.70.4 0.2  0.12.0   6   9   0   0   
0   0
sd1  14.28.40.40.0 0.0  0.00.3   0   0   0   0   
0   0
sd2   0.04.20.00.0 0.0  0.00.0   0   0   0  92   
0  92
sd3 157.3   46.22.10.2 0.0  0.73.7   0  14   0  30   
0  30
sd4 123.9   29.41.60.1 0.0  1.7   10.9   0  36   0  40   
0  40
sd5 142.5   43.02.00.1 0.0  1.9   10.2   0  45   0  88   
0  88
 extended device statistics  
errors ---
devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot
sd0   0.0  234.50.00.6 0.2  0.11.4   6  10   0   0   
0   0
sd1   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd2   0.00.00.00.0 0.0  0.00.0   0   0   0  92   
0  92
sd3   3.6   64.00.00.5 0.0  4.3   63.2   0  63   0 235   
0 235
sd4   3.0   67.00.00.6 0.0  4.2   60.5   0  68   0 298   
0 298
sd5   4.2   59.60.00.4 0.0  5.2   81.0   0  72   0 406   
0 406
 extended device statistics  
errors ---
devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w 
trn tot
sd0   0.0  234.80.00.7 0.4  0.12.2  11  10   0   0   
0   0
sd1   0.00.00.00.0 0.0  0.00.0   0   0   0   0   
0   0
sd2   0.00.00.00.0 0.0  0.00.0   0   0   0  92   
0  92
sd3   5.4   54.40.00.3 0.0  2.9   48.5   0  67   0 384   
0 384
sd4   6.0   53.40.00.3 0.0  4.6   77.7   0  87   0 519   
0 519
sd5   6.0   60.80.00.3 0.0  4.8   72.5   0  87   0 727   
0 727


h/w errors are a classification of other errors. The full error list 
is available from "iostat -E" and will

be important to tracking this down.

A better, more detailed analysis can be gleaned from the "fmdump -e" 
ereports that should be
associated with each h/w error. However, there are dozens of causes of 
these so

Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-25 Thread Stephan Budach
Hi Dale,

this is exactly what I am currently trying, and the iostat errors are from that 
import run.

Stephan

Von meinem iPhone gesendet

> Am 25.01.2017 um 18:38 schrieb Dale Ghent <da...@omniti.com>:
> 
> 
> Oh, ok, I misunderstood you as trying to import illumos vdevs directly onto a 
> Oracle Solaris server.
> 
> This line:
> 
>>>> status: The pool was last accessed by another system.
> 
> indicates that the zpool was uncleanly exported (or not exported at all) on 
> the previous system it was imported on and the hostid of that system is still 
> imprinted in the vdev labels for the zpool (not the hostid of the system you 
> are trying to import on)
> 
> Have you tried 'zpool import -f vsmPool10' ?
> 
> /dale
> 
>> On Jan 25, 2017, at 12:14 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>> 
>> Hi Dale,
>> 
>> I know that and it's not that I am trying to import a S11.1 zpool on omniOS 
>> or vice versa. It's that the targets are omniOS and the initiator is S11.1. 
>> I am still trying to import the zpool on S11.1. My question was more 
>> directed at COMSTAR, which both still should have some fair overlapping, no?
>> 
>> I am in contact with Oracle and he mentioned some issues with zvols over 
>> iSCSI targets, which may be present for both systems, so I thought, that I'd 
>> give it a shot, that's all.
>> 
>> Cheers
>> Stephan
>> 
>>> Am 25.01.17 um 18:07 schrieb Dale Ghent:
>>> ZFS as implemented in Oracle Solaris is *not* OpenZFS, which is what 
>>> illumos (and all illumos distros), FreeBSD, and the ZFS on Linux/macOS 
>>> projects use. Up to a level of features, the two are compatible - but then 
>>> they diverge in features. If one pool has features the zfs driver does not 
>>> understand, you could run the risk of refusal to import as indicated here.
>>> 
>>> Seeing as how Oracle itself does not include OpenZFS features in its ZFS 
>>> implementation, and Oracle does not provide any information to OpenZFS 
>>> regarding features it invents, this will unfortunately be the state of 
>>> things unless Oracle changes its open source or information sharing 
>>> policies. Unfortunate but that's just the way things are.
>>> 
>>> /dale
>>> 
>>> 
>>>> On Jan 25, 2017, at 8:54 AM, Stephan Budach <stephan.bud...@jvm.de>
>>>> wrote:
>>>> 
>>>> Hi guys,
>>>> 
>>>> I have been trying to import a zpool, based on a 3way-mirror provided by 
>>>> three omniOS boxes via iSCSI. This zpool had been working flawlessly until 
>>>> some random reboot of the S11.1 host. Since then, S11.1 has been importing 
>>>> this zpool without success.
>>>> 
>>>> This zpool consists of three 108TB LUNs, based on a raidz-2 zvols… yeah I 
>>>> know, we shouldn't have done that in the first place, but performance was 
>>>> not the primary goal for that, as this one is a backup/archive pool.
>>>> 
>>>> When issueing a zpool import, it says this:
>>>> 
>>>> root@solaris11atest2:~# zpool import
>>>>  pool: vsmPool10
>>>>id: 12653649504720395171
>>>> state: DEGRADED
>>>> status: The pool was last accessed by another system.
>>>> action: The pool can be imported despite missing or damaged devices.  The
>>>>fault tolerance of the pool may be compromised if imported.
>>>>   see:
>>>> http://support.oracle.com/msg/ZFS-8000-EY
>>>> 
>>>> config:
>>>> 
>>>>vsmPool10  DEGRADED
>>>>  mirror-0 DEGRADED
>>>>c0t600144F07A350658569398F60001d0  DEGRADED  corrupted data
>>>>c0t600144F07A35066C5693A0D90001d0  DEGRADED  corrupted data
>>>>c0t600144F07A35001A5693A2810001d0  DEGRADED  corrupted data
>>>> 
>>>> device details:
>>>> 
>>>>c0t600144F07A350658569398F60001d0DEGRADED 
>>>> scrub/resilver needed
>>>>status: ZFS detected errors on this device.
>>>>The device is missing some data that is recoverable.
>>>> 
>>>>c0t600144F07A35066C5693A0D90001d0DEGRADED 
>>>> scrub/resilver needed
>>>>status: ZFS detected errors on this device.
>>>>The device is missing some data that

Re: [OmniOS-discuss] Error updating R018 to R020

2017-01-22 Thread Stephan Budach

I forgot that I had an older JRE installed, due to some application 
which doesn't run with newer ones. I undid that and the update to 020 
went just fine.


Sorry for the noise…

Stephan





[OmniOS-discuss] Error updating R018 to R020

2017-01-21 Thread Stephan Budach

Hi,

I just tried updating one of my 018 nodes to 020 and got this:

DOWNLOAD                              PKGS        FILES     XFER (MB)    SPEED
Completed                          397/397  13177/13177   289.9/289.9   1.4M/s


PHASE                                      ITEMS
Removing old actions                   4421/4421
Installing new actions                 2354/4824
Action install failed for 'usr/java/jre/lib/zi/Asia/Barnaul' 
(pkg://omnios/developer/java/jdk):
  ActionExecutionError: Requested operation failed for package 
pkg://omnios/developer/java/jdk@1.7.0.101.0,5.11-0.151020:20161101T233853Z:
Cannot install '/tmp/tmp4VWN53/usr/java/jre/lib/zi/Asia/Barnaul'; the parent 
directory /tmp/tmp4VWN53/usr/java is a link to /tmp/tmp4VWN53/usr/java_1.7.0_11. 
To continue, move the directory back to its original location and try again.
 The currently running system has not been modified. The changes were only 
made to a clone. This clone is mounted at /tmp/tmp4VWN53 should you wish to 
inspect it.


pkg: Requested operation failed for package 
pkg://omnios/developer/java/jdk@1.7.0.101.0,5.11-0.151020:20161101T233853Z:
Cannot install '/tmp/tmp4VWN53/usr/java/jre/lib/zi/Asia/Barnaul'; the parent 
directory /tmp/tmp4VWN53/usr/java is a link to /tmp/tmp4VWN53/usr/java_1.7.0_11. 
To continue, move the directory back to its original location and try again.

root@zfsha02gh79:/root#


Basically, pkg complains about JDK 1.7.0.101.0 being a link instead of a 
real folder and wants me to move that one back to its original location. 
Well… it seems that I can't do that… ;)
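
What I will check next (a sketch):

ls -ld /usr/java /usr/java_1.7.0_11    # is /usr/java a real directory, or a symlink to a hand-installed JRE?
pkg list '*jdk*' '*jre*'               # what IPS itself thinks is installed
# if /usr/java turns out to be a leftover symlink from a manual Java install, removing that
# install (and the symlink) before running "pkg update" again should clear the error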


Cheers,
Stephan




Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-19 Thread Stephan Budach

Am 18.01.17 um 17:38 schrieb Stephan Budach:

Am 18.01.17 um 17:32 schrieb Dan McDonald:

Generally the X540 has had a good track record.  I brought up the support for 
this a long time ago, and it worked alright then.  I think Dale has an X540 
in-house which works fine too (he should confirm this).

Some other things to check:

* Is your BIOS set to map the PCI-E space into the low-32 bits only?  That's an 
illumos limitation.

* Do you have other known-working 10GigBaseT chips to try?

Dan

I will check with the BIOS, altough I thought that this option would 
simply cause PCI adaptors to vanish from the system, if setup that way.
Actually, I have been going with Intel all the time and it has been up 
to the X540 in 10GbE setups only, when I ever startet to experience 
issues at all, so Intel has been a natural choice for me ever… ;)


Stephan
I just checked the BIOS of my new Supermicros and I think that this is 
the BIOS option you were referring to…


Above 4G Decoding: DISABLED

So, this should be right, shouldn't it?

Stephan




Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-18 Thread Stephan Budach

Am 18.01.17 um 17:32 schrieb Dan McDonald:

Generally the X540 has had a good track record.  I brought up the support for 
this a long time ago, and it worked alright then.  I think Dale has an X540 
in-house which works fine too (he should confirm this).

Some other things to check:

* Is your BIOS set to map the PCI-E space into the low-32 bits only?  That's an 
illumos limitation.

* Do you have other known-working 10GigBaseT chips to try?

Dan

I will check the BIOS, although I thought that this option would 
simply cause PCI adapters to vanish from the system if set up that way.
Actually, I have been going with Intel all the time, and it was only 
with the X540 in 10GbE setups that I ever started to experience 
issues at all, so Intel has always been a natural choice for me… ;)
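
One thing I can still try from the OmniOS side in the meantime is to rule out 
autoneg downshifts by pinning the ports to 10GBASE-T only (a sketch; it assumes 
ixgbe exposes the usual GLDv3 speed capability properties):

dladm show-linkprop -p speed,duplex,en_10gfdx_cap,en_1000fdx_cap ixgbe3
dladm set-linkprop -p en_1000fdx_cap=0 ixgbe3    # with 1G disabled, a flaky link can only come back at 10G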


Stephan





Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-18 Thread Stephan Budach

Am 18.01.17 um 09:01 schrieb Dale Ghent:

On Jan 18, 2017, at 2:38 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Am 17.01.17 um 23:09 schrieb Dale Ghent:

On Jan 17, 2017, at 2:39 PM, Stephan Budach <stephan.bud...@jvm.de>
  wrote:

Am 17.01.17 um 17:37 schrieb Dale Ghent:


On Jan 17, 2017, at 11:31 AM, Stephan Budach <stephan.bud...@jvm.de>

  wrote:

Hi Dale,

Am 17.01.17 um 17:22 schrieb Dale Ghent:



On Jan 17, 2017, at 11:12 AM, Stephan Budach <stephan.bud...@jvm.de>


  wrote:

Hi guys,

I am sorry, but I do have to undig this old topic, since I do now have three 
hosts running omniOS 018/020, which show these pesky  issues with flapping 
their ixgbeN links on my Nexus FEXes…

Does anyone know, if there has any change been made to the ixgbe drivers since 
06/2016?




Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
This added X550 support, and also brought the Intel Shared Code it uses from 
its 2012 vintage up to current. The updated driver is available in 014 and 
later.

/dale




do you know of any option to get to know, why three of my boxes are flapping 
their 10GbE ports? It's actually not only when in aggr mode, but on single use 
as well. Last week I presumeably had one of my RSF-1 nodes panic, since it 
couldn't get to it's iSCSI LUNs anymore. The thing ist, that somewhere doen the 
line, the ixgbe driver seems to be fine, to configure one port to 1GbE instead 
of 10GbE, which will stop the flapping, but wich will break the VPC on my Nexus 
nevertheless.

In syslog, this looks like this:

...
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 
1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down

Note on 14:46:07, where the system settles on a 1GbE connection…



Sounds like a cabling issue? Are the runs too long or are you not using CAT6a? 
It flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to 
me. The driver will always try to link at the fastest speed that the local 
controller and the remote peer will negotiate at... it will not proactively 
downgrade the link speed. If that happens, it is because that is what the 
controller managed to negotiate with the remote peer at.

Are you using jumbo frames or anything outside of a normal 1500mtu link?

/dale



The cables are actually specifically purchased cat6 cables. They run about 2m, 
not more. It could be tna cables, but I am running a couple of those and afaik, 
I only get these issues on these three nodes. I can try some other cables, but 
I hoped to be able to get maybe some kind of debug messages from the driver.


The chip provides no reason for a LoS or downgrade of the link. For configuration issues 
it interrupts only on a few things. "LSC" (Link Status Change) interrupts one 
of these things and are what tells the driver to interrogate the chip for its current 
speed, but beyond that, the hardware provides no further details. Any details regarding 
why the PHY had to re-train the link are completely hidden to the driver.

Are these X540 interfaces actually built into the motherboard, or are they separate PCIe 
cards? Also, CAT6 alone might not be enough, and even the magnetics on the older X540 
might not even be able to eek out a 10Gb connection, even at 2m. I would remove all doubt 
of cabling being an issue by replacing them with CAT6a. Beware of cable vendors who sell 
CAT6 cables as "CAT6a". It could also be an issue with the modular jacks on the 
ends.

Since you mentioned "after 6/2016" for the ixgbe driver, have you tried the 
newer one yet? Large portions of it were re-written and re-factored, and many bugs fixed 
including portions that touch the X540 due to the new X550 also being copper and the two 
models needing to share some logic relate

Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Stephan Budach

Am 17.01.17 um 23:09 schrieb Dale Ghent:

On Jan 17, 2017, at 2:39 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Am 17.01.17 um 17:37 schrieb Dale Ghent:

On Jan 17, 2017, at 11:31 AM, Stephan Budach <stephan.bud...@jvm.de>
  wrote:

Hi Dale,

Am 17.01.17 um 17:22 schrieb Dale Ghent:


On Jan 17, 2017, at 11:12 AM, Stephan Budach <stephan.bud...@jvm.de>

  wrote:

Hi guys,

I am sorry, but I do have to undig this old topic, since I do now have three 
hosts running omniOS 018/020, which show these pesky  issues with flapping 
their ixgbeN links on my Nexus FEXes…

Does anyone know, if there has any change been made to the ixgbe drivers since 
06/2016?



Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
This added X550 support, and also brought the Intel Shared Code it uses from 
its 2012 vintage up to current. The updated driver is available in 014 and 
later.

/dale



do you know of any option to get to know, why three of my boxes are flapping 
their 10GbE ports? It's actually not only when in aggr mode, but on single use 
as well. Last week I presumeably had one of my RSF-1 nodes panic, since it 
couldn't get to it's iSCSI LUNs anymore. The thing ist, that somewhere doen the 
line, the ixgbe driver seems to be fine, to configure one port to 1GbE instead 
of 10GbE, which will stop the flapping, but wich will break the VPC on my Nexus 
nevertheless.

In syslog, this looks like this:

...
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 
1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down

Note on 14:46:07, where the system settles on a 1GbE connection…


Sounds like a cabling issue? Are the runs too long or are you not using CAT6a? 
It flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to 
me. The driver will always try to link at the fastest speed that the local 
controller and the remote peer will negotiate at... it will not proactively 
downgrade the link speed. If that happens, it is because that is what the 
controller managed to negotiate with the remote peer at.

Are you using jumbo frames or anything outside of a normal 1500mtu link?

/dale


The cables are actually specifically purchased cat6 cables. They run about 2m, 
not more. It could be tna cables, but I am running a couple of those and afaik, 
I only get these issues on these three nodes. I can try some other cables, but 
I hoped to be able to get maybe some kind of debug messages from the driver.

The chip provides no reason for a LoS or downgrade of the link. For configuration issues 
it interrupts only on a few things. "LSC" (Link Status Change) interrupts one 
of these things and are what tells the driver to interrogate the chip for its current 
speed, but beyond that, the hardware provides no further details. Any details regarding 
why the PHY had to re-train the link are completely hidden to the driver.

Are these X540 interfaces actually built into the motherboard, or are they separate PCIe 
cards? Also, CAT6 alone might not be enough, and even the magnetics on the older X540 
might not even be able to eek out a 10Gb connection, even at 2m. I would remove all doubt 
of cabling being an issue by replacing them with CAT6a. Beware of cable vendors who sell 
CAT6 cables as "CAT6a". It could also be an issue with the modular jacks on the 
ends.

Since you mentioned "after 6/2016" for the ixgbe driver, have you tried the 
newer one yet? Large portions of it were re-written and re-factored, and many bugs fixed 
including portions that touch the X540 due to the new X550 also being copper and the two 
models needing to share some logic related to that.

/dale
Thanks for clarifying that. I just checked the cables and they classify 
as Cat6a and they are from a respectable germ

Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Stephan Budach

Am 17.01.17 um 17:37 schrieb Dale Ghent:

On Jan 17, 2017, at 11:31 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi Dale,

Am 17.01.17 um 17:22 schrieb Dale Ghent:

On Jan 17, 2017, at 11:12 AM, Stephan Budach <stephan.bud...@jvm.de>
  wrote:

Hi guys,

I am sorry, but I do have to undig this old topic, since I do now have three 
hosts running omniOS 018/020, which show these pesky  issues with flapping 
their ixgbeN links on my Nexus FEXes…

Does anyone know, if there has any change been made to the ixgbe drivers since 
06/2016?


Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
This added X550 support, and also brought the Intel Shared Code it uses from 
its 2012 vintage up to current. The updated driver is available in 014 and 
later.

/dale


do you know of any option to get to know, why three of my boxes are flapping 
their 10GbE ports? It's actually not only when in aggr mode, but on single use 
as well. Last week I presumeably had one of my RSF-1 nodes panic, since it 
couldn't get to it's iSCSI LUNs anymore. The thing ist, that somewhere doen the 
line, the ixgbe driver seems to be fine, to configure one port to 1GbE instead 
of 10GbE, which will stop the flapping, but wich will break the VPC on my Nexus 
nevertheless.

In syslog, this looks like this:

...
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 
1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 
1 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down

Note on 14:46:07, where the system settles on a 1GbE connection…

Sounds like a cabling issue? Are the runs too long or are you not using CAT6a? 
It flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to 
me. The driver will always try to link at the fastest speed that the local 
controller and the remote peer will negotiate at... it will not proactively 
downgrade the link speed. If that happens, it is because that is what the 
controller managed to negotiate with the remote peer at.

Are you using jumbo frames or anything outside of a normal 1500mtu link?

/dale
The cables are actually specifically purchased Cat6 cables. They run 
about 2 m, not more. It could be the cables, but I am running a couple of 
those and, afaik, I only get these issues on these three nodes. I can try 
some other cables, but I hoped to be able to get some kind of 
debug messages from the driver.


Thanks,
Stephan


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Stephan Budach

Hi Dale,

Am 17.01.17 um 17:22 schrieb Dale Ghent:

On Jan 17, 2017, at 11:12 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi guys,

I am sorry, but I do have to undig this old topic, since I do now have three 
hosts running omniOS 018/020, which show these pesky  issues with flapping 
their ixgbeN links on my Nexus FEXes…

Does anyone know, if there has any change been made to the ixgbe drivers since 
06/2016?

Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
This added X550 support, and also brought the Intel Shared Code it uses from 
its 2012 vintage up to current. The updated driver is available in 014 and 
later.

/dale


do you know of any option to find out why three of my boxes are 
flapping their 10GbE ports? It actually happens not only in aggr mode, 
but with single links as well. Last week I presumably had one of my RSF-1 
nodes panic, since it couldn't get to its iSCSI LUNs anymore. The thing 
is that somewhere down the line, the ixgbe driver seems to be fine with 
configuring one port to 1GbE instead of 10GbE, which will stop the 
flapping, but which will still break the VPC on my Nexus.


In syslog, this looks like this:

Jan 17 14:41:51 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down
Jan 17 14:42:11 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
link up, 1 Mbps, full duplex
Jan 17 14:43:33 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 
link down
Jan 17 14:43:33 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 
link up, 1 Mbps, full duplex
Jan 17 14:43:34 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 
link down
Jan 17 14:43:43 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 
link up, 1 Mbps, full duplex
Jan 17 14:44:05 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 
link down
Jan 17 14:44:10 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 
link up, 1 Mbps, full duplex
Jan 17 14:45:14 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 
link down
Jan 17 14:45:14 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 
link up, 1 Mbps, full duplex
Jan 17 14:45:14 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 
link down
Jan 17 14:45:29 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 
link up, 1 Mbps, full duplex
Jan 17 14:45:29 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 
link down
Jan 17 14:45:29 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 
link up, 1 Mbps, full duplex
Jan 17 14:45:29 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe1 
link down
Jan 17 14:45:40 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down
Jan 17 14:45:45 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
link up, 1 Mbps, full duplex
Jan 17 14:45:51 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down
Jan 17 14:45:51 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
link up, 1 Mbps, full duplex
Jan 17 14:45:52 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down
Jan 17 14:45:56 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
link up, 1 Mbps, full duplex
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 
link up, 1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
link up, 1 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
link up, 1 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
link up, 1 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
link up, 1 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
link up, 1 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 
link up, 1 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 
link down


Note on 14:46:07, where the system settles on a 1GbE connection…
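
In case it is useful, a crude way to watch the link state from the OmniOS 
side is a polling loop around dladm show-phys - a sketch only, with the 
four interface names from this box hard-coded:

#!/bin/sh
# log a timestamped line whenever the state/speed of one of the 10GbE ports changes
prev=""
while true; do
        cur=""
        for l in ixgbe0 ixgbe1 ixgbe2 ixgbe3; do
                cur="$cur $(dladm show-phys -p -o link,state,speed $l)"
        done
        if [ "$cur" != "$prev" ]; then
                echo "$(date '+%Y-%m-%d %H:%M:%S')$cur"
                prev="$cur"
        fi
        sleep 1
done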

Thanks,
Stephan


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2017-01-17 Thread Stephan Budach

Hi guys,

I am sorry, but I have to dig up this old topic again, since I now have 
three hosts running omniOS 018/020 which show these pesky issues with 
flapping their ixgbeN links on my Nexus FEXes…


Does anyone know if any changes have been made to the ixgbe driver 
since 06/2016?


Thanks,
Stephan


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] omniOS r018 crashed due to scsi/iSCSI issue

2017-01-12 Thread Stephan Budach
2 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539610 unix:_cmntrap+e6 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539780 scsi_vhci:vhci_scsi_reset_target+75 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f65397d0 scsi_vhci:vhci_recovery_reset+7d ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539820 scsi_vhci:vhci_pathinfo_offline+e5 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f65398c0 scsi_vhci:vhci_pathinfo_state_change+d5 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539950 genunix:i_mdi_pi_state_change+16a ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539990 genunix:mdi_pi_offline+39 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539a20 iscsi:iscsi_lun_offline+b3 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539a60 iscsi:iscsi_sess_offline_luns+4d ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539ab0 iscsi:iscsi_sess_state_logged_in+11e ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539b00 iscsi:iscsi_sess_state_machine+13e ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539b60 iscsi:iscsi_client_notify_task+17e ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539c20 genunix:taskq_thread+2d0 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ff00f6539c30 unix:thread_start+8 ()

Jan 12 17:30:22 zfsha02gh79 unix: [ID 10 kern.notice]
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 672855 kern.notice] syncing 
file systems...

Jan 12 17:30:24 zfsha02gh79 genunix: [ID 904073 kern.notice]  done
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 111219 kern.notice] dumping to 
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Jan 12 17:30:22 zfsha02gh79 ahci: [ID 405573 kern.info] NOTICE: ahci0: 
ahci_tran_reset_dport port 0 reset port

Jan 12 17:48:01 zfsha02gh79 genunix: [ID 10 kern.notice]
Jan 12 17:48:02 zfsha02gh79 genunix: [ID 665016 kern.notice] ^M100% 
done: 4721646 pages dumped,


This happened in a rather high-load situation, while I was copying a 
200G file from a snapshot back to its original place on its zvol. 
Luckily these are RSF-1 nodes and the other one took over 
very quickly, so that my VM cluster didn't even seem to notice 
this issue. However, at that time I was connected to the crashing host 
via ssh and my heart skipped a beat. ;)


As I have (involuntarily) freed this node of its duties, I could jump 
to r020 on it, but I wonder if there have been any changes to the 
scsi_vhci layer at all in recent times…


Cheers,
Stephan

--
Krebs's 3 Basic Rules for Online Safety
1st - "If you didn't go looking for it, don't install it!"
2nd - "If you installed it, update it."
3rd - "If you no longer need it, remove it."
http://krebsonsecurity.com/2011/05/krebss-3-basic-rules-for-online-safety


Stephan Budach
Head of IT
Jung von Matt AG
Glashüttenstraße 79
20357 Hamburg


Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
E-Mail: stephan.bud...@jvm.de
Internet: http://www.jvm.com
CiscoJabber Video: https://exp-e2.jvm.de/call/stephan.budach

Vorstand: Dr. Peter Figge, Jean-Remy von Matt, Larissa Pohl, Thomas Strerath, 
Götz Ulmer
Vorsitzender des Aufsichtsrates: Hans Hermann Münchmeyer
AG HH HRB 72893



smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Does anyone know about a 10GbE Quad-Port NIC for omniOS?

2017-01-11 Thread Stephan Budach
Yeah… I just found that one as well and it would actually suit my needs 
pretty well, I guess.


Thanks,
budy

Am 11.01.17 um 16:38 schrieb Dale Ghent:

Ah, another thing that I remembered - If you are using Supermicro server 
hardware that has a SIOM expansion slot, Supermicro has a quad 10Gb SIO module 
based on the X550 chip:

https://www.supermicro.com/products/accessories/addon/AOC-MTG-i4T.cfm

/dale


On Jan 11, 2017, at 10:34 AM, Dale Ghent <da...@omniti.com> wrote:


Since you mention X540, I suppose you mean you that you want twisted pair 10Gb 
ethernet ports.

You're probably looking at the Intel X710-T4. This will actually be driven by 
the i40e driver rather than the ixgbe driver as the MAC is from the 10/40Gb 
XL700 series but the 10Gb twisted pair PHY is the X557-AT4

I haven't actually used these, but they would appear to be 4 ports, twisted 
pair, and supportable under OmniOS.

/dale


On Jan 11, 2017, at 6:05 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi guys,

I am wondering, if anyone knows about a Quad-Port 10 GbE NIC, that is supported 
by omniOS? Of course, I could just stuff two X540s in the box, but maybe 
someone knows about a solid alternative.

Cheers,
budy
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss




smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Does anyone know about a 10GbE Quad-Port NIC for omniOS?

2017-01-11 Thread Stephan Budach

Hi guys,

I am wondering, if anyone knows about a Quad-Port 10 GbE NIC, that is 
supported by omniOS? Of course, I could just stuff two X540s in the box, 
but maybe someone knows about a solid alternative.


Cheers,
budy


smime.p7s
Description: S/MIME cryptographic signature
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] format doesn't show all disks in the system

2016-12-05 Thread Stephan Budach

Hi Dale,

Am 05.12.16 um 16:29 schrieb Dale Ghent:

What does running:

devfsadm -v

tell you? It could be that you added this drive and the dev links weren't made 
for some reason.

/dale


On Dec 5, 2016, at 7:41 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi,

I do have two r018 systems, which are equipped with three different types of 
disks. When I run format it doesn't show all of the connected disks and leaves 
out the Intel 3700 SSDs.

OmniOS 5.11 omnios-r151018-ae3141d  April 2016
root@nfsvmpool06:/root# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c2t4d0 
  /pci@0,0/pci15d9,821@1f,2/disk@4,0
   1. c6t55CD2E404B42A367d0 
  /scsi_vhci/disk@g55cd2e404b42a367
   2. c6t5000C5008ED5D24Fd0 
  /scsi_vhci/disk@g5000c5008ed5d24f
   3. c6t5000C5008ED6CB33d0 
  /scsi_vhci/disk@g5000c5008ed6cb33
   4. c6t5000C5008ED44FE7d0 
  /scsi_vhci/disk@g5000c5008ed44fe7
   5. c6t50025388A06DB530d0 
  /scsi_vhci/disk@g50025388a06db530
   6. c6t50025388A0140B43d0 
  /scsi_vhci/disk@g50025388a0140b43
   7. c6t50025388A0140B48d0 
  /scsi_vhci/disk@g50025388a0140b48
Specify disk (enter its number):

However, cfgadm does show the Intel SSDs…

root@nfsvmpool06:/root# cfgadm -a
Ap_Id  Type Receptacle Occupant Condition
c3 scsi-sas connected unconfigured unknown
c4 scsi-sas connected unconfigured unknown
c5 scsi-sas connected configured   unknown
c5::w55cd2e404c0f0734,0disk-pathconnected configured   unknown
c7 scsi-sas connected configured   unknown
c7::w55cd2e404c0f0717,0disk-pathconnected configured   unknown
c8 scsi-sas connected configured   unknown
c8::w55cd2e404c0f0722,0disk-pathconnected configured   unknown
c9 scsi-sas connected configured   unknown
c9::w55cd2e404c0f06f1,0disk-pathconnected configured   unknown
c10scsi-sas connected configured   unknown
c10::w55cd2e404c0f072b,0   disk-pathconnected configured   unknown
c11scsi-sas connected configured   unknown
c11::w55cd2e404c0f06ff,0   disk-pathconnected configured   unknown
c12scsi-sas connected configured   unknown
c12::w55cd2e404c0f0744,0   disk-pathconnected configured   unknown
c13scsi-sas connected configured   unknown
c13::w55cd2e404c0f07ce,0   disk-pathconnected configured   unknown
c15scsi-sas connected configured   unknown
c15::w5000c5008ed5d24d,0   disk-pathconnected configured   unknown
c16scsi-sas connected configured   unknown
c16::w5000c5008ed44fe5,0   disk-pathconnected configured   unknown
c17scsi-sas connected configured   unknown
c17::w5000c5008ed6cb31,0   disk-pathconnected configured   unknown
c19scsi-sas connected configured   unknown
c19::w50025388a0140b48,0   disk-pathconnected configured   unknown
c20scsi-sas connected configured   unknown
c20::w50025388a06db530,0   disk-pathconnected configured   unknown
c21scsi-sas connected configured   unknown
c21::w50025388a0140b43,0   disk-pathconnected configured   unknown
c22scsi-sas connected configured   unknown
c22::w55cd2e404b42a367,0   disk-pathconnected configured   unknown

E.g. w50025388a06db530 is one of the Intel SSDs, which format doesn't 
show. What am I missing?

Thanks,
Stephan
___

devfsadm -v returns nothing
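
For completeness, the usual next knobs would be something like this 
(a sketch of what I would try next, not yet run here):

# clean up dangling /dev links and verbosely recreate missing ones
devfsadm -Cv

# re-enumerate disk nodes for the sd driver specifically
devfsadm -i sd -v

# then re-check what format sees
format </dev/null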

Thanks,
Stephan



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] format doesn't show all disks in the system

2016-12-05 Thread Stephan Budach

Hi,

I do have two r018 systems, which are equipped with three different 
types of disks. When I run format it doesn't show all of the connected 
disks and leaves out the Intel 3700 SSDs.


OmniOS 5.11 omnios-r151018-ae3141d  April 2016
root@nfsvmpool06:/root# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c2t4d0 
  /pci@0,0/pci15d9,821@1f,2/disk@4,0
   1. c6t55CD2E404B42A367d0 alt 2 hd 224 sec 56>

  /scsi_vhci/disk@g55cd2e404b42a367
   2. c6t5000C5008ED5D24Fd0 alt 2 hd 255 sec 252>

  /scsi_vhci/disk@g5000c5008ed5d24f
   3. c6t5000C5008ED6CB33d0 alt 2 hd 255 sec 252>

  /scsi_vhci/disk@g5000c5008ed6cb33
   4. c6t5000C5008ED44FE7d0 alt 2 hd 255 sec 252>

  /scsi_vhci/disk@g5000c5008ed44fe7
   5. c6t50025388A06DB530d0 2 hd 224 sec 168>

  /scsi_vhci/disk@g50025388a06db530
   6. c6t50025388A0140B43d0 2 hd 224 sec 168>

  /scsi_vhci/disk@g50025388a0140b43
   7. c6t50025388A0140B48d0 2 hd 224 sec 168>

  /scsi_vhci/disk@g50025388a0140b48
Specify disk (enter its number):

However, cfgadm does show the Intel SSDs…

root@nfsvmpool06:/root# cfgadm -a
Ap_Id  Type Receptacle Occupant 
Condition

c3 scsi-sas connected unconfigured unknown
c4 scsi-sas connected unconfigured unknown
c5 scsi-sas connected configured   unknown
c5::w55cd2e404c0f0734,0disk-pathconnected configured   unknown
c7 scsi-sas connected configured   unknown
c7::w55cd2e404c0f0717,0disk-pathconnected configured   unknown
c8 scsi-sas connected configured   unknown
c8::w55cd2e404c0f0722,0disk-pathconnected configured   unknown
c9 scsi-sas connected configured   unknown
c9::w55cd2e404c0f06f1,0disk-pathconnected configured   unknown
c10scsi-sas connected configured   unknown
c10::w55cd2e404c0f072b,0   disk-pathconnected configured   unknown
c11scsi-sas connected configured   unknown
c11::w55cd2e404c0f06ff,0   disk-pathconnected configured   unknown
c12scsi-sas connected configured   unknown
c12::w55cd2e404c0f0744,0   disk-pathconnected configured   unknown
c13scsi-sas connected configured   unknown
c13::w55cd2e404c0f07ce,0   disk-pathconnected configured   unknown
c15scsi-sas connected configured   unknown
c15::w5000c5008ed5d24d,0   disk-pathconnected configured   unknown
c16scsi-sas connected configured   unknown
c16::w5000c5008ed44fe5,0   disk-pathconnected configured   unknown
c17scsi-sas connected configured   unknown
c17::w5000c5008ed6cb31,0   disk-pathconnected configured   unknown
c19scsi-sas connected configured   unknown
c19::w50025388a0140b48,0   disk-pathconnected configured   unknown
c20scsi-sas connected configured   unknown
c20::w50025388a06db530,0   disk-pathconnected configured   unknown
c21scsi-sas connected configured   unknown
c21::w50025388a0140b43,0   disk-pathconnected configured   unknown
c22scsi-sas connected configured   unknown
c22::w55cd2e404b42a367,0   disk-pathconnected configured   unknown

E.g. w50025388a06db530 is one of the Intel SSDs, which format doesn't 
show. What am I missing?


Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] LX for OmniOS update

2016-08-09 Thread Stephan Budach

Hi Dan,

I'd love to be able to actually spend any time on this, but my workload 
doesn't allow it… I hope to get into this at the end of September.

Don't give up on it, please… ;)

Cheers,
Stephan

Am 09.08.16 um 10:05 schrieb Peter Tribble:

Dan,

I've not heard from anyone, so I'm going to assume nobody has
played with LX zones on OmniOS yet.


Is there an ISO image to play with?

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Bloody Update for July 19th

2016-07-20 Thread Stephan Budach

Awesome! I will try that out asap.

Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Chasing down scsi-related warnings

2016-07-03 Thread Stephan Budach

Hi all,

I am having trouble chasing down some network or drive-related errors on 
one of my OmniOS r018 boxes. It started with me noticing these errors in 
the syslog on one of my RSF-1 nodes. These are just a few, but I found 
almost every drive/LUN of that target node mentioned in the syslog on 
the RSF-1 node:


Jul  3 15:51:01 zfsha01colt scsi: [ID 107833 kern.warning] WARNING: 
/scsi_vhci/disk@g600144f0564d504f4f4c3033534c3034 (sd4):

Jul  3 15:51:01 zfsha01colt incomplete write- retrying
Jul  3 15:51:29 zfsha01colt scsi: [ID 107833 kern.warning] WARNING: 
/scsi_vhci/disk@g600144f0564d504f4f4c3033534c3035 (sd5):

Jul  3 15:51:29 zfsha01colt incomplete write- retrying
Jul  3 15:55:25 zfsha01colt scsi: [ID 107833 kern.warning] WARNING: 
/scsi_vhci/disk@g600144f0564d504f4f4c3033534c3039 (sd6):

Jul  3 15:55:25 zfsha01colt incomplete write- retrying
Jul  3 16:06:43 zfsha01colt scsi: [ID 107833 kern.warning] WARNING: 
/scsi_vhci/disk@g600144f0564d504f4f4c3033534c3135 (sd43):

Jul  3 16:06:43 zfsha01colt incomplete write- retrying

Also, iostat -exM is showing HW errors for those LUNs, although I can't 
confirm that the actual drives are at fault on the iSCSI target, which 
is provided by another OmniOS box.


I then failed the zpools over from that target to the second HA node and 
the errors went along with it, so I am assuming that these errors are 
either network related or maybe even drive/controller related to the 
storage node. However, I can't seem to pinpoint the problem. As these 
are only warnings, there is no visible sign of any issue on the storage 
node, but nonetheless I'd like to know what the underlying issue is.
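
Apart from iostat, the only other place I know to look is the fault 
manager; roughly, this is what I plan to check on both ends (just for 
reference, nothing fancy):

# per-device soft/hard/transport error summary on the RSF-1 node
iostat -En

# whatever telemetry the fault manager has collected around those warnings
fmdump -eV | less

# and the same two commands on the iSCSI target box, to rule the physical drives in or out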


Any ideas, anyone?

Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZPOOL disk performance on Supermicro with LSI 9300-8i

2016-06-23 Thread Stephan Budach

Am 23.06.16 um 14:10 schrieb qutic development:

Am 23.06.2016 um 01:06 schrieb Josh Barton :

Any ideas why the HP is so much faster? It just has one Smart Array Controller 
which I didn’t think would be faster than JBOD

Could you please provide a few more information about the servers? CPU speed, 
cores, RAM, etc?

OmniOS tuned or all raw installations? All the same OmniOS version?

- Stefan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

Smells like caching on the HP…

You could only compare those results if you'd stuff your LSIs into the 
HP as well. An order of magnitude is quite a lot in that regard. Then, what does 
this benchmark actually do? How much of it is CPU bound and how much is 
I/O bound? And I know that the SmartArray often has quite an 
amount of RAM cache on board, especially if you buy one of those bigger 
boxes, like a 380.


Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-06-03 Thread Stephan Budach

Am 03.06.16 um 15:42 schrieb Fábio Rabelo:

Hi to all

A question:

This are the board you used ?

https://www.supermicro.com/products/motherboard/Xeon/C600/X10DRi-T4_.cfm

If so, this board uses Intel X540, and this issue are only with Intel
X550 chips !


Fábio Rabelo

Yes, this is the board I got. Actually, it's an X10DRi-T4+

Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-06-03 Thread Stephan Budach

Hi Dale,

Am 17.05.16 um 20:55 schrieb Dale Ghent:

On May 17, 2016, at 8:30 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:


I have checked all of my ixgbe interfaces and they all report that no flow 
control is in place, as you can see:

root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe0
LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
ixgbe0   flowctrlrw   no no no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe1
LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
ixgbe1   flowctrlrw   no no no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe2
LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
ixgbe2   flowctrlrw   no no no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe3
LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
ixgbe3   flowctrlrw   no no no,tx,rx,bi

I then checked the ports on the Nexus switches and found out, that they do have 
outbound-flowcontrol enabled, but that is the case on any of those Nexus ports, 
including those, where this issue doesn't exist.

Optimally you would have flow control turned off on both sides, as the switch 
still expects the ixgbe NIC to respond appropriately. To be honest, the only 
time to use ethernet flow control is if you are operating the interfaces for 
higher-level protocols which do not provide any sort of direct flow control 
themselves, such as FCoE. If the vast majority of traffic is TCP, leave it to 
the TCP stack to manage any local congestion on the link.

/dale
I just wanted to wrap this up… I recently swapped that old Sun server 
for a new Supermicro X10-type, which has 4 10GbE NICs on board, and 
installed OmniOS r018 and my RSF-1 cluster software on it. I configured my 
two LACP aggregations and there hasn't been any issue since.
So it was either something on the old server - a Sun Fire X4170M2 - 
or something on the Intel cards.


Cheers,
Stephan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Error when trying to install OmniOS r018 on new SM X100DRi-T4+

2016-05-27 Thread Stephan Budach

Am 27.05.16 um 15:18 schrieb Stephan Budach:

Hi,

I just tried to install OmniOS r018 onto a new SuperMicro server and 
when the install kernel starts up, I am getting this panic:


cpu1: featureset
WARNING: cpu1 feature mismatch

panic[cpu1/thread=f00f4920c40: unsupported mixed cpu monitor/mwait 
support

detected

The the hosts reboots…

Does anyone has an idead, if this issue is to be overcome by setting 
something in the BIOS?


Thanks,
Stephan


Weird… please ignore this… the box came with some non-standard BIOS settings. After 
hitting "Reset to optimized default" the box boots up r018 just fine.
Weird, since this is how the box was delivered to me…

Cheers,
Stephan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Error when trying to install OmniOS r018 on new SM X100DRi-T4+

2016-05-27 Thread Stephan Budach

Hi,

I just tried to install OmniOS r018 onto a new SuperMicro server and 
when the install kernel starts up, I am getting this panic:


cpu1: featureset
WARNING: cpu1 feature mismatch

panic[cpu1/thread=f00f4920c40: unsupported mixed cpu monitor/mwait 
support

detected

The the hosts reboots…

Does anyone has an idead, if this issue is to be overcome by setting 
something in the BIOS?


Thanks,
Stephan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-17 Thread Stephan Budach

Am 11.05.16 um 19:28 schrieb Dale Ghent:

On May 11, 2016, at 12:32 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:
I will try to get one node free of all services running on it, as I will have 
to reboot the system, since I will have to change the ixgbe.conf, won't I?
This is a RSF-1 host, so this will likely be done over the weekend.

You can use dladm on a live system:

dladm set-linkprop -p flowctrl=no ixgbeN

Where ixgbeN is your ixgbe interfaces (probably ixgbe0 and ixgbe1)

/dale

I have checked all of my ixgbe interfaces and they all report that no 
flow control is in place, as you can see:


root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe0
LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
ixgbe0   flowctrlrw   no no no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe1
LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
ixgbe1   flowctrlrw   no no no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe2
LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
ixgbe2   flowctrlrw   no no no,tx,rx,bi
root@zfsha01colt:/root# dladm show-linkprop -p flowctrl ixgbe3
LINK PROPERTYPERM VALUE DEFAULTPOSSIBLE
ixgbe3   flowctrlrw   no no no,tx,rx,bi


I then checked the ports on the Nexus switches and found out, that they 
do have outbound-flowcontrol enabled, but that is the case on any of 
those Nexus ports, including those, where this issue doesn't exist.
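
For reference, the PAUSE settings that actually got negotiated on the wire 
should be visible with dladm as well - if I read the dladm man page 
correctly, show-ether -x lists both the locally advertised and the 
peer-advertised bits:

dladm show-ether -x ixgbe0
dladm show-ether -x ixgbe1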


Regards,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-11 Thread Stephan Budach

Am 11.05.16 um 16:48 schrieb Dale Ghent:

On May 11, 2016, at 7:36 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Am 09.05.16 um 20:43 schrieb Dale Ghent:

On May 9, 2016, at 2:04 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Am 09.05.16 um 16:33 schrieb Dale Ghent:

On May 9, 2016, at 8:24 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the 
LACP aggr-link on different boxes, when Intel X540-T2s are involved. It first 
starts with a couple if link downs/ups on one port and finally the link on that 
 port negiotates to 1GbE instead of 10GbE, which then breaks the LACP channel 
on my Cisco Nexus for this connection.

I have tried swapping and interchangeing cables and thus switchports, but to no 
avail.

Anyone else noticed this and even better… knows a solution to this?

Was this an issue noticed only with r151018 and not with previous versions, or 
have you only tried this with 018?

By your description, I presume that the two ixgbe physical links will stay at 
10Gb and not bounce down to 1Gb if not LACP'd together?

/dale

I have noticed that on prior versions of OmniOS as well, but we only recently 
started deploying 10GbE LACP bonds, when we introduced our Nexus gear to our 
network. I will have to check if both links stay at 10GbE, when not being 
configured as a LACP bond. Let me check that tomorrow and report back. As we're 
heading for a streched DC, we are mainly configuring 2-way LACP bonds over our 
Nexus gear, so we don't actually have any single 10GbE connection, as they will 
all have to be conencted to both DCs. This is achieved by using VPCs on our 
Nexus switches.

Provide as much detail as you can - if you're using hw flow control, whether 
both links act this way at the same time or independently, and so-on. Problems 
like this often boil down to a very small and seemingly insignificant detail.

I currently have ixgbe on the operating table for adding X550 support, so I can 
take a look at this; however I don't have your type of switches available to me 
so LACP-specific testing is something I can't do for you.

/dale

I checked the ixgbe.conf files on each host and they all are still at the 
standard setting, which includes flow_control = 3;

As, so you are using ethernet flow control. Could you try disabling that on 
both sides (on the ixgbe host and on the switch) and see if that corrects the 
link stability issues? There's an outstanding issue with hw flow control on 
ixgbe that you *might* be running into regarding pause frame timing, which 
could manifest in the way you describe.

/dale

I will try to get one node free of all services running on it, as I will 
have to reboot the system, since I will have to change the ixgbe.conf, 
won't I?

This is a RSF-1 host, so this will likely be done over the weekend.

Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-11 Thread Stephan Budach

Am 11.05.16 um 14:50 schrieb Stephan Budach:

Am 11.05.16 um 13:36 schrieb Stephan Budach:

Am 09.05.16 um 20:43 schrieb Dale Ghent:
On May 9, 2016, at 2:04 PM, Stephan Budach <stephan.bud...@jvm.de> 
wrote:


Am 09.05.16 um 16:33 schrieb Dale Ghent:
On May 9, 2016, at 8:24 AM, Stephan Budach 
<stephan.bud...@jvm.de> wrote:


Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d 
will break the LACP aggr-link on different boxes, when Intel 
X540-T2s are involved. It first starts with a couple if link 
downs/ups on one port and finally the link on that  port 
negiotates to 1GbE instead of 10GbE, which then breaks the LACP 
channel on my Cisco Nexus for this connection.


I have tried swapping and interchangeing cables and thus 
switchports, but to no avail.


Anyone else noticed this and even better… knows a solution to this?
Was this an issue noticed only with r151018 and not with previous 
versions, or have you only tried this with 018?


By your description, I presume that the two ixgbe physical links 
will stay at 10Gb and not bounce down to 1Gb if not LACP'd together?


/dale
I have noticed that on prior versions of OmniOS as well, but we 
only recently started deploying 10GbE LACP bonds, when we 
introduced our Nexus gear to our network. I will have to check if 
both links stay at 10GbE, when not being configured as a LACP bond. 
Let me check that tomorrow and report back. As we're heading for a 
streched DC, we are mainly configuring 2-way LACP bonds over our 
Nexus gear, so we don't actually have any single 10GbE connection, 
as they will all have to be conencted to both DCs. This is achieved 
by using VPCs on our Nexus switches.
Provide as much detail as you can - if you're using hw flow control, 
whether both links act this way at the same time or independently, 
and so-on. Problems like this often boil down to a very small and 
seemingly insignificant detail.


I currently have ixgbe on the operating table for adding X550 
support, so I can take a look at this; however I don't have your 
type of switches available to me so LACP-specific testing is 
something I can't do for you.


/dale
I checked the ixgbe.conf files on each host and they all are still at 
the standard setting, which includes flow_control = 3;
So they all have flow control enabled. As for the Nexus config, all 
of those ports are still on standard ethernet ports and modifications 
have only been made globally to the switch.
I will now have to yank the one port on one of the hosts from the 
aggr and configure it as a standalone port. Then we will see, if it 
still receives the disconnects/reconnects and finally the negotiation 
to 1GbE instead of 10GbE. As this only seems to happen to the same 
port I never experienced other ports of the affected aggrs acting up. 
I also thought to notice, that those were always the "same" physical 
ports, that is the first port on the card (ixgbe0), but that might of 
course be a coincidence.


Thanks,
Stephan


Ok, so we can likely rule out LACP as a generic reason for this issue… 
After removing ixgbe0 from the aggr1, I plugged it into an unused port 
of my Nexus FEX and lo and behold, here we go:


root@tr1206902:/root# tail -f /var/adm/messages
May 11 14:37:17 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 
link up, 1000 Mbps, full duplex
May 11 14:38:35 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 
link down
May 11 14:38:48 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 
link up, 1 Mbps, full duplex


May 11 15:24:55 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 
link down
May 11 15:25:10 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 
link up, 1 Mbps, full duplex


So, after less than an hour, we had the first link-cycle on ixgbe0, 
alas on another port, which has no LACP config whatsoever. I will 
monitor this for a while and see, if we will get more of those.


Thanks,
Stephan 


Ehh… and sorry, I almost forgot to paste the log from the Cisco Nexus 
switch:


2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-SPEED: Interface 
Ethernet141/1/9, operational speed changed to 10 Gbps
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_DUPLEX: Interface 
Ethernet141/1/9, operational duplex mode changed to Full
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface 
Ethernet141/1/9, operational Receive Flow Control state changed to off
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface 
Ethernet141/1/9, operational Transmit Flow Control state changed to on
2016 May 11 13:21:22 gh79-nx-01 %ETHPORT-5-IF_UP: Interface 
Ethernet141/1/9 is up in mode access
2016 May 11 14:07:29 gh79-nx-01 %ETHPORT-5-IF_DOWN_LINK_FAILURE: 
Interface Ethernet141/1/9 is down (Link failure)

2016 May 11 14:07:45 gh79-nx-01 last message repeated 1 time
2016 May 11 14:07:45 gh79-nx-01 %ETHPORT-5-SPEED: Interface 
Ethernet141/1/9, operational speed changed to 10 Gbps
2016 May 11 14:07:45 gh79-nx-01 %ETHPORT-5-IF_DUPLEX: Inte

Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-11 Thread Stephan Budach

Am 11.05.16 um 13:36 schrieb Stephan Budach:

Am 09.05.16 um 20:43 schrieb Dale Ghent:
On May 9, 2016, at 2:04 PM, Stephan Budach <stephan.bud...@jvm.de> 
wrote:


Am 09.05.16 um 16:33 schrieb Dale Ghent:
On May 9, 2016, at 8:24 AM, Stephan Budach <stephan.bud...@jvm.de> 
wrote:


Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d 
will break the LACP aggr-link on different boxes, when Intel 
X540-T2s are involved. It first starts with a couple if link 
downs/ups on one port and finally the link on that  port 
negiotates to 1GbE instead of 10GbE, which then breaks the LACP 
channel on my Cisco Nexus for this connection.


I have tried swapping and interchangeing cables and thus 
switchports, but to no avail.


Anyone else noticed this and even better… knows a solution to this?
Was this an issue noticed only with r151018 and not with previous 
versions, or have you only tried this with 018?


By your description, I presume that the two ixgbe physical links 
will stay at 10Gb and not bounce down to 1Gb if not LACP'd together?


/dale
I have noticed that on prior versions of OmniOS as well, but we only 
recently started deploying 10GbE LACP bonds, when we introduced our 
Nexus gear to our network. I will have to check if both links stay 
at 10GbE, when not being configured as a LACP bond. Let me check 
that tomorrow and report back. As we're heading for a streched DC, 
we are mainly configuring 2-way LACP bonds over our Nexus gear, so 
we don't actually have any single 10GbE connection, as they will all 
have to be conencted to both DCs. This is achieved by using VPCs on 
our Nexus switches.
Provide as much detail as you can - if you're using hw flow control, 
whether both links act this way at the same time or independently, 
and so-on. Problems like this often boil down to a very small and 
seemingly insignificant detail.


I currently have ixgbe on the operating table for adding X550 
support, so I can take a look at this; however I don't have your type 
of switches available to me so LACP-specific testing is something I 
can't do for you.


/dale
I checked the ixgbe.conf files on each host and they all are still at 
the standard setting, which includes flow_control = 3;
So they all have flow control enabled. As for the Nexus config, all of 
those ports are still on standard ethernet ports and modifications 
have only been made globally to the switch.
I will now have to yank the one port on one of the hosts from the aggr 
and configure it as a standalone port. Then we will see, if it still 
receives the disconnects/reconnects and finally the negotiation to 
1GbE instead of 10GbE. As this only seems to happen to the same port I 
never experienced other ports of the affected aggrs acting up. I also 
thought to notice, that those were always the "same" physical ports, 
that is the first port on the card (ixgbe0), but that might of course 
be a coincidence.


Thanks,
Stephan


Ok, so we can likely rule out LACP as a generic reason for this issue… 
After removing ixgbe0 from the aggr1, I plugged it into an unused port 
of my Nexus FEX and lo and behold, here we go:


root@tr1206902:/root# tail -f /var/adm/messages
May 11 14:37:17 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link 
up, 1000 Mbps, full duplex
May 11 14:38:35 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link 
down
May 11 14:38:48 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link 
up, 1 Mbps, full duplex


May 11 15:24:55 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link 
down
May 11 15:25:10 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link 
up, 1 Mbps, full duplex


So, after less than an hour, we had the first link-cycle on ixgbe0, alas 
on another port, which has no LACP config whatsoever. I will monitor 
this for a while and see, if we will get more of those.


Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-11 Thread Stephan Budach

Am 09.05.16 um 20:43 schrieb Dale Ghent:

On May 9, 2016, at 2:04 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Am 09.05.16 um 16:33 schrieb Dale Ghent:

On May 9, 2016, at 8:24 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the 
LACP aggr-link on different boxes, when Intel X540-T2s are involved. It first 
starts with a couple if link downs/ups on one port and finally the link on that 
 port negiotates to 1GbE instead of 10GbE, which then breaks the LACP channel 
on my Cisco Nexus for this connection.

I have tried swapping and interchangeing cables and thus switchports, but to no 
avail.

Anyone else noticed this and even better… knows a solution to this?

Was this an issue noticed only with r151018 and not with previous versions, or 
have you only tried this with 018?

By your description, I presume that the two ixgbe physical links will stay at 
10Gb and not bounce down to 1Gb if not LACP'd together?

/dale

I have noticed that on prior versions of OmniOS as well, but we only recently 
started deploying 10GbE LACP bonds, when we introduced our Nexus gear to our 
network. I will have to check if both links stay at 10GbE, when not being 
configured as a LACP bond. Let me check that tomorrow and report back. As we're 
heading for a streched DC, we are mainly configuring 2-way LACP bonds over our 
Nexus gear, so we don't actually have any single 10GbE connection, as they will 
all have to be conencted to both DCs. This is achieved by using VPCs on our 
Nexus switches.

Provide as much detail as you can - if you're using hw flow control, whether 
both links act this way at the same time or independently, and so-on. Problems 
like this often boil down to a very small and seemingly insignificant detail.

I currently have ixgbe on the operating table for adding X550 support, so I can 
take a look at this; however I don't have your type of switches available to me 
so LACP-specific testing is something I can't do for you.

/dale
I checked the ixgbe.conf files on each host and they all are still at 
the standard setting, which includes flow_control = 3;
So they all have flow control enabled. As for the Nexus config, all of 
those ports are still on standard ethernet ports and modifications have 
only been made globally to the switch.
I will now have to yank that one port on one of the hosts out of the aggr 
and configure it as a standalone port. Then we will see if it still 
gets the disconnects/reconnects and finally the negotiation to 1GbE 
instead of 10GbE. As this only seems to happen to the same port, I never 
experienced other ports of the affected aggrs acting up. I also thought 
I noticed that those were always the "same" physical ports, that is, the 
first port on the card (ixgbe0), but that might of course be a coincidence.


Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-09 Thread Stephan Budach

Am 09.05.16 um 16:33 schrieb Dale Ghent:

On May 9, 2016, at 8:24 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d will break the 
LACP aggr-link on different boxes, when Intel X540-T2s are involved. It first 
starts with a couple if link downs/ups on one port and finally the link on that 
 port negiotates to 1GbE instead of 10GbE, which then breaks the LACP channel 
on my Cisco Nexus for this connection.

I have tried swapping and interchangeing cables and thus switchports, but to no 
avail.

Anyone else noticed this and even better… knows a solution to this?

Was this an issue noticed only with r151018 and not with previous versions, or 
have you only tried this with 018?

By your description, I presume that the two ixgbe physical links will stay at 
10Gb and not bounce down to 1Gb if not LACP'd together?

/dale
I have noticed that on prior versions of OmniOS as well, but we only 
recently started deploying 10GbE LACP bonds, when we introduced our 
Nexus gear to our network. I will have to check if both links stay at 
10GbE, when not being configured as a LACP bond. Let me check that 
tomorrow and report back. As we're heading for a stretched DC, we are 
mainly configuring 2-way LACP bonds over our Nexus gear, so we don't 
actually have any single 10GbE connection, as they will all have to be 
connected to both DCs. This is achieved by using VPCs on our Nexus switches.


Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

2016-05-09 Thread Stephan Budach

Hi,

I have a strange behaviour where OmniOS omnios-r151018-ae3141d will 
break the LACP aggr-link on different boxes, when Intel X540-T2s are 
involved. It first starts with a couple of link downs/ups on one port 
and finally the link on that port negotiates to 1GbE instead of 10GbE, 
which then breaks the LACP channel on my Cisco Nexus for this connection.


I have tried swapping and interchanging cables and thus switchports, 
but to no avail.


Has anyone else noticed this and, even better… knows a solution to this?

Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] R151018: kernel panic when iSCSI target goes south

2016-04-25 Thread Stephan Budach

Hi Dan,

Am 25.04.16 um 16:23 schrieb Dan McDonald:

This one is a NULL pointer dereference. If you're still running with kmem_flags 
= 0xf, the dump will be especially useful.

Dan

Sent from my iPhone (typos, autocorrect, and all)


On Apr 25, 2016, at 3:27 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi,

I have been struck by kernel panics on my OmniOS boxes lately, when any one of 
the target hosts, where the system gets its LUNs from, experiences a kernel 
panic itself. When this happens, my RSF-1 node immediately panics as well. 
Looking at the vmdump, it shows this:

root@zfsha02gh79:/var/crash/unknown# mdb -k unix.0 vmcore.0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix 
scsi_vhci zfs sata sd ip hook neti sockfs arp usba stmf stmf_sbd mm md lofs 
random idm crypto cpc kvm ufs logindmux nsmb ptm smbsrv nfs ipc mpt mpt_sas 
pmcs emlxs ]

::status

debugging crash dump vmcore.0 (64-bit) from zfsha02gh79
operating system: 5.11 omnios-r151018-ae3141d (i86pc)
image uuid: 18d57565-8b91-46ea-9469-fb0518d35e30
panic message: BAD TRAP: type=e (#pf Page fault) rp=ff00f8b5e590 addr=10 occurred in 
module "scsi_vhci" due to a NULL pointer dereference
dump content: kernel pages only

::stack

vhci_scsi_reset_target+0x75(ff2c7b200b88, 1, 1)
vhci_recovery_reset+0x7d(ff2c7ac9d080, ff2c7b200b88, 1, 2)
vhci_pathinfo_offline+0xe5(ff21d3288550, ff2273530838, 0)
vhci_pathinfo_state_change+0xd5(ff21d3288550, ff2273530838, 4, 0, 0)
i_mdi_pi_state_change+0x16a(ff2273530838, 4, 0)
mdi_pi_offline+0x39(ff2273530838, 0)
iscsi_lun_offline+0xb3(ff21f1bd4580, ff2c084f5d60, 0)
iscsi_sess_offline_luns+0x4d(ff27fea82000)
iscsi_sess_state_failed+0x6f(ff27fea82000, 3, 2a)
iscsi_sess_state_machine+0x156(ff27fea82000, 3, 2a)
iscsi_login_end+0x18f(ff286c8d6000, 15, ff22724e1158)
iscsi_login_start+0x318(ff22724e1158)
taskq_thread+0x2d0(ff2270a7cb50)
thread_start+8()
The vmdump is really big, approx 5GB compressed, but I could share that if 
necessary.

Thanks,
Stephan


I sure do. If you'd grant me an upload token, I will upload that zip 
file of 4GB. This will expand to an 18GB vmdump…
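
For reference, enabling the kmem debug flags Dan mentions is just an /etc/system entry, as a sketch (takes effect after a reboot):

* /etc/system
set kmem_flags = 0xf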


Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] R151018: kernel panic when iSCSI target goes south

2016-04-25 Thread Stephan Budach

Hi,

I have been struck by kernel panics on my OmniOS boxes lately, when any 
one of the target hosts, where the system gets its LUNs from, 
experiences a kernel panic itself. When this happens, my RSF-1 node 
immediately panics as well. Looking at the vmdump, it shows this:


root@zfsha02gh79:/var/crash/unknown# mdb -k unix.0 vmcore.0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix 
scsi_vhci zfs sata sd ip hook neti sockfs arp usba stmf stmf_sbd mm md 
lofs random idm crypto cpc kvm ufs logindmux nsmb ptm smbsrv nfs ipc mpt 
mpt_sas pmcs emlxs ]

> ::status
debugging crash dump vmcore.0 (64-bit) from zfsha02gh79
operating system: 5.11 omnios-r151018-ae3141d (i86pc)
image uuid: 18d57565-8b91-46ea-9469-fb0518d35e30
panic message: BAD TRAP: type=e (#pf Page fault) rp=ff00f8b5e590 
addr=10 occurred in module "scsi_vhci" due to a NULL pointer dereference

dump content: kernel pages only
> ::stack
vhci_scsi_reset_target+0x75(ff2c7b200b88, 1, 1)
vhci_recovery_reset+0x7d(ff2c7ac9d080, ff2c7b200b88, 1, 2)
vhci_pathinfo_offline+0xe5(ff21d3288550, ff2273530838, 0)
vhci_pathinfo_state_change+0xd5(ff21d3288550, ff2273530838, 4, 0, 0)
i_mdi_pi_state_change+0x16a(ff2273530838, 4, 0)
mdi_pi_offline+0x39(ff2273530838, 0)
iscsi_lun_offline+0xb3(ff21f1bd4580, ff2c084f5d60, 0)
iscsi_sess_offline_luns+0x4d(ff27fea82000)
iscsi_sess_state_failed+0x6f(ff27fea82000, 3, 2a)
iscsi_sess_state_machine+0x156(ff27fea82000, 3, 2a)
iscsi_login_end+0x18f(ff286c8d6000, 15, ff22724e1158)
iscsi_login_start+0x318(ff22724e1158)
taskq_thread+0x2d0(ff2270a7cb50)
thread_start+8()
>

The vmdump is really big, approx 5GB compressed, but I could share that 
if necessary.


Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow scrub on SSD-only pool

2016-04-22 Thread Stephan Budach

Am 22.04.16 um 19:28 schrieb Dan McDonald:

On Apr 22, 2016, at 1:13 PM, Richard Elling  
wrote:

If you're running Solaris 11 or pre-2015 OmniOS, then the old write throttle is 
impossible
to control and you'll chase your tail trying to balance scrubs/resilvers 
against any other
workload. From a control theory perspective, it is unstable.

pre-2015 can be clarified a bit:  r151014 and later has the modern ZFS write 
throttle.  Now I know that Stephan is running later versions of OmniOS, so you 
can be guaranteed it's the modern write-throttle.

Furthermore, anyone running any OmniOS EARLIER than r151014 is not supportable, 
and any pre-014 release is not supported.

Dan

…and I am actually fine with the new controls/tunables, so there's 
absolutely no fuss here. ;) Plus, I actually understood how both work, 
which is a plus…


Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow scrub on SSD-only pool

2016-04-21 Thread Stephan Budach

Am 19.04.16 um 23:31 schrieb wuffers:

You might want to check this old thread:

http://lists.omniti.com/pipermail/omnios-discuss/2014-July/002927.html

Richard Elling had some interesting insights on how the scrub works:

"So I think the pool is not scheduling scrub I/Os very well. You can 
increase the number of
scrub I/Os in the scheduler by adjusting the zfs_vdev_scrub_max_active 
tunable. The
default is 2, but you'll have to consider that a share (in the stock 
market sense) where
the active sync reads and writes are getting 10 each. You can try 
bumping up the value
and see what happens over some time, perhaps 10 minutes or so -- too 
short of a time
and you won't get a good feeling for the impact (try this in off-peak 
time).

echo zfs_vdev_scrub_max_active/W0t5 | mdb -kw
will change the value from 2 to 5, increasing its share of the total 
I/O workload.


You can see the progress of scan (scrubs do scan) workload by looking 
at the ZFS

debug messages.
echo ::zfs_dbgmsg | mdb -k
These will look mysterious... they are. But the interesting bits are 
about how many blocks
are visited in some amount of time (txg sync interval). Ideally, this 
will change as you

adjust zfs_vdev_scrub_max_active."

I had to increase my zfs_vdev_scrub_max_active parameter higher than 
5, but it sounds like the default setting for that tunable is no 
longer satisfactory for today's high performance systems.


On Sun, Apr 17, 2016 at 4:07 PM, Stephan Budach <stephan.bud...@jvm.de 
<mailto:stephan.bud...@jvm.de>> wrote:


Am 17.04.16 um 20:42 schrieb Dale Ghent:

On Apr 17, 2016, at 9:07 AM, Stephan Budach
<stephan.bud...@jvm.de <mailto:stephan.bud...@jvm.de>> wrote:

Well… searching the net somewhat more thoroughly, I
came across an archived discussion which also deals with a
similar issue. Somewhere down the conversation, this
parameter got suggested:

echo "zfs_scrub_delay/W0" | mdb -kw

I just tried that as well and although the calculated speed
climbs up rather slowly, iostat now shows approx. 380 MB/s
read from the devices, which rates at 24 MB/s per single
device * 8 *2.

Being curious, I issued a echo "zfs_scrub_delay/W1" | mdb
-kw to see what would happen and that command immediately
drowned the rate on each device down to 1.4 MB/s…

What is the rationale behind that? Who wants to wait for
weeks for a scrub to finish? Usually, I have znapzend
running as well, creating snapshots on a regular basis.
Wouldn't that hurt scrub performance even more?

zfs_scrub_delay is described here:


http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/dsl_scan.c#63

How busy are your disks if you subtract the IO caused by a
scrub? Are you doing these scrubs with your VMs causing normal
IO as well?

Scrubbing, overall, is treated as a background maintenance
process. As such, it is designed to not interfere with
"production IO" requests. It used to be that scrubs ran as
fast as disk IO and bus bandwidth would allow, which in turn
severely impacted the IO performance of running applications,
and in some cases this would cause problems for production or
user services.  The scrub delay setting which you've
discovered is the main governor of this scrub throttle
code[1], and by setting it to 0, you are effectively removing
the delay it imposes on itself to allow non-scrub/resilvering
IO requests to finish.

The solution in your case is specific to yourself and how you
operate your servers and services. Can you accept degraded
application IO while a scrub or resilver is running? Can you
not? Maybe only during certain times?

/dale

[1]

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/dsl_scan.c#1841

I do get the notion of this, but if the increase from 0 to 1
reduces the throughput from 24 MB/s to 1 MB/s, this seems way
overboard to me. Having to wait for a couple of hours when running
with 0 as opposed to days (up to 10) when running at 1  - on a 1.3
TB zpool - doesn't seem to be the right choice. If this tunable
offered some more room for choice, that would be great, but it
obviously doesn't.

It's the weekend and my VMs aren't exactly hogging their disks, so
there was plenty of I/O available… I'd wish for more granularity in
this setting.

Anyway, the scrub finished a couple of hours later and of course,
I can always set this tunable to 0, should I need it,

Thanks,

Stephan



Interesting read - and it surely works. If you set the tunable before 
you start the scrub.

Re: [OmniOS-discuss] Slow scrub on SSD-only pool

2016-04-17 Thread Stephan Budach

Am 17.04.16 um 20:42 schrieb Dale Ghent:

On Apr 17, 2016, at 9:07 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Well… searching the net somewhat more thoroughly, I came across an archived 
discussion which also deals with a similar issue. Somewhere down the 
conversation, this parameter got suggested:

echo "zfs_scrub_delay/W0" | mdb -kw

I just tried that as well and although the calculated speed climbs up rather 
slowly, iostat now shows approx. 380 MB/s read from the devices, which rates at 
24 MB/s per single device * 8 *2.

Being curious, I issued a echo "zfs_scrub_delay/W1" | mdb -kw to see what would 
happen and that command immediately drowned the rate on each device down to 1.4 MB/s…

What is the rationale behind that? Who wants to wait for weeks for a scrub to 
finish? Usually, I have znapzend running as well, creating snapshots on a 
regular basis. Wouldn't that hurt scrub performance even more?

zfs_scrub_delay is described here:

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/dsl_scan.c#63

How busy are your disks if you subtract the IO caused by a scrub? Are you doing 
these scrubs with your VMs causing normal IO as well?

Scrubbing, overall, is treated as a background maintenance process. As such, it is 
designed to not interfere with "production IO" requests. It used to be that 
scrubs ran as fast as disk IO and bus bandwidth would allow, which in turn severely 
impacted the IO performance of running applications, and in some cases this would cause 
problems for production or user services.  The scrub delay setting which you've 
discovered is the main governor of this scrub throttle code[1], and by setting it to 0, 
you are effectively removing the delay it imposes on itself to allow 
non-scrub/resilvering IO requests to finish.

The solution in your case is specific to yourself and how you operate your 
servers and services. Can you accept degraded application IO while a scrub or 
resilver is running? Can you not? Maybe only during certain times?

/dale

[1] 
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/dsl_scan.c#1841
I do get the notion of this, but if the increase from 0 to 1 reduces the 
throughput from 24 MB/s to 1 MB/s, this seems way overboard to me. Having 
to wait for a couple of hours when running with 0 as opposed to days (up 
to 10) when running at 1 - on a 1.3 TB zpool - doesn't seem to be the 
right choice. If this tunable offered some more room for choice, that 
would be great, but it obviously doesn't.


It's the weekend and my VMs aren't exactly hogging their disks, so there 
was plenty of I/O available… I'd wish for more granularity in this 
setting.


Anyway, the scrub finished a couple of hours later and of course, I can 
always set this tunable to 0, should I need it,
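
For the record, a sketch of both the live change and making it stick across reboots (assuming the stock illumos tunable names; the values are just examples):

# live change via mdb, takes effect immediately
echo zfs_scrub_delay/W0 | mdb -kw
echo zfs_vdev_scrub_max_active/W0t5 | mdb -kw
# persistent equivalent in /etc/system
set zfs:zfs_scrub_delay = 0
set zfs:zfs_vdev_scrub_max_active = 5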


Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Slow scrub on SSD-only pool

2016-04-17 Thread Stephan Budach

Hi all,

I am running a scrub on an SSD-only zpool on r018. This zpool consists of 
16 iSCSI targets, which are served from two other OmniOS boxes - 
currently still running r016 over 10GbE connections.


This zpool serves as a NFS share for my Oracle VM cluster and it 
delivers reasonable performance. Even while the scrub is running, I can 
get approx 1200MB/s throughput when dd'ing a vdisk from the ZFS to 
/dev/null.


However, the running scrub is only progressing like this:

root@zfsha02gh79:/root# zpool status ssdTank
  pool: ssdTank
 state: ONLINE
  scan: scrub in progress since Sat Apr 16 23:37:52 2016
68,5G scanned out of 1,36T at 1,36M/s, 276h17m to go
0 repaired, 4,92% done
config:

NAME STATE READ WRITE CKSUM
ssdTank ONLINE   0 0 0
  mirror-0 ONLINE   0 0 0
c3t600144F090D0961356B8A76C0001d0 ONLINE   0 
0 0
c3t600144F090D0961356B8A93C0009d0 ONLINE   0 
0 0

  mirror-1 ONLINE   0 0 0
c3t600144F090D0961356B8A7BE0002d0 ONLINE   0 
0 0
c3t600144F090D0961356B8A948000Ad0 ONLINE   0 
0 0

  mirror-2 ONLINE   0 0 0
c3t600144F090D0961356B8A7F10003d0 ONLINE   0 
0 0
c3t600144F090D0961356B8A958000Bd0 ONLINE   0 
0 0

  mirror-3 ONLINE   0 0 0
c3t600144F090D0961356B8A7FC0004d0 ONLINE   0 
0 0
c3t600144F090D0961356B8A964000Cd0 ONLINE   0 
0 0

  mirror-4 ONLINE   0 0 0
c3t600144F090D0961356B8A8210005d0 ONLINE   0 
0 0
c3t600144F090D0961356B8A96E000Dd0 ONLINE   0 
0 0

  mirror-5 ONLINE   0 0 0
c3t600144F090D0961356B8A82E0006d0 ONLINE   0 
0 0
c3t600144F090D0961356B8A978000Ed0 ONLINE   0 
0 0

  mirror-6 ONLINE   0 0 0
c3t600144F090D0961356B8A83B0007d0 ONLINE   0 
0 0
c3t600144F090D0961356B8A983000Fd0 ONLINE   0 
0 0

  mirror-7 ONLINE   0 0 0
c3t600144F090D0961356B8A84A0008d0 ONLINE   0 
0 0
c3t600144F090D0961356B8A98E0010d0 ONLINE   0 
0 0


errors: No known data errors

These are all Intel S3710s with 800GB and I can't seem to find out why 
it's moving so slowly.

Anything I can look at specifically?

Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Panic when trying to remove an unused LUN with stmfadm

2016-04-16 Thread Stephan Budach

Hi,

I have experienced this issue a couple of times now. First in r016 but 
just today on r018, too. When trying to remove a LUN by issuing 
something like


root@nfsvmpool05:/root# stmfadm delete-lu 600144F04E4653564D504F4F4C303538
packet_write_wait: Connection to 10.11.14.49: Broken pipe
Shared connection to nfsvmpool05 closed.

the system hung up. When it came back online, I was able to remove that 
LUN without any issue. The fun thing is that I had created that LUN just 
before and it hadn't even been in use, as it hadn't been attached to any 
view.


The syslog shows the usual COMSTAR thingy about kernel heap corruption, 
which I have already encountered a couple of times, although during rather 
normal operation.


Apr 16 10:17:15 nfsvmpool05 genunix: [ID 478202 kern.notice] kernel 
memory allocator:

Apr 16 10:17:15 nfsvmpool05 unix: [ID 836849 kern.notice]
Apr 16 10:17:15 nfsvmpool05 ^Mpanic[cpu6]/thread=ff0e495c4880:
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 812275 kern.notice] kernel heap 
corruption detected

Apr 16 10:17:15 nfsvmpool05 unix: [ID 10 kern.notice]
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 802836 kern.notice] 
ff003df44ae0 fba4e8d4 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] 
ff003df44b20 genunix:kmem_free+1a8 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] 
ff003df44b60 stmf:stmf_deregister_lu+1a7 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] 
ff003df44ba0 stmf_sbd:sbd_delete_locked_lu+95 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] 
ff003df44c00 stmf_sbd:sbd_delete_lu+a9 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] 
ff003df44c80 stmf_sbd:stmf_sbd_ioctl+292 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] 
ff003df44cc0 genunix:cdev_ioctl+39 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] 
ff003df44d10 specfs:spec_ioctl+60 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] 
ff003df44da0 genunix:fop_ioctl+55 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] 
ff003df44ec0 genunix:ioctl+9b ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] 
ff003df44f10 unix:brand_sys_sysenter+1c9 ()

Apr 16 10:17:15 nfsvmpool05 unix: [ID 10 kern.notice]
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 672855 kern.notice] syncing 
file systems...

Apr 16 10:17:15 nfsvmpool05 genunix: [ID 904073 kern.notice]  done
Apr 16 10:17:16 nfsvmpool05 genunix: [ID 111219 kern.notice] dumping to 
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Apr 16 10:17:16 nfsvmpool05 ahci: [ID 405573 kern.info] NOTICE: ahci0: 
ahci_tran_reset_dport port 1 reset port

Apr 16 10:20:46 nfsvmpool05 genunix: [ID 10 kern.notice]
Apr 16 10:20:46 nfsvmpool05 genunix: [ID 665016 kern.notice] ^M100% 
done: 1153955 pages dumped,

Apr 16 10:20:46 nfsvmpool05 genunix: [ID 851671 kern.notice] dump succeeded

Fortunately this time, there was a debug kernel running, as Dan 
suggested on former occurrences, and I do have a dump at hand, which I 
could upload to uploads.omniti.com, if I'd get a token to do so.


@Dan, may I get one?
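
(For reference, this is roughly how I expand and take a first look at such a dump - a sketch, the paths are examples:)

savecore -vf /var/crash/unknown/vmdump.0 /var/crash/unknown   # expands into unix.0 / vmcore.0
mdb -k /var/crash/unknown/unix.0 /var/crash/unknown/vmcore.0
> ::status
> ::stack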

Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS r151018 is now out!

2016-04-15 Thread Stephan Budach

Hi Dan,

Am 15.04.16 um 20:20 schrieb Dan McDonald:

Follow the "change to openssh or sunssh" instructions on the 016 release notes. 
 You appear to have conflicting packages, one from each, which was a bug in the 016 
installer.

Or you can add "--exclude ssh-common" to your pkg update.

Dan

Sent from my iPhone (typos, autocorrect, and all)


On Apr 15, 2016, at 12:57 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi Dan,

I actually ran into this issue when trying to upgrade my 016 to 018:

root@zfsha01colt:/root# pkg update -v --be-name=OmniOS-r151018 entire
Creating Plan (checking for conflicting actions): /
pkg update: The following packages provide conflicting action types in 
usr/share/man/man4/ssh_config.4:

  link:
pkg://omnios/service/network/ssh-common@0.5.11,5.11-0.151018:20160412T195038Z
  file:
pkg://omnios/network/openssh@7.1.2,5.11-0.151016:20160114T155110Z


How can I resolve this issue?

Thanks,
Stephan
___



Thanks, I had already thought about trying that, and now both of my RSF-1 hosts 
are happy on r018.


Cheers,
Stephan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS r151018 is now out!

2016-04-15 Thread Stephan Budach

Hi Dan,

I actually ran into this issue when trying to upgrade my 016 to 018:

root@zfsha01colt:/root# pkg update -v --be-name=OmniOS-r151018 entire
Creating Plan (checking for conflicting actions): /
pkg update: The following packages provide conflicting action types in 
usr/share/man/man4/ssh_config.4:


  link:
pkg://omnios/service/network/ssh-common@0.5.11,5.11-0.151018:20160412T195038Z
  file:
pkg://omnios/network/openssh@7.1.2,5.11-0.151016:20160114T155110Z


How can I resolve this issue?

Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] RSF-1/ZFS panics node when offlining one iSCSI storage mirror

2016-03-07 Thread Stephan Budach

Hi Dan,

Am 07.03.16 um 15:41 schrieb Dan McDonald:

On Mar 6, 2016, at 9:44 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

when I noted that one node would panic,

AS A RULE -- if you have an OmniOS box panic, you should save off the corefile 
(vmdump.N) and be able to share it with the list.  I understand this may be an 
RSF-1 panic, BUT if it's not, it'd be nice to know.

You can upload it to uploads.omniti.com if you wish, just request an upload 
token.

Dan

Thanks - I will keep that in mind. I actually had a core dump 
available, but since I was testing around, I didn't mean to occupy 
anyone's time more than absolutely necessary, and so I dumped them.
Speaking of that incident, I have lowered the iSCSI connection timeout 
to 60s, which seems to be the lowest value supported, by issuing a


iscsiadm modify initiator-node -T conn-login-max=60

and afterwards I used stmfadm offline-target on the storage node to cut 
the target off. This time, the initiator timed out after 60s and that 
particular zpool changed its status to degraded without anything bad 
happening. I still have to test that under load, but I will probably 
push that to next weekend.
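
For completeness, the change plus a quick way to verify it, as a sketch:

iscsiadm modify initiator-node -T conn-login-max=60
iscsiadm list initiator-node    # shows the configured login parameters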


Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] RSF-1/ZFS panics node when offlining one iSCSI storage mirror

2016-03-06 Thread Stephan Budach

Hi,

I have set up a rather simple RSF-1 project, where two RSF-1 nodes 
connect to two storage heads via iSCSI. I have deployed one network and 
two disk heartbeats, and I was trying all sorts of possible failures when 
I noted that one node would panic if I offlined an iSCSI target on one 
storage node and thus shut down one side of a zpool mirror 
completely. Issuing a zpool status wouldn't return and after a while the 
host got nuked.


I then onlined the target again and waited until the node returned and 
then removed the local iSCSI initiator on the RSF-1 node instead, which 
resulted in a degraded, but functional zpool, and this time the node 
didn't get nuked.


What is the difference between these two approaches, and can I set up my 
systems so that offlining a target doesn't lead to this behaviour? 
I'd imagine that a target failure might just as well occur as any other 
software fault.


Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-18 Thread Stephan Budach

Am 18.02.16 um 21:57 schrieb Schweiss, Chip:



On Thu, Feb 18, 2016 at 5:14 AM, Michael Rasmussen <m...@miras.org 
<mailto:m...@miras.org>> wrote:


On Thu, 18 Feb 2016 07:13:36 +0100
Stephan Budach <stephan.bud...@jvm.de
<mailto:stephan.bud...@jvm.de>> wrote:

>
> So, when I issue a simple ls -l on the folder of the vdisks,
while the switchover is happening, the command sometimes concludes
in 18 to 20 seconds, but sometimes ls will just sit there for minutes.
>
This is a known limitation in NFS. NFS was never intended to be
clustered so what you experience is the NFS process on the client side
keeps kernel locks for the now unavailable NFS server and any request
to the process hangs waiting for these locks to be resolved. This can
be compared to a situation where you hot-swap a drive in the pool
without notifying the pool.

Only way to resolve this is to forcefully kill all NFS client
processes
and the restart the NFS client.


I've been running RSF-1 on OmniOS since about r151008. All my clients 
have always been NFSv3 and NFSv4.


My memory is a bit fuzzy, but when I first started testing RSF-1, 
OmniOS still had the Sun lock manager which was later replaced with 
the BSD lock manager.   This has had many difficulties.


I do remember that fail overs when I first started with RSF-1 never 
had these stalls, I believe this was because the lock state was stored 
in the pool and the server taking over the pool would inherit that 
state too.   That state is now lost when a pool is imported with the 
BSD lock manager.


When I did testing I would do both full speed reading and writing to 
the pool and force fail overs, both by command line and by killing 
power on the active server. Never did I have a fail over take more 
than about 30 seconds for NFS to fully resume data flow.


Others who know more about the BSD lock manager vs the old Sun lock 
manager may be able to tell us more.  I'd also be curious if Nexenta 
has addressed this.


-Chip
I actually don't know if it's the lock manager or the nfsd itself that 
caused this, but as I bounced all of them after I failed the ZPOOL over 
while hammering it with reads and writes, lockd would also have been 
part of the processes that had been restarted. And remember, this only 
happened when failing over to one host and back in rather quick succession.


Nevertheless, RSF-1 seems to be a solid solution and I will very likely 
implement it across several OmniOS boxes.


Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-18 Thread Stephan Budach

Am 18.02.16 um 22:56 schrieb Richard Elling:

comments below...

On Feb 18, 2016, at 12:57 PM, Schweiss, Chip <c...@innovates.com 
<mailto:c...@innovates.com>> wrote:




On Thu, Feb 18, 2016 at 5:14 AM, Michael Rasmussen<m...@miras.org 
<mailto:m...@miras.org>>wrote:


On Thu, 18 Feb 2016 07:13:36 +0100
Stephan Budach <stephan.bud...@jvm.de
<mailto:stephan.bud...@jvm.de>> wrote:

>
> So, when I issue a simple ls -l on the folder of the vdisks,
while the switchover is happening, the command sometimes concludes
in 18 to 20 seconds, but sometimes ls will just sit there for minutes.
>
This is a known limitation in NFS. NFS was never intended to be
clustered so what you experience is the NFS process on the client
side
keeps kernel locks for the now unavailable NFS server and any request
to the process hangs waiting for these locks to be resolved. This can
be compared to a situation where you hot-swap a drive in the pool
without notifying the pool.

Only way to resolve this is to forcefully kill all NFS client
processes
and the restart the NFS client.



ugh. No, something else is wrong. I've been running such clusters for 
almost 20 years,

it isn't a problem with the NFS server code.




I've been running RSF-1 on OmniOS since about r151008.  All my 
clients have always been NFSv3 and NFSv4.


My memory is a bit fuzzy, but when I first started testing RSF-1, 
OmniOS still had the Sun lock manager which was later replaced with 
the BSD lock manager.   This has had many difficulties.


I do remember that fail overs when I first started with RSF-1 never 
had these stalls, I believe this was because the lock state was 
stored in the pool and the server taking over the pool would inherit 
that state too.   That state is now lost when a pool is imported with 
the BSD lock manager.


When I did testing I would do both full speed reading and writing to 
the pool and force fail overs, both by command line and by killing 
power on the active server.Never did I have a fail over take more 
than about 30 seconds for NFS to fully resume data flow.


Clients will back-off, but the client's algorithm is not universal, so 
we do expect to
see different client retry intervals for different clients. For 
example, the retries can
exceed 30 seconds for Solaris clients after a minute or two (alas, I 
don't have the
detailed data at my fingertips anymore :-(. Hence we work hard to make 
sure failovers

occur as fast as feasible.



Others who know more about the BSD lock manager vs the old Sun lock 
manager may be able to tell us more.  I'd also be curious if Nexenta 
has addressed this.


Lock manager itself is an issue and though we're currently testing 
the BSD lock

manager in anger, we haven't seen this behaviour.

Related to lock manager is name lookup. If you use name services, you 
add a latency
dependency to failover for name lookups, which is why we often disable 
DNS or other

network name services on high-availability services as a best practice.
 -- richard


This is why I always put each host name involved in my cluster setups 
into /etc/hosts on each node.


Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-18 Thread Stephan Budach

Am 18.02.16 um 12:14 schrieb Michael Rasmussen:

On Thu, 18 Feb 2016 07:13:36 +0100
Stephan Budach <stephan.bud...@jvm.de> wrote:


So, when I issue a simple ls -l on the folder of the vdisks, while the 
switchover is happening, the command sometimes concludes in 18 to 20 seconds, 
but sometimes ls will just sit there for minutes.


This is a known limitation in NFS. NFS was never intended to be
clustered so what you experience is the NFS process on the client side
keeps kernel locks for the now unavailable NFS server and any request
to the process hangs waiting for these locks to be resolved. This can
be compared to a situation where you hot-swap a drive in the pool
without notifying the pool.

Only way to resolve this is to forcefully kill all NFS client processes
and the restart the NFS client.



This is not the main issue, as this is not a clustered NFS, it's a 
failover one. Of course the client will have to reset its connection, 
but it seems that the NFS client does just that after the NFS share 
becomes available on the failover host. Looking at the tcpdump, I found 
that failing over from the primary NFS server to the secondary works 
straight away. The service stalls for a few seconds more than RSF-1 
needs to switch over the ZPOOL and the vip. In my tests it was always 
the switchback that caused these issues. Looking at the tcpdump, I 
noticed that when the switchback occurred, the dump was swamped with DUP! 
ACKs. This indicated to me that the still-running nfs server on the 
primary was still sending some outstanding ACKs to the now-returned 
client, which vigorously denied them.
I think the outcome is that the server finally gave up sending those 
ACK(s) and then the connection resumed. So, what helped in this case was 
to restart the nfs server on the primary after the ZPOOL had been 
switched over to the secondary…


I have just tried to wait at least 5 minutes before failing back from 
the secondary to the primary node and this time it went as smoothly as 
it did when I initially failed over from the primary to the secondary. 
However, I think for sanity the RSF-1 agent should also restart the nfs 
server on the host it just moved the ZPOOL away from.
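
The manual workaround boils down to something like this on the node that just gave up the pool (a sketch; standard SMF FMRIs assumed):

svcadm restart svc:/network/nfs/server:default
svcadm restart svc:/network/nfs/nlockmgr:default   # if the lock manager needs a kick as well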


So, as far as I am concerned, this issue is resolved. Thanks everybody 
for chiming in on this.


Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-18 Thread Stephan Budach

Am 18.02.16 um 09:29 schrieb Andrew Gabriel:

On 18/02/2016 06:13, Stephan Budach wrote:

Hi,

I have been test driving RSF-1 for the last week to accomplish the 
following:


- cluster a zpool, that is made up from 8 mirrored vdevs, which are 
based on 8 x 2 SSD mirrors via iSCSI from another OmniOS box

- export a nfs share from above zpool via a vip
- have RSF-1 provide the fail-over and vip-moving
- use the nfs share as a repository for my Oracle VM guests and vdisks

The setup seems to work fine, but I do have one issue I can't seem
to get solved. Whenever I fail over the zpool, any in-flight NFS data
will be stalled for some unpredictable time. Sometimes it takes not
much longer than the "move" time of the resources but sometimes it 
takes up to 5 mins. until the nfs client on my VM server becomes 
alive again.


So, when I issue a simple ls -l on the folder of the vdisks, while 
the switchover is happening, the command sometimes concludes in 18 to
20 seconds, but sometimes ls will just sit there for minutes.


I wonder if there's anything I could do about that. I have already
played with several timeouts, NFS-wise and TCP-wise, but nothing seems
to yield any effect on this issue. Anyone who knows some tricks to
speed up the in-flight data?


I would capture a snoop trace on both sides of the cluster, and see 
what's happening. In this case, I would run snoop in non-promiscuous 
mode at least initially, to avoid picking up any frames which the IP 
stack is going to discard.
Yes, I will do that and see what traffic is happening, but I have a gut 
feeling that this happens while the vip has been taken down and the 
next transmission to that IP - currently non-existent, but still present 
in the arp cache - will stall somewhere.


Can you look at the ARP cache on client during the stall?
I will, but the arp cache must be updated quite soon, as the pings start 
working again after 15 to 20 seconds and they need a proper arp cache as 
well.
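
To be sure, I can check and flush the entry on the client during the stall, roughly like this (assuming the OL6/Linux client; <vip> is a placeholder):

arp -an | grep <vip>    # does the cached MAC still point at the old head?
arp -d <vip>            # drop the entry and let it be re-learned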


BTW, if you have 2 clustered heads both relying on another single 
system providing the iSCSI, that's a strange setup which may be giving 
you less availability (and less performance) than serving NFS directly 
from the SSD system without clustering.


This is actually not the way it's going to be implemented. The missing 
storage node is already on its way. Once the new head is in place, one 
of the iSCSI targets will be moved over to the new host, so that 
all components are redundant.


Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-18 Thread Stephan Budach

Am 18.02.16 um 08:59 schrieb Dale Ghent:

Are you using NFS over TCP or UDP?

If using it over TCP, I would expect the TCP connection to get momentarily 
unhappy when its connection stalls and packets might need to be retransmitted 
after the floating IP's new MAC address is asserted. Have you tried UDP instead?

/dale



On Feb 18, 2016, at 1:13 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi,

I have been test driving RSF-1 for the last week to accomplish the following:

- cluster a zpool, that is made up from 8 mirrored vdevs, which are based on 8 
x 2 SSD mirrors via iSCSI from another OmniOS box
- export a nfs share from above zpool via a vip
- have RSF-1 provide the fail-over and vip-moving
- use the nfs share as a repository for my Oracle VM guests and vdisks

The setup seems to work fine, but I do have one issue I can't seem to get solved. 
Whenever I fail over the zpool, any in-flight NFS data will be stalled for some 
unpredictable time. Sometimes it takes not much longer than the "move" time of 
the resources but sometimes it takes up to 5 mins. until the nfs client on my VM server 
becomes alive again.

So, when I issue a simple ls -l on the folder of the vdisks, while the 
switchover is happening, the command sometimes concludes in 18 to 20 seconds, 
but sometimes ls will just sit there for minutes.

I wonder if there's anything I could do about that. I have already played 
with several timeouts, NFS-wise and TCP-wise, but nothing seems to yield any 
effect on this issue. Anyone who knows some tricks to speed up the in-flight 
data?

Thanks,
Stephan
Yes, NFS is using TCP for its connection and naturally, this connection 
will hang as long as the connection is broken. However, when ping starts 
working again, all access to the NFS share works from that instant on. 
But… if I issue an ls on the NFS share while the ping is not yet 
responding, the whole NFS connection hangs until it starts working 
again, and that can seemingly take a lot of time.


I will try UDP instead of TCP and see if I can get better results with 
that.
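
Something along these lines on the OL6 client, as a sketch (server, export and mount point are placeholders):

mount -t nfs -o vers=3,proto=udp,timeo=30,retrans=3 nfs-vip:/export/vms /mnt/vms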


Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-17 Thread Stephan Budach

Hi Michael,

Am 18.02.16 um 08:17 schrieb Michael Talbott:

While I don't have a setup like you've described, I'm going to take a wild 
guess and say check your switches (and servers) ARP tables. Perhaps the switch 
isn't updating your VIP address with the other servers MAC address fast enough. 
Maybe as part of the failover script, throw a command to your switch to update 
the ARP entry or clear its ARP table. Another perhaps simpler solution / 
diagnostic you could do is record a ping output of the server to your router 
via the vip interface and address right after the failover process to try and 
tickle the switch to update its mac table. Also it's possible the clients might 
need an ARP flush too.

If this is the case, another possibility is you could have both servers spoof 
the same MAC address and only ever have one up at a time and have them 
controlled by the failover script (or bad things will happen).

Just a thought.

Michael
Sent from my iPhone


On Feb 17, 2016, at 10:13 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi,

I have been test driving RSF-1 for the last week to accomplish the following:

- cluster a zpool, that is made up from 8 mirrored vdevs, which are based on 8 
x 2 SSD mirrors via iSCSI from another OmniOS box
- export a nfs share from above zpool via a vip
- have RSF-1 provide the fail-over and vip-moving
- use the nfs share as a repository for my Oracle VM guests and vdisks

The setup seems to work fine, but I do have one issue I can't seem to get solved. 
Whenever I fail over the zpool, any in-flight NFS data will be stalled for some 
unpredictable time. Sometimes it takes not much longer than the "move" time of 
the resources but sometimes it takes up to 5 mins. until the nfs client on my VM server 
becomes alive again.

So, when I issue a simple ls -l on the folder of the vdisks, while the 
switchover is happening, the command sometimes concludes in 18 to 20 seconds, 
but sometimes ls will just sit there for minutes.

I wonder if there's anything I could do about that. I have already played 
with several timeouts, NFS-wise and TCP-wise, but nothing seems to yield any 
effect on this issue. Anyone who knows some tricks to speed up the in-flight 
data?

Thanks,
Stephan


I don't think that the switches are the problem, since when I ping the 
vip from the VM host (OL6 based), the ping only ceases for the time 
it takes RSF-1 to move the services, and afterwards the pings continue 
just normally. The only thing I wonder is if it's more of an NFS or a 
TCP-in-general thing. Maybe I should also test some other IP protocol to 
see if that one stalls as well for that long.


Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-17 Thread Stephan Budach

Hi,

I have been test driving RSF-1 for the last week to accomplish the 
following:


- cluster a zpool, that is made up from 8 mirrored vdevs, which are 
based on 8 x 2 SSD mirrors via iSCSI from another OmniOS box

- export a nfs share from above zpool via a vip
- have RSF-1 provide the fail-over and vip-moving
- use the nfs share as a repository for my Oracle VM guests and vdisks

The setup seems to work fine, but I do have one issue I can't seem to 
get solved. Whenever I fail over the zpool, any in-flight NFS data will 
be stalled for some unpredictable time. Sometimes it takes not much 
longer than the "move" time of the resources but sometimes it takes up 
to 5 mins. until the nfs client on my VM server becomes alive again.


So, when I issue a simple ls -l on the folder of the vdisks, while the 
switchover is happening, the command sometimes concludes in 18 to 20 
seconds, but sometimes ls will just sit there for minutes.


I wonder if there's anything I could do about that. I have already 
played with several timeouts, NFS-wise and TCP-wise, but nothing seems to 
yield any effect on this issue. Anyone who knows some tricks to speed 
up the in-flight data?


Thanks,
Stephan



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Ang: Re: Ang: zvol snapshot rollback issue, dataset busy

2016-02-17 Thread Stephan Budach

Am 16.02.16 um 18:58 schrieb Johan Kragsterman:

Hi!



-"OmniOS-discuss" <omnios-discuss-boun...@lists.omniti.com> skrev: -
Till: <omnios-discuss@lists.omniti.com>
Från: Stephan Budach
Sänt av: "OmniOS-discuss"
Datum: 2016-02-16 17:44
Ärende: Re: [OmniOS-discuss] Ang: zvol snapshot rollback issue, dataset busy

Am 16.02.16 um 17:11 schrieb Johan Kragsterman:

Hi!


-"OmniOS-discuss" <omnios-discuss-boun...@lists.omniti.com> skrev: -
Till: omnios-discuss <omnios-discuss@lists.omniti.com>
Från: Hafiz Rafiyev
Sänt av: "OmniOS-discuss"
Datum: 2016-02-16 16:47
Ärende: [OmniOS-discuss] zvol snapshot rollback issue,dataset busy

I know it's an OmniOS forum; my question is about Nexenta v4.0.4FP02.
Maybe the issue is common to illumos.

Getting the below error when trying to roll back a zvol snapshot; the zvol was unshared before 
the rollback operation and the system was rebooted, but the result was the same.
   
Any suggestions?


zfs rollback -r OPOOL/DATA@snap-hourly-1-2016-02-16-130002

cannot rollback 'OPOOL/DATA': dataset is busy
   


Nexenta log:

Feb 16 15:35:47 xxx nms[966]: [ID 702911 local0.error] (:1.12) EXCEPTION: 
SystemCallError: 
SnapshotContainer::rollback(OPOOL/DATA@snap-hourly-1-2016-02-16-130002,-r):

[zfs rollback -r OPOOL/DATA@snap-hourly-1-2016-02-16-130002] cannot rollback 
'OPOOL/DATA': dataset is busy




My guess is that the volume you want to roll back is registered as an LU with 
stmf... is it? If so, you need to delete it first. But be aware, before you delete the LU 
you need to delete any view for that LU. Otherwise you get views "hanging in the 
air" which are difficult to get rid of.


Johan


I'd say it's the other way round: if you delete a LUN via stmfadm
delete-lu, that will also delete its views, which can be annoying if
you then want to re-import the LUN from the rolled back snapshot, no?

If you want to keep the view, which is what I usually want to do, just
run stmfadm delete-lu -k  to keep the view. That way, you can then
re-import the LUN on the rolled back dataset and won't have to re-create
the view for it.

Cheers,
Stephan



Is that so? I have been annoyed for a long time by the fact that I needed to delete the view 
as well. But actually this was not the case some years ago when I figured out how I needed 
to do this. At that time I had views "hanging in the air" without a connection to 
an LU, and difficulties removing them.
Happy to see that this is working the way it should...

But I am not familiar with this -k option; where is that one documented? I can't 
find it in the Illumos man page for stmf, nor can I find it at Oracle.


/Johan

I concur, as I also found it very cumbersome having to 
delete all views manually in the old days… I seem to remember that 
this was on OpenSolaris in 2011, when I started playing with COMSTAR and 
I was creating/removing LUNs and targets quite frequently.
However, on a production system, where everything has settled, I prefer 
it the other way round and I like to keep the views in case I have to 
remove a LUN for the sake of a snapshot rollback.


Actually, the -k option is not mentioned anywhere, but it is supported in 
every version I checked. I came across this option when I searched the 
net for any means of deleting a LUN while keeping the view, and on some 
random website, there it was...
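
For reference, the rollback workflow we are talking about, as a sketch (the LU GUID is a placeholder; the dataset name is taken from the example above):

stmfadm delete-lu -k <lu-guid>                   # drop the LU but keep its views
zfs rollback -r OPOOL/DATA@snap-hourly-1-2016-02-16-130002
stmfadm import-lu /dev/zvol/rdsk/OPOOL/DATA      # re-import; the kept views apply again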


Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

