Re: boot1.efifat's FAT12 volume label prevents booting (some systems)
Hi, all,

> On 06.11.2016 at 18:14, Dimitry Andric wrote:
>
> Please do, so it is not forgotten. It is relatively easy to change the
> volume label, by editing sys/boot/efi/boot1/generate-fat.sh, and then
> regenerating the FAT templates.

Why use the pre-generated image at all when you can easily create the EFI
boot volume like this?

    gpart add -t efi -l efi -a 512k -s 512k
    newfs_msdos /dev/gpt/efi
    mount_msdosfs /dev/gpt/efi /mnt
    mkdir -p /mnt/efi/boot
    cp /boot/boot1.efi /mnt/efi/boot/bootx64.efi
    umount /mnt

Kind regards,
Patrick
-- 
punkt.de GmbH * Kaiserallee 13a * 76133 Karlsruhe
Tel. 0721 9109 0 * Fax 0721 9109 100
i...@punkt.de  http://www.punkt.de
Gf: Jürgen Egeling  AG Mannheim 108285
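The manual steps above could be wrapped in a small script that also sets the FAT volume label explicitly, since the label is what this thread is about. This is only a sketch under assumptions: the disk name (ada0) and the helper names run and make_esp are hypothetical and do not come from the thread, and the script prints the commands by default instead of running them. The -L option of newfs_msdos sets the volume label.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints each command instead of
# running it, so nothing is touched until DRY_RUN=0 is set deliberately.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# make_esp DISK LABEL
#   DISK  - target disk, e.g. ada0 (hypothetical; not named in the thread)
#   LABEL - FAT volume label to use, e.g. EFIFAT
make_esp() {
    disk=$1
    label=$2
    run gpart add -t efi -l efi -a 512k -s 512k "$disk"
    run newfs_msdos -L "$label" /dev/gpt/efi
    run mount_msdosfs /dev/gpt/efi /mnt
    run mkdir -p /mnt/efi/boot
    run cp /boot/boot1.efi /mnt/efi/boot/bootx64.efi
    run umount /mnt
}
```

Calling make_esp ada0 EFIFAT in dry-run mode lists the six commands in order, which makes it easy to review them before running anything against a real disk.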
Re: Use of env SRC_ENV_CONF=. . . for buildworld does not override/avoid use of /etc/src.conf : Intentional?
[The original of this message was not delivered to two of the places it
was sent to. This retries sending to just those places.]

On 2016-Nov-4, at 9:40 AM, Bryan Drewery wrote:

> On 11/3/2016 5:28 PM, Mark Millard wrote:
>> I just had a case of "odd" command text in a buildworld that was based on
>> (in part) env SRC_ENV_CONF=. . .
>>
>> env __MAKE_CONF=. . . does not get the kind of behavior reported below for
>> /etc/src.conf .
>>
>> Overall this means that even with an explicit env SRC_ENV_CONF=. . . one
>> must separately prevent /etc/src.conf from contributing if the SRC_ENV_CONF
>> file is intended to cover everything.
>
> SRC_ENV_CONF is kind of a special hack to allow setting some specific
> values that feasibly can't be set later. Just stick to src.conf unless
> you need to set one of the options that requires src-env.conf.
>
> --
> Regards,
> Bryan Drewery

Understood (now): intentional for sure. Thanks to Renato Botelho and you
for making clear that I'd read something into the description that just
was not written into the description.

For now I've adopted using an explicit env SRCCONF="/dev/null" in the
scripts as the means of avoiding an unexpected contribution and I still
have env SRC_ENV_CONF= use for picking out the file: I then do not have to
worry about if I reference any of the special values in the file
referenced or not, nor about what /etc/src-env.conf or /etc/src.conf might
have in them.

I may change this at some point and follow your suggestion to just use
SRCCONF= to find the file because as time goes on it looks more like I'm
unlikely to experiment with any "special values" in the files.

===
Mark Millard
markmi at dsl-only.net

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
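For illustration, the invocation described above might look roughly like the following. This is a sketch, not a tested build recipe: the helper name buildworld_cmd, the configuration file path, and the assumption that the source tree lives in /usr/src are all hypothetical, and the script only prints the command line instead of starting a build.

```shell
#!/bin/sh
# Prints the buildworld invocation described above rather than running it.
# The configuration file path passed in is hypothetical.
buildworld_cmd() {
    src_env_file=$1
    # SRCCONF=/dev/null keeps /etc/src.conf from contributing;
    # SRC_ENV_CONF selects the one file that is meant to cover everything.
    printf 'cd /usr/src && env SRCCONF=/dev/null SRC_ENV_CONF=%s make buildworld\n' \
        "$src_env_file"
}

buildworld_cmd /root/src-env-custom.conf
```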
Re: Uppercase RE matching problems in FreeBSD 11
On 06.11.2016 at 22:27, Baptiste Daroussin wrote:
>
>> But under what circumstances would [A-Z] mean anything other than a
>> character whose Unicode codepoint is between U+0041 and U+005A, inclusive?
>> Especially given the locale in the example is en_US.UTF-8. Or, put another
>> way, why would an implementation interpret [A-Z] as anything other than
>> [ABCDE…XYZ]?
>
> The collation rules for Unicode come from http://cldr.unicode.org/ and
> they match the ones on Linux, for example, and on illumos.
>
> Some GNU tools explicitly choose not to be locale-aware to avoid that
> kind of "surprise".
>>
>> From reading your reference, I can see in 9.3.5.7:
>>> In the POSIX locale, a range expression represents the set of collating
>>> elements that fall between two elements in the collation sequence,
>>> inclusive. In other locales, a range expression has unspecified behavior[…]
>>
>> So even if the observed behaviour is conforming, I’d think it’s still highly
>> undesirable.
>>
> That holds only for the POSIX locale, a.k.a. C, a.k.a. the ASCII-only world.

So what do I set my LANG and LC variables to? I do want UTF-8, but I do
also want my scripts to continue to work. Clearly, en_US.UTF-8 is not what
I want. Is it C.UTF-8? Or do I set LANG=en_US.UTF-8 and LC_COLLATE=C?

Stefan

-- 
Stefan Bethke   Fon +49 151 14070811
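One of the options raised in the question, keeping UTF-8 in general but pinning collation to C, could be tried with a script preamble like the one below. It is a minimal sketch rather than a recommendation from the thread. LC_ALL is unset first because it would otherwise override the per-category variables.

```shell
#!/bin/sh
# Script preamble: UTF-8 for everything except collation.
unset LC_ALL            # LC_ALL would override the per-category settings
LANG=en_US.UTF-8
LC_COLLATE=C
export LANG LC_COLLATE

# With C collation, the range A-Z is evaluated in code point order,
# so only the uppercase ASCII letters should be replaced.
printf 'abcdABCD\n' | sed 's/[A-Z]/X/g'
```

With collation pinned to C this should print abcdXXXX, while character classification and encoding still follow the locale named in LANG.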
Re: Uppercase RE matching problems in FreeBSD 11
On 06/11/2016 23:30, Stefan Bethke wrote:
> Although with en_US.UTF-8 on other systems, I have not had that experience.
> A quick check on stuff I have immediate access to:
>
> macOS 10.12:
> $ echo 'abcdABCD' | sed 's/[A-Z]/X/g'
> abcd
>
> Ubuntu 14.04.5:
> $ echo 'abcdABCD' | sed 's/[A-Z]/X/g'
> abcd
>
> FreeBSD 10-stable:
> $ echo 'abcdABCD' | sed 's/[A-Z]/X/g'
> abcd

Latest Gentoo:
$ echo 'abcdABCD' | sed 's/[A-Z]/X/g'
aXXX

Recent OpenIndiana (an illumos based OS):
$ echo 'abcdABCD' | sed 's/[A-Z]/X/g'
aXXX

-- 
Andriy Gapon
Re: Uppercase RE matching problems in FreeBSD 11
> On 06.11.2016 at 22:14, Stefan Ehmann wrote:
>
>> That is rather surprising. Is there a normative reference for the
>> treatment of bracket expressions and character classes when using
>> locales other than C and/or encodings like UTF-8?
>
> I found an interesting article about this issue in gawk:
> https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

OK, I give up. Back to jwz: "now you have two problems."

Although with en_US.UTF-8 on other systems, I have not had that experience.
A quick check on stuff I have immediate access to:

macOS 10.12:
$ echo 'abcdABCD' | sed 's/[A-Z]/X/g'
abcd

Ubuntu 14.04.5:
$ echo 'abcdABCD' | sed 's/[A-Z]/X/g'
abcd

FreeBSD 10-stable:
$ echo 'abcdABCD' | sed 's/[A-Z]/X/g'
abcd

Stefan

-- 
Stefan Bethke   Fon +49 151 14070811
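When repeating this kind of cross-system check, it may help to run the range and the character class side by side. The following is a small sketch; the function names by_range and by_class are made up for the example.

```shell
#!/bin/sh
# Replace uppercase letters with X, once via a range and once via a class.
by_range() { sed 's/[A-Z]/X/g'; }
by_class() { sed 's/[[:upper:]]/X/g'; }

printf 'abcdABCD\n' | by_range   # result depends on the collation in effect
printf 'abcdABCD\n' | by_class   # class-based; not tied to collation order
```

If the two lines of output differ on a given system, the range is being interpreted through the locale's collation order instead of plain ASCII order.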
Re: Uppercase RE matching problems in FreeBSD 11
On Sun, Nov 06, 2016 at 10:20:54PM +0100, Stefan Bethke wrote:
>
>> On 06.11.2016 at 22:06, Baptiste Daroussin wrote:
>>
>> On Sun, Nov 06, 2016 at 09:57:00PM +0100, Stefan Bethke wrote:
>>>
>>>> On 06.11.2016 at 12:07, Baptiste Daroussin wrote:
>>>>
>>>> On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
>>>>> I happened to run an old script today that uses sed(1) to extract the
>>>>> system boot time from the kern.boottime sysctl MIB. On 11.0 this no
>>>>> longer works as expected:
>>>>>
>>>>> $ sysctl kern.boottime
>>>>> kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
>>>>> $ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
>>>>> v 5 16:18:34 2016
>>>>>
>>>>> sed passes over 'S' and 'N' until it hits 'v', which it considers
>>>>> uppercase apparently. This is with LANG=en_US.UTF-8. If I set LANG=C, it
>>>>> works as expected:
>>>>>
>>>>> $ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
>>>>> Nov 5 16:18:34 2016
>>>>>
>>>>> Testing every lowercase character separately gives even more
>>>>> inconsistent results:
>>>>>
>>>>> $ cat <
>>>>>
>>>>> Here sed thinks every lowercase character except for 'a' is uppercase!
>>>>> This differs from the first test where sed did not think 'o' is
>>>>> uppercase. Again, the above behaves as expected with LANG=C.
>>>>>
>>>>> Does anyone have any insight into this? This is likely to break a lot of
>>>>> existing code.
>>>>
>>>> Yes, A-Z only means uppercase in an ASCII-only world; in a Unicode world
>>>> it means AaBb...Z, because there are way more characters than simple A-Z.
>>>> In FreeBSD 11 we have a Unicode collation instead of falling back on
>>>> LC_COLLATE=C, which means ASCII only.
>>>>
>>>> For regexps, for example, one should use the classes [:upper:] or
>>>> [:lower:].
>>>
>>> That is rather surprising. Is there a normative reference for the
>>> treatment of bracket expressions and character classes when using locales
>>> other than C and/or encodings like UTF-8?
>>
>> http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html
>>
>> For example:
>>
>> "Regular expressions are a context-independent syntax that can represent a
>> wide variety of character sets and character set orderings, where these
>> character sets are interpreted according to the current locale. While many
>> regular expressions can be interpreted differently depending on the current
>> locale, many features, such as character class expressions, provide for
>> contextual invariance across locales."
>
> Sorry, maybe I wasn’t clear enough with my question. When a character class
> fits the problem, it is clearly advantageous.
>
> But under what circumstances would [A-Z] mean anything other than a character
> whose Unicode codepoint is between U+0041 and U+005A, inclusive? Especially
> given the locale in the example is en_US.UTF-8. Or, put another way, why
> would an implementation interpret [A-Z] as anything other than [ABCDE…XYZ]?

The collation rules for Unicode come from http://cldr.unicode.org/ and they
match the ones on Linux, for example, and on illumos.

Some GNU tools explicitly choose not to be locale-aware to avoid that kind
of "surprise".

> From reading your reference, I can see in 9.3.5.7:
>> In the POSIX locale, a range expression represents the set of collating
>> elements that fall between two elements in the collation sequence,
>> inclusive. In other locales, a range expression has unspecified behavior[…]
>
> So even if the observed behaviour is conforming, I’d think it’s still highly
> undesirable.

That holds only for the POSIX locale, a.k.a. C, a.k.a. the ASCII-only world.

Best regards,
Bapt
Re: Uppercase RE matching problems in FreeBSD 11
> On 06.11.2016 at 22:06, Baptiste Daroussin wrote:
>
> On Sun, Nov 06, 2016 at 09:57:00PM +0100, Stefan Bethke wrote:
>>
>>> On 06.11.2016 at 12:07, Baptiste Daroussin wrote:
>>>
>>> On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
>>>> I happened to run an old script today that uses sed(1) to extract the
>>>> system boot time from the kern.boottime sysctl MIB. On 11.0 this no
>>>> longer works as expected:
>>>>
>>>> $ sysctl kern.boottime
>>>> kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
>>>> $ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
>>>> v 5 16:18:34 2016
>>>>
>>>> sed passes over 'S' and 'N' until it hits 'v', which it considers
>>>> uppercase apparently. This is with LANG=en_US.UTF-8. If I set LANG=C, it
>>>> works as expected:
>>>>
>>>> $ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
>>>> Nov 5 16:18:34 2016
>>>>
>>>> Testing every lowercase character separately gives even more
>>>> inconsistent results:
>>>>
>>>> $ cat <
>>>>
>>>> Here sed thinks every lowercase character except for 'a' is uppercase!
>>>> This differs from the first test where sed did not think 'o' is
>>>> uppercase. Again, the above behaves as expected with LANG=C.
>>>>
>>>> Does anyone have any insight into this? This is likely to break a lot of
>>>> existing code.
>>>
>>> Yes, A-Z only means uppercase in an ASCII-only world; in a Unicode world
>>> it means AaBb...Z, because there are way more characters than simple A-Z.
>>> In FreeBSD 11 we have a Unicode collation instead of falling back on
>>> LC_COLLATE=C, which means ASCII only.
>>>
>>> For regexps, for example, one should use the classes [:upper:] or
>>> [:lower:].
>>
>> That is rather surprising. Is there a normative reference for the treatment
>> of bracket expressions and character classes when using locales other than C
>> and/or encodings like UTF-8?
>
> http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html
>
> For example:
>
> "Regular expressions are a context-independent syntax that can represent a
> wide variety of character sets and character set orderings, where these
> character sets are interpreted according to the current locale. While many
> regular expressions can be interpreted differently depending on the current
> locale, many features, such as character class expressions, provide for
> contextual invariance across locales."

Sorry, maybe I wasn’t clear enough with my question. When a character class
fits the problem, it is clearly advantageous.

But under what circumstances would [A-Z] mean anything other than a
character whose Unicode codepoint is between U+0041 and U+005A, inclusive?
Especially given the locale in the example is en_US.UTF-8. Or, put another
way, why would an implementation interpret [A-Z] as anything other than
[ABCDE…XYZ]?

From reading your reference, I can see in 9.3.5.7:
> In the POSIX locale, a range expression represents the set of collating
> elements that fall between two elements in the collation sequence, inclusive.
> In other locales, a range expression has unspecified behavior[…]

So even if the observed behaviour is conforming, I’d think it’s still highly
undesirable.

Stefan

-- 
Stefan Bethke   Fon +49 151 14070811
Re: Uppercase RE matching problems in FreeBSD 11
On 06.11.2016 21:57, Stefan Bethke wrote:
>
>> On 06.11.2016 at 12:07, Baptiste Daroussin wrote:
>>
>> On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
>>> I happened to run an old script today that uses sed(1) to extract
>>> the system boot time from the kern.boottime sysctl MIB. On 11.0
>>> this no longer works as expected:
..
>>> Here sed thinks every lowercase character except for 'a' is
>>> uppercase! This differs from the first test where sed did not
>>> think 'o' is uppercase. Again, the above behaves as expected with
>>> LANG=C.
>>>
>>> Does anyone have any insight into this? This is likely to break a
>>> lot of existing code.
>>>
>>
>> Yes, A-Z only means uppercase in an ASCII-only world; in a Unicode
>> world it means AaBb...Z, because there are way more characters than
>> simple A-Z. In FreeBSD 11 we have a Unicode collation instead of
>> falling back on LC_COLLATE=C, which means ASCII only.
>>
>> For regexps, for example, one should use the classes [:upper:] or
>> [:lower:].
>
> That is rather surprising. Is there a normative reference for the
> treatment of bracket expressions and character classes when using
> locales other than C and/or encodings like UTF-8?

I found an interesting article about this issue in gawk:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

Apparently the meaning of ranges is unspecified outside the "C" locale.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
says:

"In the POSIX locale, a range expression represents the set of collating
elements that fall between two elements in the collation sequence,
inclusive. In other locales, a range expression has unspecified behavior:
strictly conforming applications shall not rely on whether the range
expression is valid, or on the set of collating elements matched"
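Since the specification defines range expressions only for the POSIX locale, one way to keep an existing range-based script working is to pin just the affected command to that locale. The sketch below does this for the sed command from the original report; the function name boot_date is made up, and the input line is the sample output quoted in that report.

```shell
#!/bin/sh
# Sample line copied from the original report in this thread.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016'

# Pin only this one sed invocation to the POSIX locale, where the range
# A-Z is defined to mean the collating elements between A and Z.
boot_date() {
    LC_ALL=C sed -e 's/.*\([A-Z].*\)$/\1/'
}

printf '%s\n' "$line" | boot_date
```

This leaves the rest of the script in the user's locale and should print "Nov 5 16:18:34 2016" for the sample line.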
Re: Uppercase RE matching problems in FreeBSD 11
On Sun, Nov 06, 2016 at 09:57:00PM +0100, Stefan Bethke wrote:
>
>> On 06.11.2016 at 12:07, Baptiste Daroussin wrote:
>>
>> On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
>>> I happened to run an old script today that uses sed(1) to extract the
>>> system boot time from the kern.boottime sysctl MIB. On 11.0 this no
>>> longer works as expected:
>>>
>>> $ sysctl kern.boottime
>>> kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
>>> $ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
>>> v 5 16:18:34 2016
>>>
>>> sed passes over 'S' and 'N' until it hits 'v', which it considers
>>> uppercase apparently. This is with LANG=en_US.UTF-8. If I set LANG=C, it
>>> works as expected:
>>>
>>> $ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
>>> Nov 5 16:18:34 2016
>>>
>>> Testing every lowercase character separately gives even more
>>> inconsistent results:
>>>
>>> $ cat <
>>>
>>> Here sed thinks every lowercase character except for 'a' is uppercase!
>>> This differs from the first test where sed did not think 'o' is
>>> uppercase. Again, the above behaves as expected with LANG=C.
>>>
>>> Does anyone have any insight into this? This is likely to break a lot of
>>> existing code.
>>>
>>
>> Yes, A-Z only means uppercase in an ASCII-only world; in a Unicode world
>> it means AaBb...Z, because there are way more characters than simple A-Z.
>> In FreeBSD 11 we have a Unicode collation instead of falling back on
>> LC_COLLATE=C, which means ASCII only.
>>
>> For regexps, for example, one should use the classes [:upper:] or
>> [:lower:].
>
> That is rather surprising. Is there a normative reference for the treatment
> of bracket expressions and character classes when using locales other than C
> and/or encodings like UTF-8?

http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html

For example:

"Regular expressions are a context-independent syntax that can represent a
wide variety of character sets and character set orderings, where these
character sets are interpreted according to the current locale. While many
regular expressions can be interpreted differently depending on the current
locale, many features, such as character class expressions, provide for
contextual invariance across locales."

Best regards,
Bapt
Re: Uppercase RE matching problems in FreeBSD 11
> On 06.11.2016 at 12:07, Baptiste Daroussin wrote:
>
> On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
>> I happened to run an old script today that uses sed(1) to extract the system
>> boot time from the kern.boottime sysctl MIB. On 11.0 this no longer works as
>> expected:
>>
>> $ sysctl kern.boottime
>> kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
>> $ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
>> v 5 16:18:34 2016
>>
>> sed passes over 'S' and 'N' until it hits 'v', which it considers uppercase
>> apparently. This is with LANG=en_US.UTF-8. If I set LANG=C, it works as
>> expected:
>>
>> $ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
>> Nov 5 16:18:34 2016
>>
>> Testing every lowercase character separately gives even more inconsistent
>> results:
>>
>> $ cat <
>>
>> Here sed thinks every lowercase character except for 'a' is uppercase! This
>> differs from the first test where sed did not think 'o' is uppercase. Again,
>> the above behaves as expected with LANG=C.
>>
>> Does anyone have any insight into this? This is likely to break a lot of
>> existing code.
>>
>
> Yes, A-Z only means uppercase in an ASCII-only world; in a Unicode world it
> means AaBb...Z, because there are way more characters than simple A-Z. In
> FreeBSD 11 we have a Unicode collation instead of falling back on
> LC_COLLATE=C, which means ASCII only.
>
> For regexps, for example, one should use the classes [:upper:] or [:lower:].

That is rather surprising. Is there a normative reference for the treatment
of bracket expressions and character classes when using locales other than C
and/or encodings like UTF-8?

Stefan

-- 
Stefan Bethke   Fon +49 151 14070811
Re: boot1.efifat's FAT12 volume label prevents booting (some systems)
On 06 Nov 2016, at 16:07, Harry Schmalzbauer wrote:
>
> Recently I played with bsdinstall and UEFI setup, which left the system
> unbootable (11.0-Release).
> The culprit is the MS-DOS volume label "EFI" of the EFI partition.
> At least on Intel Single-Socket Servers (for Xeon E3 IvyBridge/BearToot
> + Haswell/RainbowPass), the UEFI firmware can't handle the identical
> path/volumelabel.

That is pretty weird. I wasn't aware that any firmware even used this label
for anything? Maybe they mount it under a directory named after the label,
or something.

> Simply reformatting with a different volume label (EFIFAT e.g.) solves
> that problem!
> Shall I file a bug report?

Please do, so it is not forgotten. It is relatively easy to change the
volume label, by editing sys/boot/efi/boot1/generate-fat.sh, and then
regenerating the FAT templates.

> Btw, can someone explain in short words why BOOT64.EFI seems to be
> boot1.efi, but padded with 0x20 up to 128k?

At buildworld time, pre-populated FAT file system templates are used,
instead of playing games with mounting ramdisks and creating file systems
in them. The build process just inserts the contents of boot1.efi at a
fixed location in the existing FAT template. And the template is
pre-populated with a 128 KiB bootx64.efi file.

-Dimitry
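The effect described above (a fixed-size, space-filled file whose beginning is overwritten with the real loader) can be reproduced on a standalone file. This is an illustration only and not the script the FreeBSD build uses: the function name pad_to_128k is made up, and it works on an ordinary file instead of at an offset inside the FAT template.

```shell
#!/bin/sh
# pad_to_128k SRC DST
# Write DST as a 128 KiB file filled with 0x20 (space) bytes, then copy SRC
# over the start of it without truncating the result.
pad_to_128k() {
    src=$1
    dst=$2
    size=131072    # 128 KiB
    # Build the space-filled placeholder.
    dd if=/dev/zero bs="$size" count=1 2>/dev/null | tr '\000' ' ' > "$dst"
    # Overlay the real contents; conv=notrunc keeps the padding after them.
    dd if="$src" of="$dst" conv=notrunc 2>/dev/null
}
```

For a source file smaller than 128 KiB, the result is always exactly 131072 bytes long: the source bytes first, then spaces. That matches the padding observed in the question.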
how to download freebsd
feryputrasulun...@gmail.com
boot1.efifat's FAT12 volume label prevents booting (some systems)
Recently I played with bsdinstall and UEFI setup, which left the system
unbootable (11.0-Release).
The culprit is the MS-DOS volume label "EFI" of the EFI partition.
At least on Intel Single-Socket Servers (for Xeon E3 IvyBridge/BearToot
+ Haswell/RainbowPass), the UEFI firmware can't handle the identical
path/volumelabel.

Simply reformatting with a different volume label (EFIFAT e.g.) solves
that problem!
Shall I file a bug report?

Btw, can someone explain in short words why BOOT64.EFI seems to be
boot1.efi, but padded with 0x20 up to 128k?

Thanks,

-Harry
Re: Uppercase RE matching problems in FreeBSD 11
On Sun, Nov 06, 2016 at 01:26:51PM +0100, Mark Martinec wrote:
> 2016-11-06 12:07, Baptiste Daroussin wrote:
> > Yes, A-Z only means uppercase in an ASCII-only world; in a Unicode world
> > it means AaBb...Z, because there are way more characters than simple A-Z.
> > In FreeBSD 11 we have a Unicode collation instead of falling back on
> > LC_COLLATE=C, which means ASCII only.
> >
> > For regexps, for example, one should use the classes [:upper:] or
> > [:lower:].
>
> It is a good idea to keep LC_COLLATE and LC_NUMERIC (and LC_MONETARY?) at
> "C" when LANG or LC_CTYPE is set to something else, otherwise unexpected
> things may happen.
>

In scripts, clearly: the collation rules, numeric rules and monetary rules
may vary depending on the locale.

Best regards,
Bapt
Re: Uppercase RE matching problems in FreeBSD 11
2016-11-06 12:07, Baptiste Daroussin wrote:
> Yes, A-Z only means uppercase in an ASCII-only world; in a Unicode world
> it means AaBb...Z, because there are way more characters than simple A-Z.
> In FreeBSD 11 we have a Unicode collation instead of falling back on
> LC_COLLATE=C, which means ASCII only.
>
> For regexps, for example, one should use the classes [:upper:] or
> [:lower:].

It is a good idea to keep LC_COLLATE and LC_NUMERIC (and LC_MONETARY?) at
"C" when LANG or LC_CTYPE is set to something else, otherwise unexpected
things may happen.

Mark

> On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
>> I happened to run an old script today that uses sed(1) to extract the
>> system boot time from the kern.boottime sysctl MIB. On 11.0 this no
>> longer works as expected:
>>
>> $ sysctl kern.boottime
>> kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
>> $ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
>> v 5 16:18:34 2016
>>
>> sed passes over 'S' and 'N' until it hits 'v', which it considers
>> uppercase apparently. This is with LANG=en_US.UTF-8. If I set LANG=C, it
>> works as expected:
>>
>> $ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
>> Nov 5 16:18:34 2016
>>
>> Testing every lowercase character separately gives even more
>> inconsistent results:
>>
>> $ cat <
>> > a
>> > b
>> > c
>> > d
>> > e
>> > f
>> > g
>> > h
>> > i
>> > j
>> > k
>> > l
>> > m
>> > n
>> > o
>> > p
>> > q
>> > r
>> > s
>> > t
>> > u
>> > v
>> > w
>> > x
>> > y
>> > z
>> > !
>> b
>> c
>> d
>> e
>> f
>> g
>> h
>> i
>> j
>> k
>> l
>> m
>> n
>> o
>> p
>> q
>> r
>> s
>> t
>> u
>> v
>> w
>> x
>> y
>> z
>>
>> Here sed thinks every lowercase character except for 'a' is uppercase!
>> This differs from the first test where sed did not think 'o' is
>> uppercase. Again, the above behaves as expected with LANG=C.
>>
>> Does anyone have any insight into this? This is likely to break a lot of
>> existing code.
Re: Uppercase RE matching problems in FreeBSD 11
On Sat, Nov 05, 2016 at 08:23:25PM -0500, Greg Rivers wrote:
> I happened to run an old script today that uses sed(1) to extract the system
> boot time from the kern.boottime sysctl MIB. On 11.0 this no longer works as
> expected:
>
> $ sysctl kern.boottime
> kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016
> $ sysctl kern.boottime | sed -e 's/.*\([A-Z].*\)$/\1/'
> v 5 16:18:34 2016
>
> sed passes over 'S' and 'N' until it hits 'v', which it considers uppercase
> apparently. This is with LANG=en_US.UTF-8. If I set LANG=C, it works as
> expected:
>
> $ sysctl kern.boottime | LANG=C sed -e 's/.*\([A-Z].*\)$/\1/'
> Nov 5 16:18:34 2016
>
> Testing every lowercase character separately gives even more inconsistent
> results:
>
> $ cat <
> > a
> > b
> > c
> > d
> > e
> > f
> > g
> > h
> > i
> > j
> > k
> > l
> > m
> > n
> > o
> > p
> > q
> > r
> > s
> > t
> > u
> > v
> > w
> > x
> > y
> > z
> > !
> b
> c
> d
> e
> f
> g
> h
> i
> j
> k
> l
> m
> n
> o
> p
> q
> r
> s
> t
> u
> v
> w
> x
> y
> z
>
> Here sed thinks every lowercase character except for 'a' is uppercase! This
> differs from the first test where sed did not think 'o' is uppercase. Again,
> the above behaves as expected with LANG=C.
>
> Does anyone have any insight into this? This is likely to break a lot of
> existing code.
>

Yes, A-Z only means uppercase in an ASCII-only world; in a Unicode world it
means AaBb...Z, because there are way more characters than simple A-Z. In
FreeBSD 11 we have a Unicode collation instead of falling back on
LC_COLLATE=C, which means ASCII only.

For regexps, for example, one should use the classes [:upper:] or [:lower:].

Best regards,
Bapt
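Applied to the script from the report, the suggestion to use a character class might look like this. It is a minimal sketch: the function name boot_date is made up, and the input is the sample sysctl output quoted above, so it can be tried without a FreeBSD system.

```shell
#!/bin/sh
# Sample line copied from the report above.
line='kern.boottime: { sec = 1478380714, usec = 145351 } Sat Nov 5 16:18:34 2016'

# Same substitution as in the report, but with the class [[:upper:]]
# in place of the range [A-Z], so it does not depend on collation order.
boot_date() {
    sed -e 's/.*\([[:upper:]].*\)$/\1/'
}

printf '%s\n' "$line" | boot_date
```

For the sample line this should print "Nov 5 16:18:34 2016" whether LANG is C or en_US.UTF-8, because the class is defined by character type, not by where a letter sorts.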