Re: [racket-users] Working out which directory the current code was loaded from?

2020-07-27 Thread Peter W A Wood
Many thanks, Philip.

The resulting code is much neater and, to me, more readable:

(require racket/runtime-path) ; define-runtime-path is not exported by racket or racket/base
(define-runtime-path foo.txt "foo.txt")
(define-runtime-path bar.txt "bar.txt")
(define-runtime-path outfile.txt "outfile.txt")
(define-runtime-path Data/portfolio.csv "../../Data/portfolio.csv")
(define-runtime-path Data/portfolio.csv.gz "../../Data/portfolio.csv.gz")
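This is a minimal sketch of the same technique applied to the layout in the original question. It assumes my-data-file.txt sits next to my-racket.rkt, and `read-data` is a hypothetical helper that does not come from the thread:

```racket
#lang racket
;; my-racket.rkt (sketch): find my-data-file.txt next to this module's
;; source file instead of relative to the current working directory.
(require racket/runtime-path) ; define-runtime-path

;; data-file is bound to a complete path built from this module's own
;; directory, so it is the same whether racket is started in a/, a/b/ or a/b/c/.
(define-runtime-path data-file "my-data-file.txt")

;; Hypothetical helper: the data file's contents as a list of lines.
;; It raises an exception if the file does not exist.
(define (read-data)
  (file->lines data-file))
```

The data file does not need to exist when the module is loaded. It is only opened when `read-data` is called.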

Peter

> On 28 Jul 2020, at 09:40, Philip McGrath  wrote:
> 
> For this particular purpose, you want `define-runtime-path`: 
> https://docs.racket-lang.org/reference/Filesystem.html#%28part._runtime-path%29
> 
> -Philip

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/524F478B-0915-4C2F-874B-84CDB966B7D5%40gmail.com.


[racket-users] Working out which directory the current code was loaded from?

2020-07-27 Thread Peter W A Wood
I have a short racket program/script that reads a file from the directory in 
which it is stored. The directory structure is something like this:

a/
  b/
    c/
      my-racket.rkt
      my-data-file.txt

I want to be able to run the program from the command line regardless of the
current working directory, e.g.:

a> racket b/c/my-racket.rkt
a/b> racket c/my-racket.rkt
a/b/c> racket my-racket.rkt
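Here is a minimal sketch that makes the problem concrete. A bare relative path is completed against `current-directory`, which is whatever directory racket was launched in. The names `data-rel`, `resolve-from` and `resolved-here` are hypothetical and only serve as illustration:

```racket
#lang racket
;; A bare relative path carries no directory of its own.
(define data-rel (string->path "my-data-file.txt"))

;; Hypothetical helper: complete data-rel against a given working directory,
;; which is what file operations such as open-input-file do implicitly
;; with (current-directory).
(define (resolve-from cwd)
  (path->complete-path data-rel cwd))

;; With no base argument, path->complete-path uses (current-directory),
;; so this value changes with the directory racket was started in.
(define resolved-here (path->complete-path data-rel))
```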

To do so, the program needs the correct path to my-data-file.txt, which
depends on where the script was launched from. I haven’t learnt about Racket
modules yet, so I resorted to searching Stack Overflow and found a code snippet
that works:

(require racket/path) ; path-only (already included by #lang racket)
(define script-dir
  (path-only (resolved-module-path-name
              (variable-reference->resolved-module-path
               (#%variable-reference)))))
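The following sketch finishes that approach by building the data-file path from the module's directory. The `path?` check is an added assumption. It covers modules that were not loaded from a file, such as a submodule or the REPL, where the resolved name is a list or a symbol rather than a path:

```racket
#lang racket
;; Name of the module containing this code: a complete path when the module
;; was loaded from a file, a symbol when it was not, or a list for a submodule.
(define this-module-name
  (resolved-module-path-name
   (variable-reference->resolved-module-path (#%variable-reference))))

;; Directory of the source file, or #f when there is no source file.
(define script-dir
  (and (path? this-module-name) (path-only this-module-name)))

;; Path to the data file next to the script (file name taken from the
;; question), or #f when script-dir is unknown.
(define data-file
  (and script-dir (build-path script-dir "my-data-file.txt")))
```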

Is this the best way to ascertain the directory of the “current module”?

Thanks in advance

Peter 




Re: [racket-users] Are Regular Expression classes Unicode aware?

2020-07-11 Thread Peter W A Wood
Dear Ryan

Thank you both for your complete, understandable explanation and for a working
solution that is more than sufficient for my needs.

I created a very simple function based on the regexp you suggested and tested
it against a number of cases:


#lang racket
(require test-engine/racket-tests)

;; #t when s is non-empty and, after NFC normalization, every character
;; is in the Unicode general category "Letter" (\p{L}).
(define (alpha? s)
  (regexp-match? #px"^\\p{L}+$" (string-normalize-nfc s)))

(check-expect (alpha? "") #f)               ; empty string
(check-expect (alpha? "1") #f)
(check-expect (alpha? "a") #t)
(check-expect (alpha? "hello") #t)
(check-expect (alpha? "h1llo") #f)
(check-expect (alpha? "\u00E7c\u0327") #t)  ; çç (precomposed ç, then c + combining cedilla)
(check-expect (alpha? "noe\u0308l") #t)     ; noël (e + combining diaeresis)
(check-expect (alpha? "\U01D122") #f)       ; 𝄢 (musical symbol F clef, i.e. bass clef)
(check-expect (alpha? "\u216B") #f)         ; Ⅻ (Roman numeral twelve)
(check-expect (alpha? "\u0BEB") #f)         ; ௫ (Tamil digit five)
(check-expect (alpha? "二の句") #t)          ; Japanese word "ninoku"
(check-expect (alpha? "مدينة") #t)          ; Arabic word "madina"
(check-expect (alpha? "٥") #f)              ; Arabic-Indic digit five
(check-expect (alpha? "\u0628\uFCF2") #t)   ; Arabic letter beh + ligature shadda with fatha (U+FCF2)
(test)

I suspect there are still cases that fail in scripts where a single rendered
character needs several code points, such as Arabic with pronunciation marks,
e.g. دُنْيَا. At the moment, I don’t have the time (or need) to investigate further.
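Ryan's second suggestion quoted below, expanding the regexp to include modifiers, could look like the following minimal sketch. It assumes that "modifiers" means Unicode combining marks (general category M) and lets each letter be followed by any number of them. The names `alpha-with-marks?` and `dunya` are hypothetical:

```racket
#lang racket
;; The letters-only check from above, for comparison.
(define (alpha? s)
  (regexp-match? #px"^\\p{L}+$" (string-normalize-nfc s)))

;; One or more letters, each optionally followed by combining marks.
;; A string that starts with a bare mark is still rejected.
(define (alpha-with-marks? s)
  (regexp-match? #px"^(?:\\p{L}\\p{M}*)+$" (string-normalize-nfc s)))

;; دُنْيَا spelled out: dal, damma, nun, sukun, yeh, fatha, alef.
;; The vowel marks have no precomposed forms, so NFC leaves them as marks.
(define dunya "\u062F\u064F\u0646\u0652\u064A\u064E\u0627")
```

With these definitions, `(alpha? dunya)` is #f because of the three marks, while `(alpha-with-marks? dunya)` is #t.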

The depth of Racket’s Unicode support is impressive.

Once again, thanks.

Peter


> On 10 Jul 2020, at 15:47, Ryan Culpepper  wrote:
> 
> (I see this went off the mailing list. If you reply, please consider CCing 
> the list.)
> 
> Yes, I understood your goal of trying to capture the notion of Unicode 
> "alphabetic" characters with a regular expression.
> 
> As far as I know, Unicode doesn't have a notion of "alphabetic", but it does 
> assign every code point to a "General category", consisting of a main 
> category and a subcategory. There is a category called "Letter", which seems 
> like one reasonable generalization of "alphabetic".
> 
> In Racket, you can get the code for a character's category using 
> `char-general-category`. For example:
> 
>   > (char-general-category #\A)
>   'lu
>   > (char-general-category #\é)
>   'll
>   > (char-general-category #\ß)
>   'll
>   > (char-general-category #\7)
>   'nd
> 
> The general category for "A" is "Letter, uppercase", which has the code "Lu", 
> which Racket turns into the symbol 'lu. The general category of "é" is 
> "Letter, lowercase", code "Ll", which becomes 'll. The general category of 
> "7" is "Number, decimal digit", code "Nd".
> 
> In Racket regular expressions, the \p{category} syntax lets you recognize 
> characters from a specific category. For example, \p{Lu} recognizes 
> characters with the category "Letter, uppercase", and \p{L} recognizes 
> characters with the category "Letter", which is the union of "Letter, 
> uppercase", "Letter, lowercase", and so on.
> 
> So the regular expression #px"^\\p{L}+$" recognizes sequences of one or more 
> Unicode letters. For example:
> 
>   > (regexp-match? #px"^\\p{L}+$" "héllo")
>   #t
>   > (regexp-match? #px"^\\p{L}+$" "straße")
>   #t
>   > (regexp-match? #px"^\\p{L}+$" "二の句")
>   #t
>   > (regexp-match? #px"^\\p{L}+$" "abc123")
>   #f ;; No, contains numbers
> 
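For comparison, here is a sketch of the same "one or more letters" test written directly with `char-general-category` instead of a regexp. It spells out the union that \p{L} stands for. The names `letter-categories`, `letter?` and `all-letters?` are hypothetical:

```racket
#lang racket
;; Racket's symbols for the five Unicode "Letter" subcategories
;; (Lu, Ll, Lt, Lm, Lo); \p{L} matches their union.
(define letter-categories '(lu ll lt lm lo))

;; Is c a Unicode letter, judged by its general category?
(define (letter? c)
  (and (memq (char-general-category c) letter-categories) #t))

;; Intended to agree with (regexp-match? #px"^\\p{L}+$" s):
;; the string must be non-empty and consist only of letters.
(define (all-letters? s)
  (and (positive? (string-length s))
       (for/and ([c (in-string s)])
         (letter? c))))
```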
> There are still some problems to watch out for, though. For example, accented 
> characters like "é" can be expressed as a single pre-composed code point or 
> "decomposed" into a base letter and a combining mark. You can get the 
> decomposed form by converting the string to "decomposed normal form" (NFD), 
> and the regexp above won't match that string.
> 
>   > (map char-general-category (string->list (string-normalize-nfd "é")))
>   '(ll mn)
>   > (regexp-match? #px"^\\p{L}+$" (string-normalize-nfd "héllo"))
>   #f
> 
> One fix would be to call `string-normalize-nfc` first, but some 
> letter-modifier pairs don't have pre-composed versions. Another fix would be 
> to expand the regexp to include modifiers. You'd have to decide which is 
> better based on your application.
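The following small sketch illustrates the NFC caveat using only `string-normalize-nfc`. "e" followed by U+0301 COMBINING ACUTE ACCENT composes to the single code point é, but Unicode has no precomposed "q with acute". That pair therefore survives NFC as two code points, and a letters-only regexp still rejects it. The names below are hypothetical:

```racket
#lang racket
(define letters-only #px"^\\p{L}+$")

;; Number of code points after NFC normalization.
(define (nfc-length s)
  (string-length (string-normalize-nfc s)))

;; Letters-only test applied after NFC (the first fix above).
(define (letters-after-nfc? s)
  (regexp-match? letters-only (string-normalize-nfc s)))

;; "e" + U+0301 composes to U+00E9; "q" + U+0301 has no composed form.
(define composable "e\u0301")
(define not-composable "q\u0301")
```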
> 
> Ryan
> 



Re: [racket-users] Are Regular Expression classes Unicode aware?

2020-07-10 Thread Peter W A Wood
Dear Ryan

Thank you very much for the kind, detailed explanation which I will study 
carefully. It was not my intention to reply to you off-list. I hope I have 
correctly addressed this reply to appear on-list.

Peter

> On Fri, Jul 10, 2020 at 2:10 AM Peter W A Wood  wrote:
> Ryan
> 
> > On 9 Jul 2020, at 22:52, Ryan Culpepper  wrote:
> > 
> > If you want a regular expression that does match the example string, you 
> > can use the \p{property} notation. For example:
> > 
> >   > (regexp-match? #px"^\\p{L}+$" "h\uFFC3\uFFA9llo")
> >   #t
> > 
> > The "Regexp Syntax" docs have a grammar for regular expressions with links 
> > to examples.
> > 
> > Ryan
> 
> Thanks. I used héllo as an example. I was wondering if there was a way of 
> specifying a regular expression group for Unicode “alphabetic” characters. 
> 
> On reflection, it seems a somewhat esoteric requirement that is almost
> impossible to satisfy. For example, would “Straße” be considered
> alphabetic? Would “二の句” be considered alphabetic?
> 
> Strangely, Python considers the Japanese characters alphabetic but will not
> accept “Straße” as a valid string. (I suspect this is due to some problem
> relating to the locale.)
> 
> >>> "二の句".isalpha()
> True
> >>> “Straße".isalpha()
>   File "<stdin>", line 1
>     “Straße".isalpha()
>     ^
> SyntaxError: invalid character in identifier
> 
> Clearly, trying to identify “Unicode” alphabetic characters is far from 
> straightforward, though it may well be useful in processing some language 
> texts.
> 
> Peter
> 
> 



[racket-users] Are Regular Expression classes Unicode aware?

2020-07-09 Thread Peter W A Wood
I was experimenting with regular expressions to emulate Python's isalpha()
string method. A simple [a-zA-Z] character class worked for the English
alphabet (ASCII characters):

> (regexp-match? #px"^[a-zA-Z]+$" "hello")
#t
> (regexp-match? #px"^[a-zA-Z]+$" "h1llo")
#f 

It then dawned on me that Python's isalpha() method is Unicode-aware:

>>> "é".isalpha()
True

I started scratching my head as to how to achieve the equivalent using a
regular expression in Racket. I tried the same regular expression with a
non-English character in the string. To my surprise, the regular expression
recognised the non-ASCII characters.

> (regexp-match? #px"^[a-zA-Z]+$" "h\U+FFC3\U+FFA9llo")
#t

Are Racket regular expression character classes Unicode-aware, or is there
some other explanation for why this regular expression matches?
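For comparison, here is a minimal sketch with é written as the escape \u00E9, so that terminal or mail encoding cannot interfere. In a Racket regexp, [a-zA-Z] is a range of code points and so covers only the 52 ASCII letters. \p{L}, which is used in the replies in this thread, covers the whole Unicode "Letter" category. The names below are hypothetical:

```racket
#lang racket
;; [a-zA-Z] is a range of code points: only the 52 ASCII letters.
(define ascii-alpha #px"^[a-zA-Z]+$")
;; \p{L} is every character in the Unicode "Letter" general category.
(define unicode-alpha #px"^\\p{L}+$")

;; é written as an escape, to rule out terminal or mail encoding issues.
(define sample "h\u00E9llo")
```

Under these definitions, `(regexp-match? ascii-alpha sample)` is #f and `(regexp-match? unicode-alpha sample)` is #t.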

Peter
