Re: Question about lexing

I probably can't help much on the C++ specifics.  I avoid C++ Unicode handling like the plague, so much so that I would literally choose another language over C++ if Unicode handling were the primary part of the project I'm working on.  They made a lot of fundamentally bad choices there, and they're unique compared to literally everyone else doing Unicode strings, who all just say "ok everyone, have some UTF-8".  So if it's a bug there, you're on your own.  Yes, I get that there are probably some justifications, and they probably even make some sense, but that doesn't make it any less horrible.  Mind you, to be honest, I also avoid all the standard streams stuff and just use printf.  In general it's something C++ badly botched.  But I digress.  Though see my note at the bottom.

Your other C++ point: std::variant is efficient, or should be assumed to be.  It's a tagged union.  The equivalent you might write by hand won't be smaller, unless you're trying to drop the tag and know which type it holds by some other means.
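To make the tagged-union point concrete, here's a rough sketch of what a token type might look like with std::variant.  The payload structs, the Overloaded helper, and describe() are made-up names for illustration, not anything from your assignment.  The variant stores exactly one payload plus a small tag (which index() exposes), and std::visit dispatches on that tag.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical token payloads for a small lexer.
struct Identifier { std::string name; };
struct Integer { std::int64_t value; };
struct Float { double value; };

// Holds exactly one of the three payloads plus a tag saying which one.
using Token = std::variant<Identifier, Integer, Float>;

// Common C++17 idiom (not a standard type): merge several lambdas into one
// overloaded callable for std::visit.
template <class... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// std::visit reads the tag and calls the matching lambda.
std::string describe(const Token& tok) {
    return std::visit(Overloaded{
        [](const Identifier& id) { return "identifier " + id.name; },
        [](const Integer& i) { return "integer " + std::to_string(i.value); },
        [](const Float& f) { return "float " + std::to_string(f.value); },
    }, tok);
}
```

A hand-rolled struct with an enum tag and a union would carry the same information, which is why it's unlikely to come out smaller.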

As for the regexes, you need to turn your loop around.  Anchor the regex to the beginning of the string, then (as pseudocode), it's:

while text remaining:
    if match identifier:
        chop identifier
    else if match number:
        chop number
    else:
        report an error (otherwise the loop never advances)

And so on.  You can anchor the regex to the beginning of the string by prepending ^, or there may be something in C++ for it.  I'm not sure what your professor is thinking with respect to giving you separate regexes for integer and float: I suggest that you always match float, and then determine whether it's an integer after the fact by examining the string.
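Here's a rough C++ sketch of that loop.  Everything specific in it is assumed for illustration: the Kind, Tok and lex names and the three patterns are mine, not your professor's.  Rather than prepending ^, it passes std::regex_constants::match_continuous to std::regex_search, which only accepts a match that starts exactly at the current position.  Numbers follow the match-float-then-check idea.

```cpp
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch.
enum class Kind { Identifier, Integer, Float };

struct Tok {
    Kind kind;
    std::string text;
};

std::vector<Tok> lex(const std::string& src) {
    // Assumed rules: whitespace, C-style identifiers, and ONE number pattern
    // that accepts plain integers as well as floats.
    static const std::regex ws(R"(\s+)");
    static const std::regex ident(R"([A-Za-z_][A-Za-z0-9_]*)");
    static const std::regex number(R"([0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?)");

    // match_continuous: a match must begin exactly at `pos`, so each regex is
    // effectively anchored to the start of the remaining text.
    const auto flags = std::regex_constants::match_continuous;

    std::vector<Tok> out;
    auto pos = src.cbegin();
    std::smatch m;
    while (pos != src.cend()) {
        if (std::regex_search(pos, src.cend(), m, ws, flags)) {
            // Whitespace: skip it, emit nothing.
        } else if (std::regex_search(pos, src.cend(), m, ident, flags)) {
            out.push_back({Kind::Identifier, m.str()});
        } else if (std::regex_search(pos, src.cend(), m, number, flags)) {
            // Matched with the float pattern; classify after the fact.
            const std::string s = m.str();
            const bool is_float = s.find_first_of(".eE") != std::string::npos;
            out.push_back({is_float ? Kind::Float : Kind::Integer, s});
        } else {
            // No rule matched: bail out instead of looping forever.
            throw std::runtime_error("unexpected character at offset " +
                                     std::to_string(pos - src.cbegin()));
        }
        pos += m.length(0);  // chop the matched text off the front
    }
    return out;
}
```

For example, lex("foo 42 3.5 1e3") yields an identifier, an integer, and two floats, while lex("a + b") throws at the "+".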

But, two things:

First, did you check with the professor to make sure that you're allowed to do it this way?  The integer vs. float thing is a classic lookahead problem that a regex can't easily solve, but that writing it the giant-finite-state-machine-of-doom way can.
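For contrast, this is roughly what the hand-written approach looks like for the number part alone.  It's a sketch assuming a simple number grammar (digits, optional fraction, optional exponent); scan_number and NumberScan are made-up names.  The lookahead lives in the "." and exponent branches: the scanner only commits to "float" once it has seen digits after them.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Result of scanning one number starting at some offset.
// length == 0 means "no number here".
struct NumberScan {
    std::size_t length;
    bool is_float;
};

// Hand-written state machine for the assumed grammar
//   digits [ '.' digits ] [ ('e'|'E') ['+'|'-'] digits ]
NumberScan scan_number(const std::string& s, std::size_t i) {
    auto digit = [&](std::size_t k) {
        return k < s.size() && std::isdigit(static_cast<unsigned char>(s[k]));
    };
    const std::size_t start = i;
    if (!digit(i)) return {0, false};
    while (digit(i)) ++i;                      // state: integer part

    bool is_float = false;
    if (i < s.size() && s[i] == '.' && digit(i + 1)) {  // peek past the '.'
        is_float = true;
        ++i;
        while (digit(i)) ++i;                  // state: fraction
    }
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
        std::size_t j = i + 1;                 // peek past 'e' and any sign
        if (j < s.size() && (s[j] == '+' || s[j] == '-')) ++j;
        if (digit(j)) {                        // only an exponent if digits follow
            is_float = true;
            i = j;
            while (digit(i)) ++i;              // state: exponent
        }
    }
    return {i - start, is_float};
}
```

So scan_number("3.14+", 0) reports 4 characters and a float, while scan_number("7e", 0) reports 1 character and an integer, leaving the "e" for whatever rule handles identifiers.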

Second, you might as well drop anything at all to do with Unicode.  In addition to the points about C++ generally botching it, Unicode regex is sort of not a thing.  It's technically possible in theory, but in practice I'm pretty sure that std::regex doesn't support it.  We're actually trying to do that at work right now, for reasons I don't think are public yet, and people who make me look like a junior consider it a massive problem, potentially an entirely unsolvable one.  When the guy on the team who hand-writes magic UTF-8 encoders and decoders, so that we can store both Unicode data and arbitrary binary blobs in the same on-disk format, says this isn't something we can solve with reasonable effort...well, that kind of speaks for itself.  Our work case has additional complexities, and the general case is technically doable, but I'd just not bother for college assignments unless it's been mandated, because even the general case is pretty involved.
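If you do drop Unicode, one cheap way to make that explicit is to reject anything outside ASCII before lexing, so stray non-ASCII bytes produce a clear error instead of confusing the regexes.  A minimal sketch, assuming the source text sits in a std::string; first_non_ascii is a made-up helper name.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Returns the byte offset of the first non-ASCII byte, or std::nullopt if the
// input is plain 7-bit ASCII. Every UTF-8 multi-byte sequence starts with a
// byte >= 0x80, so this flags any non-ASCII character.
std::optional<std::size_t> first_non_ascii(const std::string& src) {
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (static_cast<unsigned char>(src[i]) >= 0x80) return i;
    }
    return std::nullopt;
}
```

For example, first_non_ascii("ab\xC3\xA9") returns 2, the offset of the first byte of the UTF-8 encoding of "é".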



-- 
Audiogames-reflector mailing list
Audiogames-reflector@sabahattin-gucukoglu.com
https://sabahattin-gucukoglu.com/cgi-bin/mailman/listinfo/audiogames-reflector