Now, just for giggles here's the regex. It's for a simple SQL like Where
clause query language. Hopefully it makes sense :) According to my work
colleagues I am considered weird for enjoying writing a regex and they're
certainly more fun than waiting for the election result ;) (This was for a
personal project)

Cheers,

Mike

// setup the regex for tokenising the query.
//
// i.   \G means match at the start of where the previous match finished
//
// ii.  We used named groups (?<....> with names that match the TokenTypes
enum so we can tell which command has been matched
//      (except for Whitespace which when we find it we move onto the next
token)
//
// iii. The token this returns removes any separators e.g. [] for fields,
"" for string, ## for datetime. (also in a string "" indicates a double
quote)
//
// iv.  The keywords below are in descending size order so that tokens such
as NotIn don't get confused with Not or In etc. (EndOfString is the shortest
//      as it is the end of string so it's empty).
//
// v.   As well as supporting the operators <, >, >=, <=, =, <> we also
support the alternatives GT, LT, GE, LE, EQ, NE, == and !=
string pattern = $@"\G(
                (?<EndOfString>$)|
                (?<WhiteSpace>\s+)|
                (\#(?<DateTime>[^\#]*)\#)|
                (\[(?<Field>[^\]]*)\])|

(?<Numeric>({Regex.Escape(CultureInfo.CurrentCulture.NumberFormat.PositiveSign)}|{Regex.Escape(CultureInfo.CurrentCulture.NumberFormat.NegativeSign)})?\d*{Regex.Escape(CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator)}?\d+)|
                (""(?<Text>([^""]|"""")*)"")|
                (?<NotLike>Not\s*Like)|
                (?<Between>Between)|
                (?<NotIn>Not\s*In)|
                (?<Empty>Empty)|
                (?<Boolean>{Regex.Escape(bool.FalseString)})|
                (?<Like>Like)|
                (?<Boolean>{Regex.Escape(bool.TrueString)})|
                (?<And>And)|
                (?<Not>Not)|
                (?<Boolean>Yes)|
                (?<Equal>EQ)|
                (?<Equal>==)|
                (?<GreaterThan>GT)|
                (?<GreaterThanOrEqual>\>=)|
                (?<GreaterThanOrEqual>GE)|
                (?<In>In)|
                (?<LessThan>LT)|
                (?<LessThanOrEqual>\<=)|
                (?<LessThanOrEqual>LE)|
                (?<NotEqual>\<\>)|
                (?<NotEqual>NE)|
                (?<NotEqual>!=)|
                (?<Boolean>No)|
                (?<Or>Or)|
                (?<Boolean>Y)|
                (?<Boolean>N)|
                (?<Boolean>{Regex.Escape(bool.TrueString.Substring(0,
1))})|
                (?<Boolean>{Regex.Escape(bool.FalseString.Substring(0,
1))})|
                (?<CloseParenthesis>\))|
                (?<Comma>,)|
                (?<Equal>=)|
                (?<GreaterThan>\>)|
                (?<LessThan>\<)|
                (?<OpenParenthesis>\())";


On Thu, 12 Dec 2019 at 20:06, Mike King <making1...@gmail.com> wrote:

> It was a very simple cutdown SQL (think just a where clause) and the regex
> was a multiline affair which picked out tokens using named groups. I then
> had a function that skipped whitespace tokens and returned the next token
> (group name) and the value (parsed and validated for dates and numeric
> values, strings were escaped).
>
> Secondly, my code is designed to run on many databases not just SQLite and
> I don’t have access at the level you describe from a .Net driver. So using
> something specific to one manufacturer is not a good idea nor is writing
> custom code for each database.
>
> All this was possible and I just added my experiences to show it is
> possible with a little thought.
>
> Cheers
>
>
>
> On Thu, 12 Dec 2019 at 19:29, Warren Young <war...@etr-usa.com> wrote:
>
>> On Dec 12, 2019, at 6:08 AM, Mike King <making1...@gmail.com> wrote:
>> >
>> > ...I decided on a simple subset of
>> > SQL and then wrote a parser using a regex as the tokeniser.
>>
>> First, [SQL is not a regular language][1], so it probably cannot be
>> completely parsed by regexes.  Not by a single regex without surrounding
>> logic, anyway.  There’s probably valid SQL that will pass your regex but
>> give unwanted behavior.
>>
>> Second, you’re reinventing SQLite’s own authorizer, which runs based on
>> the output of SQLite’s own well-tested SQL parsing engine.  (Which uses a
>> proper parser, not a regex.)
>>
>> Security is not an area where you want to roll your own logic.  Use
>> someone else’s well-tested solution whenever you can.
>>
>>
>> [1]: https://stackoverflow.com/a/33415289/142454
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to