[
https://issues.apache.org/jira/browse/CASSANDRA-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571175#comment-17571175
]
Brad Schoening commented on CASSANDRA-17667:
--------------------------------------------
[~ahomoki], looking at this a little further, saferscanner.py, whose
SaferScanner class inherits from Python's re.Scanner, is what tokenizes the
input. An example from
[https://mail.python.org/pipermail/python-dev/2003-April/035075.html] shows
how input is parsed:
{code:python}
import re
def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)
scanner = re.Scanner([
(r"[a-zA-Z_]\w*", s_ident),
(r"\d+\.\d*", s_float),
(r"\d+", s_int),
(r"=|\+|-|\*|/", s_operator),
(r"\s+", None),
])
# sanity check: scan() returns (list of tokens, unmatched remainder)
tokens, remainder = scanner.scan("sum = 3*foo + 312.50 + bar")
assert (tokens, remainder) == (
    ['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], '')
{code}
In pylexotron this is implemented as:
{code:python}
RuleSpecScanner = SaferScanner([
(r'::=', lambda s, t: t),
(r'\[[a-z0-9_]+\]=', lambda s, t: ('named_collector', t[1:-2])),
(r'[a-z0-9_]+=', lambda s, t: ('named_symbol', t[:-1])),
(r'/(\[\^?.[^]]*\]|[^/]|\\.)*/', lambda s, t: ('regex',
t[1:-1].replace(r'\/', '/'))),
(r'"([^"]|\\.)*"', lambda s, t: ('litstring', t)),
(r'<[^>]*>', lambda s, t: ('reference', t[1:-1])),
(r'\bJUNK\b', lambda s, t: ('junk', t)),
(r'[@()|?*;]', lambda s, t: t),
(r'\s+', None),
(r'#[^\n]*', None),
], re.I | re.S | re.U)
{code}
The r'\s+' rule, paired with None, skips whitespace.
I'm not sure what the r'#[^\n]*' and r'\bJUNK\b' rules are doing. Comments on
these rules would be helpful.
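For what it's worth, a small experiment with the plain re.Scanner suggests how those two rules behave. A rule whose action is None drops whatever it matches, so r'#[^\n]*' would discard everything from a '#' to the end of the line, and r'\bJUNK\b' would tag the standalone word JUNK as its own token. The rule table below is a cut-down assumption for illustration (the 'word' rule and the name demo_scanner are hypothetical), not the actual pylexotron table:

```python
import re

# Cut-down, hypothetical rule table: only the rules under discussion, plus a
# catch-all 'word' rule so the sample input scans completely.
demo_scanner = re.Scanner([
    (r'\bJUNK\b', lambda s, t: ('junk', t)),    # whole word JUNK becomes a tagged token
    (r'[a-z0-9_]+', lambda s, t: ('word', t)),  # hypothetical catch-all, not in pylexotron
    (r'\s+', None),                             # None action: the match is dropped
    (r'#[^\n]*', None),                         # drops '#' through the end of the line
], re.I | re.S | re.U)

tokens, remainder = demo_scanner.scan("foo JUNK # trailing note\nbar")
print(tokens)           # [('word', 'foo'), ('junk', 'JUNK'), ('word', 'bar')]
print(repr(remainder))  # ''
```

The JUNK rule is listed before the catch-all because the scanner takes the first rule that matches at each position.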
There doesn't seem to be a unit test class for pylexotron or SaferScanner,
however. That might be a good thing to add.
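As a starting point, such a test could look roughly like the sketch below. It exercises the plain re.Scanner so that it runs on its own; a real test would construct SaferScanner from cqlshlib instead. The helper build_scanner, the class ScannerTest, the rule table, and the test names are all hypothetical:

```python
import re
import unittest


def build_scanner():
    # Hypothetical stand-in: a real test would build SaferScanner with the
    # rule table under test instead of the plain re.Scanner.
    return re.Scanner([
        (r'[a-z_]\w*', lambda s, t: ('ident', t)),
        (r'\d+', lambda s, t: ('int', int(t))),
        (r'\s+', None),        # whitespace is dropped
        (r'#[^\n]*', None),    # '#' through the end of the line is dropped
    ], re.I | re.S | re.U)


class ScannerTest(unittest.TestCase):
    def test_tokens_are_tagged(self):
        tokens, remainder = build_scanner().scan("abc 42")
        self.assertEqual(tokens, [('ident', 'abc'), ('int', 42)])
        self.assertEqual(remainder, '')

    def test_comment_is_skipped(self):
        tokens, remainder = build_scanner().scan("abc # note\n7")
        self.assertEqual(tokens, [('ident', 'abc'), ('int', 7)])
        self.assertEqual(remainder, '')

    def test_unmatched_input_is_returned(self):
        # Scanning stops at the first character no rule matches; the rest
        # of the input comes back as the remainder.
        tokens, remainder = build_scanner().scan("abc $ def")
        self.assertEqual(tokens, [('ident', 'abc')])
        self.assertEqual(remainder, '$ def')
```

Saved to a file, this can be run with python -m unittest.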
> Text value containing "/*" interpreted as multiline comment in cqlsh
> --------------------------------------------------------------------
>
> Key: CASSANDRA-17667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17667
> Project: Cassandra
> Issue Type: Bug
> Components: CQL/Interpreter
> Reporter: ANOOP THOMAS
> Assignee: Attila Homoki
> Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
>
> I use CQLSH command line utility to load some DDLs. The version of utility I
> use is this:
> {noformat}
> [cqlsh 6.0.0 | Cassandra 4.0.0.47 | CQL spec 3.4.5 | Native protocol
> v5]{noformat}
> Command that loads DDL.cql:
> {noformat}
> cqlsh -u username -p password cassandra.example.com 65503 --ssl -f DDL.cql
> {noformat}
> I have a line in CQL script that breaks the syntax.
> {noformat}
> INSERT into tablename (key,columnname1,columnname2) VALUES
> ('keyName','value1','/value2/*/value3');{noformat}
> {{/*}} here is interpreted as start of multi-line comment. It used to work on
> older versions of cqlsh. The error I see looks like this:
> {noformat}
> SyntaxException: line 4:2 mismatched input 'Update' expecting ')'
> (...,'value1','/value2INSERT into tablename(INSERT into tablename
> (key,columnname1,columnname2)) VALUES ('[Update]-...) SyntaxException: line
> 1:0 no viable alternative at input '(' ([(]...)
> {noformat}
> Same behavior while running in interactive mode too. {{/*}} inside a CQL
> statement should not be interpreted as start of multi-line comment.