"What I'm noticing with these projects is that they don't handle CQL files
properly"

--> Your concern is legitimate, but handling CQL files properly is more
complex than it looks. Let me explain why.

A naive way to handle CQL syntax is to reuse the ANTLR grammar file
here:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/Cql.g

I've gone down this path in the past and it's nearly impossible, simply
because the Cql.g grammar file depends on a lot of "internal" Cassandra
classes. Just look at the import block at the beginning of the file.

At a higher level, we should clearly define the "scope" of a CQL script
executor. Is it responsible for 1) parsing CQL statements or 2) validating
CQL statements?

As far as I'm concerned, point 2) should be left to Cassandra, since the
server validates every statement it receives anyway. Limiting the scope of
a script executor to point 1) is sufficient.

The remaining challenge is then: how do we split a block of input text
that contains multiple CQL statements into a list of statements that can
be executed sequentially (or in parallel) by the Java driver?

The Zeppelin Cassandra interpreter uses Scala parser combinators to
define a minimal grammar that splits the different CQL statements apart:
https://github.com/doanduyhai/incubator-zeppelin/blob/CassandraInterpreter-V2/cassandra/src/main/scala/org/apache/zeppelin/cassandra/ParagraphParser.scala#L179-L198

Up to Cassandra 2.1 it's pretty easy: the semicolon (;) can be used as the
statement separator. Since Cassandra 2.2 and the introduction of UDFs, it's
much more complex. Semicolons can appear in the Java source code block of
a function definition, so using them as the separator no longer works.
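
To make the problem concrete, here is a minimal sketch of a splitter that
only treats a semicolon as a separator when it sits outside quotes, $$
blocks and comments. It is written in Python for brevity and is not the
code used by Zeppelin, Achilles or cassandra-unit; the name split_cql is
made up for this example, and it assumes $$ ... $$ bodies are never nested.

```python
from typing import List


def split_cql(script: str) -> List[str]:
    """Split a CQL script on ';' found outside quotes, $$ blocks and comments.

    Illustrative sketch only: it does no validation and strips comments.
    """
    statements = []
    current = []
    i, n = 0, len(script)
    while i < n:
        ch = script[i]
        two = script[i:i + 2]
        if two == "$$":
            # Dollar-quoted body (e.g. Java source of a UDF): copy it verbatim.
            end = script.find("$$", i + 2)
            end = n if end == -1 else end + 2
            current.append(script[i:end])
            i = end
        elif ch in ("'", '"'):
            # String literal or quoted identifier; a doubled quote is an escape.
            j = i + 1
            while j < n:
                if script[j] == ch:
                    if j + 1 < n and script[j + 1] == ch:
                        j += 2
                        continue
                    break
                j += 1
            end = min(j + 1, n)
            current.append(script[i:end])
            i = end
        elif two in ("--", "//"):
            # Line comment: skip up to (not including) the newline.
            end = script.find("\n", i)
            i = n if end == -1 else end
        elif two == "/*":
            # Block comment: replace it with a single space.
            end = script.find("*/", i + 2)
            i = n if end == -1 else end + 2
            current.append(" ")
        elif ch == ";":
            # A semicolon in "normal" context ends the current statement.
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


sample = (
    "CREATE TABLE t (id int PRIMARY KEY, v text);\n"
    "-- seed data\n"
    "INSERT INTO t (id, v) VALUES (1, 'a;b');\n"
    "CREATE FUNCTION f(x int) RETURNS NULL ON NULL INPUT RETURNS int\n"
    "  LANGUAGE java AS $$ int y = x; return y + 1; $$;\n"
)
for statement in split_cql(sample):
    print(statement, end="\n---\n")
```

A plain sample.split(";") would cut both the INSERT and the function body
in the middle, which is exactly the failure described above. The scanner
still knows nothing about CQL itself, so anything it returns has to be
validated by Cassandra at execution time.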

A complex regular expression like this:
https://github.com/doanduyhai/incubator-zeppelin/blob/CassandraInterpreter-V2/cassandra/src/main/scala/org/apache/zeppelin/cassandra/ParagraphParser.scala#L55-L69
is necessary to parse UDF creation statements correctly.
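
To give an idea of what such an expression has to cover, here is a much
simplified sketch in Python. It is not the regular expression from the
Zeppelin source linked above; UDF_PATTERN and find_udf_statements are
names invented for this example, and it assumes the function body is
either a $$ ... $$ block or a single-quoted string.

```python
import re
from typing import List

# Match a whole CREATE FUNCTION statement, including semicolons that are
# part of the function body. Illustrative only, not Zeppelin's regex.
UDF_PATTERN = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?FUNCTION\b"  # statement head
    r".*?\bAS\s+"                               # signature, up to AS
    r"(?:\$\$.*?\$\$|'(?:[^']|'')*')"           # body: $$...$$ or '...'
    r"\s*;",                                    # terminating semicolon
    re.IGNORECASE | re.DOTALL,
)


def find_udf_statements(script: str) -> List[str]:
    """Return every CREATE FUNCTION statement found in the script."""
    return [match.group(0) for match in UDF_PATTERN.finditer(script)]


script = (
    "CREATE TABLE t (id int PRIMARY KEY);\n"
    "CREATE OR REPLACE FUNCTION plus_one(x int)\n"
    "  RETURNS NULL ON NULL INPUT RETURNS int\n"
    "  LANGUAGE java AS $$ int y = x; return y + 1; $$;\n"
    "SELECT * FROM t;\n"
)
for udf in find_udf_statements(script):
    print(udf)
```

Even this reduced version needs lazy quantifiers, DOTALL mode and two body
forms, and it would still be fooled by a CREATE FUNCTION that appears
inside a comment or a string literal.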

In a nutshell, parsing CQL (without even validating it) is harder than
most people think.



On Mon, Jan 11, 2016 at 10:52 PM, Richard L. Burton III <mrbur...@gmail.com>
wrote:

> What I'm noticing with these projects is that they don't handle CQL files
> properly. e.g., cassandra-unit dies when you have a string that contains ;
> inside of it. The parsing logic they use is very primitive in the sense
> they simple look for ; to denote the end of a statement.
>
> Is there any class in Cassandra I could use that given a *.cql file, it'll
> return a list of statements inside of it?
>
> Looking at CQLParser, it's only good for parsing a single statement vs. a
> file that contains multiple statements.
>
>
> On Mon, Jan 11, 2016 at 3:06 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Achilles 4.x does offer an embedded Cassandra server support with some
>> utility classes like ScriptExecutor. It supports C* 2.2 currently :
>>
>> https://github.com/doanduyhai/Achilles/wiki/CQL-embedded-cassandra-server
>> On Jan 11, 2016 at 20:47, "Richard L. Burton III" <mrbur...@gmail.com>
>> wrote:
>>
>>> I'm looking to see what's recommended for an embedded version of
>>> Cassandra, just for unit testing.
>>>
>>> I'm looking at https://github.com/jsevellec/cassandra-unit/wiki but I
>>> wanted to see if there's was a better recommendation?
>>>
>>> --
>>> -Richard L. Burton III
>>> @rburton
>>>
>>
>
>
> --
> -Richard L. Burton III
> @rburton
>
